DEFT Corpus

From GM-RKB

A DEFT Corpus is an NLP benchmark corpus for definition extraction.
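
As a rough illustration of the task this corpus benchmarks, the sketch below pairs a term with its definition using only the explicit definitor phrases named in the task description quoted under References ("is", "means", "is defined as"). The pattern `DEFINITOR` and the function `extract_definition` are hypothetical names introduced here for illustration. They are not part of the DEFT corpus or its tooling, and a single-sentence pattern like this is the kind of narrow baseline the corpus is meant to go beyond.

```python
import re

# Hypothetical, deliberately naive pattern (an assumption for illustration):
#   [article] <term> (is defined as | means | is) <definition>[.]
DEFINITOR = re.compile(
    r"^(?:(?:An?|The)\s+)?"                 # optional leading article
    r"(?P<term>[A-Za-z][A-Za-z\- ]*?)\s+"   # shortest candidate term
    r"(?:is defined as|means|is)\s+"        # explicit definitor phrase
    r"(?P<definition>.+?)\.?$",             # rest of the sentence
    re.IGNORECASE,
)


def extract_definition(sentence):
    """Return (term, definition) if the sentence fits the naive pattern, else None."""
    match = DEFINITOR.match(sentence.strip())
    if match is None:
        return None
    return match.group("term"), match.group("definition")
```

Calling `extract_definition` on the definition sentence at the top of this page yields the term "DEFT Corpus" with the remainder of the sentence as its definition. A sentence with no explicit definitor, or a definition that continues in the following sentence, yields `None` or only a partial result, which is the gap the quoted task description points to.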



References

2019

  • https://github.com/adobe-research/deft_corpus
    • QUOTE: Welcome to the largest expertly annotated corpus for complex definition extraction in free text. Pardon our dust - this data is associated with SemEval 2020 Task 6 (DeftEval) and we are releasing the full dataset on the SemEval conference schedule. Train and dev data are available, and test data will become available after the completion of the SemEval evaluation period on 2 Feb 2020. You can source the complete text from the corresponding textbooks at https://cnx.org.

      The most recent version of the corpus was updated on 30 OCT 2019.

2019b

  • https://competitions.codalab.org/competitions/20900
    • QUOTE: Definition extraction has been a popular topic in NLP research for well more than a decade, but has been historically limited to well defined, structured, and narrow conditions. In reality, natural language is complicated, and complicated data requires both complex solutions and data that reflects that reality. The DEFT corpus expands on these cases to include term-definition pairs that cross sentence boundaries, lack explicit definitors, or definition-like verb phrases (e.g. is, means, is defined as, etc.), or appear in non-hypernym structures.

2019