Domain-Specific Corpus
A Domain-Specific Corpus is a corpus whose texts are all drawn from a single subject domain.
- Context:
- It can be a specialized collection of texts that is tailored to a particular subject or field for linguistic analysis.
- …
- Example(s):
- Counter-Example(s):
- an Open Corpus, such as The Web,
- a Domain-Specific Knowledge Base.
- See: Domain-Specific Task, Knowledge Base, Ontology, Benchmark Dataset, Natural Language Processing, Corpus Linguistics, Specialized Text Analysis, Text Corpora.
References
2010
- (Melli, 2010a) ⇒ Gabor Melli. (2010). “Concept Mentions within KDD-2009 Abstracts (kdd09cma1) Linked to a KDD Ontology (kddo1).” In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010).
- QUOTE: The kdd09cma1 corpus is based on the 139 abstracts of the papers accepted for ACM's SIGKDD annual conference in 2009 (KDD 2009) that are freely accessible from ACM's Digital Library [1]. KDD is a competitive peer-reviewed conference with acceptance rates in the range of 20%-25%. The conference topic is data mining and knowledge discovery from databases.
The abstracts were manually annotated by the author for concept mentions. We define a concept mention to be a sequence of tokens (orthographic words and punctuation) whose meaning is deemed by an expert to be used within their community of speakers, and whose meaning is not necessarily well understood by a member of the general public. Often concept mentions are words (terminological units), but not always. The mentions can also be phrases. For example the phrase “problem of web classification” could be identified as a mention of the Web_Object Classification_Task concept.
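The definition of a concept mention quoted above (a sequence of tokens linked to a concept) can be pictured as a token span plus a concept label. The following is a minimal sketch under stated assumptions, not the annotation format used by kdd09cma1: the `ConceptMention` class, the `tokenize` and `find_mention` helpers, and the sample sentence are all hypothetical, and the exact string match merely stands in for the manual expert annotation described in the quote.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptMention:
    """Hypothetical record: a token span linked to one ontology concept."""
    start: int    # index of the first token in the span (inclusive)
    end: int      # index one past the last token in the span (exclusive)
    concept: str  # label of the linked concept


def tokenize(text):
    # Split into orthographic words and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def find_mention(tokens, phrase, concept):
    # Return the first span whose tokens equal the phrase's tokens, else None.
    target = tokenize(phrase)
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return ConceptMention(i, i + n, concept)
    return None


# Hypothetical sentence; it is not taken from the corpus.
sentence = "We study the problem of web classification."
tokens = tokenize(sentence)
mention = find_mention(
    tokens,
    "problem of web classification",
    "Web_Object Classification_Task",
)
```

Here `tokens[mention.start:mention.end]` recovers the mention's tokens, so a multi-word phrase and a single terminological unit are stored the same way.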