Large Text Document Corpus

A Large Text Document Corpus is a corpus that is a large dataset (one that requires significant resources to be processed by a machine but can still fit in large memory banks).

Context:
- It can fit into the computer memory of a very large computer.
- It can range from being a Relatively Large Corpus to being a Very Large Corpus.
Example(s):
- a Large Text Corpus, such as the Genia Corpus, a TREC Corpus, or the KDD-2009 Abstracts Corpus.
- …
Counter-Example(s):
- a Small Corpus, such as the kdd09cma1 Corpus.
- a Corpus that is too large to fit in computer memory, such as a Web Snapshot.
See: Information Extraction Task, PubMed Corpus.
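
The memory criterion in the definition can be illustrated with a minimal sketch, assuming the corpus is stored as plain files under one directory. The function names, the memory budget, and the overhead factor are hypothetical choices made for illustration only; they are not part of the definition, and on-disk size is only a rough proxy for the in-memory footprint.

```python
import os


def corpus_size_bytes(root):
    """Sum the on-disk size of every regular file under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def fits_in_memory(size_bytes, memory_budget_bytes, overhead_factor=2.0):
    """Return True if the corpus, scaled by an assumed overhead, fits the budget.

    overhead_factor is an assumption: decoded text and index structures
    usually take more room in memory than the raw files do on disk.
    """
    return size_bytes * overhead_factor <= memory_budget_bytes
```

Under these assumptions, a corpus for which `fits_in_memory` returns True against the memory of a very large machine would fall within the concept, while a Web Snapshot would typically fail the check.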