Large Text Document Corpus
A Large Text Document Corpus is a corpus that is a large dataset (one that requires significant resources to be processed by a machine but can still fit in large memory banks).
- Context:
- It can fit into the computer memory of a very large computer.
- It can range from being a Relatively Large Corpus to being a Very Large Corpus.
- Example(s):
- a Large Text Corpus, such as the Genia Corpus, the TREC Corpus, or the KDD-2009 Abstracts Corpus.
- …
- Counter-Example(s):
- a Small Corpus, such as the kdd09cma1 Corpus.
- any Large Corpora, such as a Web Snapshot.
- See: Information Extraction Task, PubMed Corpus.
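The Context condition above (the corpus can fit into the memory of a very large computer) can be checked roughly in code. The following is a minimal sketch, not a definitive test: the function names `corpus_size_bytes` and `fits_in_memory`, the `*.txt` file pattern, and the `overhead` factor of 2.0 are all illustrative assumptions, and real in-memory size depends on the encoding and data structures used.

```python
import tempfile
from pathlib import Path


def corpus_size_bytes(root: str, pattern: str = "*.txt") -> int:
    """Sum the on-disk size of every file under `root` matching `pattern`.

    The "*.txt" default is an assumption; adjust it to the corpus layout.
    """
    return sum(p.stat().st_size for p in Path(root).rglob(pattern) if p.is_file())


def fits_in_memory(size_bytes: int, memory_budget_bytes: int,
                   overhead: float = 2.0) -> bool:
    """Return True if the corpus, inflated by `overhead`, fits in the budget.

    `overhead` is a hypothetical multiplier for in-memory representation
    cost (decoded strings, indexes); 2.0 is a placeholder, not a measurement.
    """
    return size_bytes * overhead <= memory_budget_bytes


if __name__ == "__main__":
    # Build a tiny throwaway corpus so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "doc1.txt").write_text("first document", encoding="utf-8")
        (Path(tmp) / "doc2.txt").write_text("second document", encoding="utf-8")
        size = corpus_size_bytes(tmp)
        budget = 64 * 1024**3  # assumed 64 GiB memory budget
        print(f"corpus size: {size} bytes, fits: {fits_in_memory(size, budget)}")
```

Under this sketch, a corpus for which `fits_in_memory` returns False for every available machine would fall under the counter-example of a Web Snapshot rather than under this concept.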