AQUAINT Corpus

AKA: AQUAINT.
Context:
- It contains 1,033,461 Newswire Articles from the New York Times Newswire, the Associated Press Newswire, and the Xinhua News Agency Newswire.
- It was created by the AQUAINT Project.
- It has LDC Catalog Number LDC2002T31 in the LDC Corpora.
See: AQUAINT-2 Corpus.

References

http://www.nist.gov/tac/data/data_desc.html#AQUAINT
- The AQUAINT corpus of English News Text consists of 1,033,461 documents taken from the New York Times, the Associated Press, and the Xinhua News Agency newswires. The collection spans the years 1999-2000 (1996-2000 for Xinhua documents). The AQUAINT collection is distributed by the Linguistic Data Consortium (LDC catalog number LDC2002T31).
- http://www.nist.gov/tac/data/aquaint.dtd.txt
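
As a rough illustration of how documents marked up under such a DTD might be read, the sketch below pulls document identifiers and body text out of a string of concatenated documents. The tag names DOC, DOCNO, TEXT, and P are assumptions based on common TREC-style newswire markup and are not confirmed by the excerpts on this page; check them against the DTD linked above. The function name iter_documents and the sample string are hypothetical and contain no real corpus data.

```python
import re

# Assumed tag names (DOC, DOCNO, TEXT); verify against aquaint.dtd.txt.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def iter_documents(sgml):
    """Yield (docno, text) pairs from a string of concatenated DOC elements."""
    for match in DOC_RE.finditer(sgml):
        block = match.group(1)
        docno = DOCNO_RE.search(block)
        text = TEXT_RE.search(block)
        if docno is None or text is None:
            continue  # skip blocks that lack an identifier or a body
        # Replace inner tags such as <P> with spaces, then collapse whitespace.
        body = TAG_RE.sub(" ", text.group(1))
        yield docno.group(1), " ".join(body.split())


# Made-up sample in the assumed markup, not taken from the corpus.
sample = """<DOC>
<DOCNO> EXAMPLE0001 </DOCNO>
<TEXT>
<P>
First paragraph.
</P>
<P>
Second paragraph.
</P>
</TEXT>
</DOC>"""

for docno, text in iter_documents(sample):
    print(docno, text)
```

A regular-expression pass is used here only because SGML of this kind is often not well-formed XML; a proper SGML parser driven by the DTD would be the more robust choice.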

http://www.ldc.upenn.edu/Catalog/docs/LDC2002T31/
- This file contains documentation on the AQUAINT Corpus, Linguistic Data Consortium (LDC) catalog number LDC2002T31 and isbn 1-58563-240-6.
- This corpus consists of newswire text data in English, drawn from three sources: the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service. It was prepared by the LDC for the AQUAINT Project, and will be used in official benchmark evaluations conducted by National Institute of Standards and Technology (NIST).