GENIA Corpus
The GENIA Corpus is an Annotated Dataset of Biomedical Abstracts that have been Curated for Entities in the GENIA Ontology.
- AKA: GENIA Dataset.
- Context:
- Composed of 2,000 MEDLINE abstracts
- Approximately 500,000 words
- http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA
- See: GENIA Project, BioCreAtIvE Corpus.
References
2003
- (Kim et al., 2003) ⇒ Jin-Dong Kim, Tomoko Ohta, Yuka Tateisi, and Jun'ichi Tsujii. (2003). “GENIA Corpus - a semantically annotated corpus for bio-textmining.” In: Bioinformatics. 19(suppl. 1).
2002
- (Ohta et al., 2002) ⇒ Tomoko Ohta, Yuka Tateisi, and Jin-Dong Kim. (2002). “The GENIA corpus: an annotated research abstract corpus in molecular biology domain.” In: Proceedings of the 2nd International Conference on Human Language Technology Research (HLT 2002).