2003 GENIAcorpus

Subject Headings: GENIA Corpus, Computational Molecular Biology.

Notes

Natural language processing (NLP) methods are regarded as being useful to raise the potential of text mining from biological literature. The lack of an extensively annotated corpus of this literature, however, causes a major bottleneck for applying NLP techniques. GENIA corpus is being developed to provide reference materials to let NLP techniques work for bio-textmining.

GENIA corpus version 3.0 consisting of 2000 MEDLINE abstracts has been released with more than 400 000 words and almost 100 000 annotations for biological terms.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2003 GENIAcorpus	Jun'ichi Tsujii, Tomoko Ohta, Jin-Dong Kim, Yuka Tateisi			GENIA Corpus - a semantically annotated corpus for bio-textmining		Bioinformatics Subject Area	http://bioinformatics.oxfordjournals.org/cgi/content/abstract/19/suppl_1/i180			2003