2004 ImprovingThePerfOfDictProtNER

Subject Headings: Protein NER, Dictionary-based Algorithm.

Notes

(Cohen, 2005) ⇒ Aaron M. Cohen. (2005). “Unsupervised gene/protein named entity normalization using automatically extracted dictionaries.” In: Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases.
- Tsuruoka and Tsujii recently studied the use of dictionary-based approaches for protein name recognition (Tsuruoka and Tsujii, 2004), although they did not evaluate the normalization performance. They applied a probabilistic term variant generator to expand the dictionary, and a Bayesian contextual filter with a sub-sentence window size to classify the terms in the GENIA corpus as likely to represent protein names. Overall they obtained a precision of 71.1%, at a recall of 62.3% and an F-measure of 66.6%. Tsuruoka and Tsujii did not make use of curated database information, and instead split the GENIA corpus into training and test data sets of 1800 and 200 abstracts respectively, and extracted the tagged protein names from the training set to use as a dictionary. These results compare well to, being a bit below, other non-dictionary based methods applied to the GENIA corpus (Lee et al., 2004, Zhou et al., 2004).
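
The dictionary step described above (extracting tagged names from a training split and matching them in held-out text) can be illustrated with a minimal sketch. This is not the authors' implementation: it leaves out the probabilistic term variant generator and the Bayesian contextual filter, uses plain case-insensitive longest-match lookup instead, and all function names, the `max_len` parameter, and the toy sentences are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # token offsets, end-exclusive
Name = Tuple[str, ...]  # a dictionary entry as a tuple of lowercased tokens


def build_dictionary(tagged: List[Tuple[List[str], List[Span]]]) -> Set[Name]:
    """Collect every annotated name in the training data as a lowercased token tuple."""
    names: Set[Name] = set()
    for tokens, spans in tagged:
        for start, end in spans:
            names.add(tuple(t.lower() for t in tokens[start:end]))
    return names


def find_candidates(tokens: List[str], dictionary: Set[Name], max_len: int = 5) -> List[Span]:
    """Greedy left-to-right longest-match lookup (hypothetical stand-in for the paper's matcher)."""
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        match_end = None
        # Try the longest window first so multi-token names win over their prefixes.
        for j in range(min(len(lowered), i + max_len), i, -1):
            if tuple(lowered[i:j]) in dictionary:
                match_end = j
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end))
            i = match_end
    return spans


def precision_recall_f1(predicted: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Exact-span precision, recall, and their harmonic mean (F-measure)."""
    tp = len(set(predicted) & set(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data invented for this sketch; not taken from the GENIA corpus.
    train = [
        (["NF-kappa", "B", "activates", "transcription"], [(0, 2)]),
        (["IL-2", "is", "a", "cytokine"], [(0, 1)]),
    ]
    test_tokens = ["Expression", "of", "IL-2", "requires", "NF-kappa", "B"]
    found = find_candidates(test_tokens, build_dictionary(train))
    print(found)
    print(precision_recall_f1(found, gold=[(2, 3), (4, 6)]))
```

A pure lookup like this tends to over-generate candidates, which is the motivation for a contextual filter such as the Bayesian one mentioned in the passage.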


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2004 ImprovingThePerfOfDictProtNER	Jun'ichi Tsujii, Yoshimasa Tsuruoka			Improving the Performance of Dictionary-based Approaches in Protein Name Recognition			http://dx.doi.org/10.1016/j.jbi.2004.08.003	10.1016/j.jbi.2004.08.003