2004 ImprovingThePerfOfDictProtNER

From GM-RKB

Subject Headings: Protein NER, Dictionary-based Algorithm.

Notes

Cited By

  • (Cohen, 2005) ⇒ Aaron M. Cohen. (2005). “Unsupervised gene/protein named entity normalization using automatically extracted dictionaries.” In: Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases.
    • Tsuruoka and Tsujii recently studied the use of dictionary-based approaches for protein name recognition (Tsuruoka and Tsujii, 2004), although they did not evaluate the normalization performance. They applied a probabilistic term variant generator to expand the dictionary, and a Bayesian contextual filter with a sub-sentence window size to classify the terms in the GENIA corpus as likely to represent protein names. Overall they obtained a precision of 71.1%, at a recall of 62.3% and an F-measure of 66.6%. Tsuruoka and Tsujii did not make use of curated database information; instead they split the GENIA corpus into training and test data sets of 1800 and 200 abstracts, respectively, and extracted the tagged protein names from the training set to use as a dictionary. These results compare well with, though they are slightly below, other non-dictionary-based methods applied to the GENIA corpus (Lee et al., 2004, Zhou et al., 2004).
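A minimal sketch of the basic dictionary-based setup the quotation describes, assuming whitespace tokenization and case-insensitive exact matching. Names tagged in the training abstracts become dictionary entries, test text is tagged by greedy longest match, and the output is scored on exact spans with precision, recall and the balanced F-measure (F1 = 2PR/(P+R), assumed here). The function names and the toy sentence are hypothetical, and the probabilistic term variant generator and Bayesian contextual filter are deliberately left out, so this is not Tsuruoka and Tsujii's actual method.

```python
"""Hypothetical sketch: exact-match dictionary tagging of protein names,
scored with precision, recall and F-measure. It omits the term variant
generator and the Bayesian contextual filter described in the quotation."""

from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # (start token index, end token index, exclusive)


def build_dictionary(training_names: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Turn names tagged in the training set into lowercase token tuples.

    Lowercasing is an assumption made for this sketch."""
    return {tuple(name.lower().split()) for name in training_names if name.strip()}


def tag(tokens: List[str], dictionary: Set[Tuple[str, ...]]) -> Set[Span]:
    """Greedy left-to-right longest match against the dictionary."""
    max_len = max((len(entry) for entry in dictionary), default=0)
    lowered = [t.lower() for t in tokens]
    spans: Set[Span] = set()
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate first, so "NF-kappa B p50" beats "NF-kappa B".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                match_end = i + n
                break
        if match_end is None:
            i += 1  # no entry starts here; move on one token
        else:
            spans.add((i, match_end))
            i = match_end  # skip past the matched name
    return spans


def prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-span precision, recall and balanced F-measure (F1)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    dictionary = build_dictionary(["IL-2", "NF-kappa B", "NF-kappa B p50"])
    tokens = "IL-2 induces NF-kappa B p50 but not IL-4".split()
    # IL-4 is absent from the dictionary, so it becomes a false negative.
    gold = {(0, 1), (2, 5), (7, 8)}
    predicted = tag(tokens, dictionary)
    print(sorted(predicted))
    print(prf(predicted, gold))
```

In the toy run both predicted spans are correct but IL-4 is missed, so precision is 1.0, recall is 2/3 and F1 is 0.8. This is the kind of recall gap that dictionary expansion is meant to close. Plugging the quoted precision (71.1%) and recall (62.3%) into the same F1 formula gives about 66.4%; the quotation does not say how its 66.6% figure was computed.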

Quotes

Abstract



References


  • (Tsuruoka and Tsujii, 2004) ⇒ Yoshimasa Tsuruoka, and Jun'ichi Tsujii. (2004). “Improving the Performance of Dictionary-based Approaches in Protein Name Recognition.” doi:10.1016/j.jbi.2004.08.003 (http://dx.doi.org/10.1016/j.jbi.2004.08.003)