Gazetteer-based Term Annotation Task
A Gazetteer-based Term Annotation Task is a term recognition task that is based on a term gazetteer.
- Context:
- It can be solved by a Gazetteer-based Term Recognition System (that implements a Gazetteer-based Term Recognition Algorithm).
- See: Data-Driven NER Task.
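As an illustration of how a gazetteer-based term recognition system might work, the following is a minimal sketch in Python. It assumes a simple regular-expression tokenizer, case-insensitive matching, and a greedy longest-match strategy. The function names (`tokenize`, `build_gazetteer`, `annotate`) and the example term list are hypothetical and not drawn from any cited system.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def build_gazetteer(terms):
    """Map each term's lowercased token tuple to its original surface form."""
    return {tuple(tok.lower() for tok in tokenize(term)): term for term in terms}


def annotate(text, gazetteer):
    """Return (start_token, end_token, term) spans found by greedy longest match.

    Token offsets are half-open: the span covers tokens[start_token:end_token].
    """
    tokens = tokenize(text)
    lowered = [tok.lower() for tok in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "New York City" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched term
        else:
            i += 1
    return spans


# Hypothetical gazetteer of place names.
gazetteer = build_gazetteer(["New York", "New York City", "London"])
spans = annotate("She flew from London to New York City.", gazetteer)
```

Here `spans` holds one entry for "London" and one for "New York City"; the shorter entry "New York" is not reported separately because the longest match wins. A sketch like this does no disambiguation, so any string that appears in the gazetteer is annotated regardless of context.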
References
2008
- (Smith et al., 2008) ⇒ Larry Smith, Lorraine K. Tanabe, Rie J. Ando, Cheng-Ju Kuo, I-Fang Chung, Chun-Nan Hsu, Yu-Shi Lin, Roman Klinger, Christoph M. Friedrich, Kuzman Ganchev, Manabu Torii, Hongfang Liu, Barry Haddow, Craig A. Struble, Richard J. Povinelli, Andreas Vlachos, William A. Baumgartner, Lawrence Hunter, Bob Carpenter, Richard TH Tsai, Hong-Jie Dai, Feng Liu, Yifei Chen, Chengjie Sun, Sophia Katrenko, Pieter Adriaans, Christian Blaschke, Rafael Torres, Mariana Neves, Preslav Nakov, Anna Divoli, Manuel Maña-López, Jacinto Mata, and W. John Wilbur. (2008). “Overview of BioCreative II Gene Mention Recognition.” In: Genome Biology, 9(Suppl 2). doi:10.1186/gb-2008-9-s2-s2
- QUOTE: NER seeks to identify the words and phrases in text that reference entities in a given category, such as people, places, or companies, or in this application genes and proteins. NER is frequently accomplished with B-I-O tagging, which classifies each token as being at the beginning of the named entity (B), continuing the entity (I), or outside of any entity to be tagged (O). There are several lexical resources (sources of information about words) commonly used in solving the NER problem. A gazetteer is a list of names belonging to a particular category, such as places, persons, companies, genes, and so on. A lexicon is a source of information about different forms or grammatical properties of words. A thesaurus is a source of information indicating words with similar and/or related meanings. Systems in the BioCreative I challenge were classified as open if they used lexical resources, particularly gazetteers, and otherwise closed. A commonly used lexical resource is the Unified Medical Language System (UMLS), a controlled vocabulary of biomedical terminology maintained by the US National Library of Medicine.
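The B-I-O scheme described in the quote can be sketched as a conversion from token spans (for example, spans produced by a gazetteer lookup) to per-token tags. This is a minimal illustration that assumes non-overlapping, half-open token spans. The function name `bio_tags` and the example sentence are hypothetical.

```python
def bio_tags(num_tokens, spans):
    """Convert half-open (start, end) token spans into a list of B/I/O tags."""
    tags = ["O"] * num_tokens          # every token starts outside any entity
    for start, end in spans:
        tags[start] = "B"              # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining tokens continue the entity
    return tags


# Hypothetical sentence with one multi-token gene/protein mention.
tokens = ["expression", "of", "tumor", "necrosis", "factor", "alpha", "increased"]
tags = bio_tags(len(tokens), [(2, 6)])
# tags == ["O", "O", "B", "I", "I", "I", "O"]
```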
2002
- (Cunningham et al., 2002) ⇒ Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. (2002). “GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications.” In: Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL 2002).
- QUOTE: Provided with GATE is a set of reusable processing resources for common NLP tasks. … ANNIE consists of the following main processing resources: tokeniser, sentence splitter, POS tagger, gazetteer, finite state transducer …
1999
- (Mikheev et al., 1999) ⇒ Andrei Mikheev, Marc Moens, and Claire Grover. (1999). “Named Entity Recognition Without Gazetteers.” In: Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics. doi:10.3115/977035.977037