Term Mention Recognition Task
A Term Mention Recognition Task is a Word Mention Recognition Task (term mention detection and term mention classification) that is restricted to term mentions.
- AKA: ATR (Automatic Term Recognition), Terminological Unit Recognition, Term Identification.
- Context:
- Input: Text Documents.
- Output: Annotated Documents.
- It can range from being a Heuristic Term Recognition Task to being a Data-Driven Term Recognition Task.
- It can range from being a Manual Term Recognition Task to being an Automatic Term Recognition Task.
- It can be supported by a Term Mention Detection Task, a Term Mention Classification Task, and a Term Mention Reference Resolution Task.
- It can be solved by a Term Recognition System/Term Mention Normalization System (that implements a Term Recognition Algorithm/Term Mention Normalization Algorithm).
- It can support a Terminology Extraction Task.
- Example(s):
- "The soft contact lens was invented in 1959." ⇒ "The [soft contact lens] was invented in 1959.".
- "A real time expert system is the next goal." ⇒ "A [real time] [expert system] is the next goal.".
- a Gazetteer-based Term Annotation.
- …
- Counter-Example(s):
- See: Entity Mention Recognition Task, NP Chunking Task.
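A Gazetteer-based Term Annotation can be sketched as a greedy longest-match lookup that reproduces the bracketed outputs in the examples. This is a minimal sketch under stated assumptions: whitespace tokenization, case-insensitive matching, and a small hypothetical GAZETTEER; the function name annotate is illustrative, not an existing API.

```python
import re

# Hypothetical gazetteer of known terms (an assumption for illustration only).
GAZETTEER = {"soft contact lens", "real time", "expert system"}


def annotate(sentence: str, gazetteer: set[str]) -> str:
    """Bracket gazetteer terms in a sentence using greedy longest match."""
    tokens = sentence.split()
    # Compare on lowercased tokens with trailing punctuation stripped.
    keys = [re.sub(r"[^\w-]+$", "", t).lower() for t in tokens]
    max_len = max((len(term.split()) for term in gazetteer), default=0)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(keys[i:i + n]) in gazetteer:
                match = n
                break
        if match:
            span = tokens[i:i + match]
            # Keep trailing punctuation of the last token outside the bracket.
            core = re.sub(r"[^\w-]+$", "", span[-1])
            tail = span[-1][len(core):]
            out.append("[" + " ".join(span[:-1] + [core]) + "]" + tail)
            i += match
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(annotate("The soft contact lens was invented in 1959.", GAZETTEER))
```

Longest match first is what makes "soft contact lens" win over any shorter entry it might contain; a real annotator would also need proper tokenization and handling of ambiguous or overlapping entries.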
References
2009
- (Wermter et al., 2009) ⇒ Joachim Wermter, Katrin Tomanek, and Udo Hahn. (2009). “High-performance gene name normalization with GENO.” In: Bioinformatics, 25(6)
- NOTE: It obtains an F-measure of 86.4% (precision: 87.8%, recall: 85.0%) on the BIOCREATIVE-II test set.
- NOTE: It employs a carefully crafted suite of symbolic and statistical methods.
- NOTE: It relies on publicly available software and data resources, including extensive background knowledge based on semantic profiling.
- (Wong et al., 2009) ⇒ Wilson Wong, Wei Liu, and Mohammed Bennamoun. (2009). “A Probabilistic Framework for Automatic Term Recognition.” In: Intelligent Data Analysis, 13(4). doi:10.3233/IDA-2009-0379
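For the GENO result noted for (Wermter et al., 2009), the three reported figures are mutually consistent if the F-measure is the balanced F1 score, i.e. the harmonic mean of precision P and recall R (an assumption, since the note does not name the variant):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.878 \times 0.850}{0.878 + 0.850}
    = \frac{1.4926}{1.728}
    \approx 0.864
```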
2005
- (Spasic et al., 2005) ⇒ Irena Spasic, Sophia Ananiadou, John McNaught, and Anand Kumar. (2005). “Text Mining and Ontologies in Biomedicine: Making sense of raw text.” In: Briefings in Bioinformatics, 6(3).
- QUOTE: This paper summarises different approaches in which ontologies have been used for text-mining applications in biomedicine.
- (Chen et al., 2005) ⇒ Lifeng Chen, Hongfang Liu, and Carol Friedman. (2005). “Gene Name Ambiguity of Eukaryotic Nomenclatures.” In: Bioinformatics, 21(2).
- QUOTE: ... One essential task is to recognize and identify genomic entities in text. ‘Recognition’ can be accomplished using pattern matching and machine learning. But for ‘identification’ these techniques are not adequate. In order to identify genomic entities, NLP needs a comprehensive resource that specifies and classifies genomic entities as they occur in text and that associates them with normalized terms and also unique identifiers so that the extracted entities are well defined. Online organism databases are an excellent resource to create such a lexical resource. However, gene name ambiguity is a serious problem because it affects the appropriate identification of gene entities. In this paper, we explore the extent of the problem and suggest ways to address it.
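The distinction the quote draws between recognition and identification can be sketched as a lookup from an already recognized mention to the unique identifiers listed for it in a lexical resource, with a name that maps to more than one identifier flagged as ambiguous. The lexicon rows, identifiers, and function names are hypothetical; case-folding the names is a simplifying assumption that can itself introduce ambiguity.

```python
from collections import defaultdict

# Hypothetical lexicon rows with invented identifiers (illustration only):
# each row pairs a surface name with a unique identifier, as an organism
# database might supply.
LEXICON_ROWS = [
    ("geneA", "ORG1:0001"),
    ("GA", "ORG1:0001"),  # synonym of geneA in organism 1
    ("GA", "ORG2:0042"),  # same symbol used for a different gene in organism 2
    ("geneB", "ORG2:0007"),
]


def build_index(rows):
    """Map each lowercased name to the set of identifiers it can denote."""
    index = defaultdict(set)
    for name, ident in rows:
        index[name.lower()].add(ident)
    return index


def identify(mention, index):
    """Return (status, identifiers) for a recognized mention."""
    ids = index.get(mention.lower(), set())
    if not ids:
        return "unknown", []
    if len(ids) == 1:
        return "identified", sorted(ids)
    return "ambiguous", sorted(ids)


index = build_index(LEXICON_ROWS)
print(identify("GA", index))
```

An "ambiguous" result is exactly the case the quote warns about: recognition succeeded, but identification still needs context (organism, surrounding text) to pick one identifier.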
2004
- (Krauthammer & Nenadic, 2004) ⇒ Michael Krauthammer, and Goran Nenadic. (2004). “Term Identification in the Biomedical Literature.” In: Journal of Biomedical Informatics, 37(6). doi:10.1016/j.jbi.2004.08.004
- QUOTE: Successful term identification is key to getting access to the stored literature information, as it is the terms (and their relationships) that convey knowledge across scientific articles. Due to the complexities of a dynamically changing biomedical terminology, term identification has been recognized as the current bottleneck in text mining, and — as a consequence — has become an important research topic both in natural language processing and biomedical communities. This article overviews state-of-the-art approaches in term identification.
We differentiate three main steps for the successful identification of terms from literature: term recognition, term classification, and term mapping (see Figure 1).
Term recognition is a non-trivial task of marking single or several adjacent words that indicate the presence of domain concepts. Its main goal is to differentiate between terms and non-terms.
- (Tuason et al., 2004) ⇒ Olivia Tuason, Lifeng Chen, Hongfang Liu, Judith A. Blake, and Carol Friedman. (2004). “Biological Nomenclature: A Source of Lexical Knowledge and Ambiguity.” In: Proceedings of the Pacific Symposium on Biocomputing 2004, pages 238-249.
- (Spasić & Ananiadou, 2004) ⇒ Irena Spasić, and Sophia Ananiadou. (2004). “Using Automatically Learnt Verb Selectional Preferences for Classification of Biomedical Terms.” In: Journal of Biomedical Informatics, 37(6). doi:10.1016/j.jbi.2004.08.002.
- QUOTE: ... the corpus is terminologically processed: term recognition is performed by both looking up the dictionary of terms listed in the ontology and applying the C/NC-value method for on-the-fly term extraction. ...
- (Yeganova et al., 2004) ⇒ L. Yeganova, L. Smith, and W. J. Wilbur. (2004). “Identification of Related Gene/Protein Names Based on an HMM of Name Variations.” In: Computational Biology and Chemistry, 28(2).
- ABSTRACT: Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
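The alignment idea in the Yeganova et al. abstract can be approximated with a plain weighted edit distance computed by dynamic programming. This is a simpler stand-in, not the cited method: the substitution and gap costs below are hand-set assumptions, whereas the cited system lets a trainable HMM control the mutation matrix. The function names and the default threshold are illustrative.

```python
def sub_cost(a: str, b: str) -> float:
    """Hand-set substitution costs (assumption; the cited system learns them)."""
    if a == b:
        return 0.0
    if a.lower() == b.lower():
        return 0.1  # case change, e.g. "IL2" vs "il2"
    if a.isdigit() and b.isdigit():
        return 0.5  # digit change, e.g. "2" vs "3"
    return 1.0


def gap_cost(c: str) -> float:
    """Cheap insertion/deletion of separators such as '-' or ' '."""
    return 0.1 if c in "- " else 1.0


def name_distance(x: str, y: str) -> float:
    """Weighted edit distance between two names via dynamic programming."""
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + gap_cost(x[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + gap_cost(y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + gap_cost(x[i - 1]),  # delete a character of x
                d[i][j - 1] + gap_cost(y[j - 1]),  # insert a character of y
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # align pair
            )
    return d[m][n]


def related_names(query, candidates, threshold=0.5):
    """Return candidates within `threshold` of the query, closest first."""
    scored = [(name_distance(query, c), c) for c in candidates]
    return [c for dist, c in sorted(scored) if dist <= threshold]


print(related_names("IL-2", ["TNF", "IL-3", "il2"]))
```

Note that "IL-3" passes the threshold here even though it names a different protein; this is why the abstract treats variant detection as a ranking problem whose costs should be learned rather than hand-tuned.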
2003
- (Mitkov, 2003) ⇒ Ruslan Mitkov, editor. (2003). “The Oxford Handbook of Computational Linguistics.” Oxford University Press. ISBN:019927634X
- QUOTE: term recognition: Automatic recognition of term and variant occurrences in corpora.
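The Handbook definition covers variant occurrences as well as exact ones. A minimal sketch, assuming that only case and hyphen/space alternation count as variants (richer variant types such as inflection or abbreviation are not handled), groups each term's occurrences by the surface form actually seen. The function names are illustrative.

```python
import re
from collections import Counter, defaultdict


def variant_key(phrase: str) -> str:
    """Fold case and treat hyphens and whitespace runs as one separator."""
    return " ".join(re.split(r"[\s\-]+", phrase.lower().strip()))


def count_occurrences(terms: list[str], sentences: list[str]) -> dict[str, Counter]:
    """Count each term's occurrences, grouped by the surface variant seen.

    Every matching word n-gram is counted, so a term nested inside a longer
    term is counted as well.
    """
    canonical = {variant_key(t): t for t in terms}
    lengths = {len(k.split()) for k in canonical}
    found: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = list(re.finditer(r"\w+", sentence))
        for i in range(len(words)):
            for n in lengths:
                if i + n > len(words):
                    continue
                # Surface span from the first to the last word, separators included.
                surface = sentence[words[i].start():words[i + n - 1].end()]
                # Punctuation other than '-' survives in the key, so spans
                # crossing commas or periods never match.
                key = variant_key(surface)
                if key in canonical:
                    found[canonical[key]][surface] += 1
    return dict(found)


print(count_occurrences(
    ["soft contact lens"],
    ["The soft contact lens was new.", "A soft-contact lens, or SOFT CONTACT LENS."],
))
```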
2002
- (Hirschman et al., 2002) ⇒ Lynette Hirschman, Alexander A. Morgan, and Alexander S. Yeh. (2002). “Rutabaga by any Other Name: extracting biological names.” In: Journal of Biomedical Informatics, 35(4). doi:10.1016/S1532-0464(03)00014-5
- QUOTE: As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93–95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75–80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
- (Blaschke & Valencia, 2002) ⇒ Christian Blaschke, and A. Valencia. (2002). “Molecular Biology Nomenclature Thwarts Information-Extraction Progress.” In: IEEE Intelligent Systems, 17(3).
2000
- (Frantzi et al., 2000) ⇒ Katerina Frantzi, Sophia Ananiadou, and Hideki Mima. (2000). “Automatic Recognition of Multi-Word Terms: The C-value/NC-value method.” In: International Journal on Digital Libraries, 3(2).
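The C-value part of the C-value/NC-value method scores each multi-word candidate a by its length |a| in words and its frequency f(a), discounting candidates that mostly occur nested inside longer candidates: C-value(a) = log2|a| · f(a) if a is not nested, and log2|a| · (f(a) − (1/|T_a|) Σ_{b ∈ T_a} f(b)) otherwise, where T_a is the set of longer candidates that contain a. The sketch below assumes the candidate list and its frequencies are already extracted; the linguistic filter and the NC-value context weighting are omitted, and the example counts are invented.

```python
import math


def c_value(freq: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """C-value for candidate terms (word tuples) given their frequencies."""

    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous word sub-sequence of `longer`.
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    scores = {}
    for a, f_a in freq.items():
        # T_a: longer candidates in which `a` is nested.
        nesting = [b for b in freq if len(b) > len(a) and contains(b, a)]
        if nesting:
            f_a = f_a - sum(freq[b] for b in nesting) / len(nesting)
        # log2(1) = 0, so single-word candidates score 0 (the method targets
        # multi-word terms).
        scores[a] = math.log2(len(a)) * f_a
    return scores


# Invented counts, for illustration only.
counts = {("soft", "contact", "lens"): 5, ("contact", "lens"): 8}
print(c_value(counts))
```

Here "contact lens" keeps only the 3 occurrences not explained by "soft contact lens", which is the nestedness discount the method relies on.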
1998
- (Fukuda et al., 1998) ⇒ K. Fukuda, A. Tamura, T. Tsunoda, and T. Takagi. (1998). “Toward Information Extraction: identifying protein names from biological papers.” In: Proceedings of the Pacific Symposium on Biocomputing 1998, pages 707-718.
1996
- (Lauriston, 1996) ⇒ Andy Lauriston. (1996). “Automatic Term Recognition: performance of Linguistic and Statistical Techniques.” PhD thesis, University of Manchester Institute of Science and Technology.
1995
- (Dagan & Church, 1995) ⇒ Ido Dagan, and Kenneth W. Church. (1995). “Termight: Identifying and translating technical terminology.” In: Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics, (EACL 1995). doi:10.3115/974358.974367.
- QUOTE: We propose a semi-automatic tool, termight, that helps professional translators and terminologists identify technical terms and their translations. The tool makes use of part-of-speech tagging and word-alignment programs to extract candidate terms and their translations. Although the extraction programs are far from perfect, it isn't too hard for the user to filter out the wheat from the chaff. The extraction algorithms emphasize completeness. Alternative proposals are likely to miss important but infrequent terms/translations. To reduce the burden on the user during the filtering phase, candidates are presented in a convenient order, along with some useful concordance evidence, in an interface that is designed to minimize keystrokes. Termight is currently being used by the translators at AT&T Business Translation Services (formerly AT&T Language Line Services).
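The candidate-extraction step that Termight is described as building on part-of-speech tagging can be sketched as a pattern over tag sequences. Everything here is an assumption for illustration: the input is taken to be already tagged with coarse tags such as "ADJ" and "NOUN", the pattern (ADJ|NOUN)* NOUN is a common choice rather than Termight's documented grammar, and ranking by frequency only stands in for the tool's "convenient order". The word-alignment side (finding translations) is not covered.

```python
from collections import Counter

# Assumption: each sentence is a list of (word, coarse POS tag) pairs.
TaggedSentence = list[tuple[str, str]]


def candidate_terms(sentences: list[TaggedSentence]) -> list[tuple[str, int]]:
    """Extract (ADJ|NOUN)* NOUN runs of 2+ words and rank them by frequency."""
    counts = Counter()
    for sent in sentences:
        run: list[tuple[str, str]] = []
        for word, tag in sent + [("", "END")]:  # sentinel flushes the final run
            if tag in ("ADJ", "NOUN"):
                run.append((word, tag))
                continue
            # Trim trailing adjectives so every candidate ends in a noun.
            while run and run[-1][1] != "NOUN":
                run.pop()
            if len(run) >= 2:  # keep multi-word candidates only
                counts[" ".join(w.lower() for w, _ in run)] += 1
            run = []
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


tagged = [
    [("The", "DET"), ("soft", "ADJ"), ("contact", "NOUN"), ("lens", "NOUN"), ("was", "VERB")],
    [("An", "DET"), ("expert", "NOUN"), ("system", "NOUN"), ("helps", "VERB")],
]
print(candidate_terms(tagged))
```

As the quote notes, such extraction favors completeness over precision; the user-facing filtering step is what removes the chaff.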
1994
- (Ananiadou, 1994) ⇒ Sophia Ananiadou. (1994). “A Methodology for Automatic Term Recognition.” In: Proceedings of the 15th International Conference on Computational Linguistics, COLING'94, pages 1034-1038.
- (Daille et al., 1994) ⇒ Beatrice Daille, Eric Gaussier, and Jean-Marc Lange. (1994). “Towards Automatic Extraction of Monolingual and Bilingual Terminology.” In: Proceedings of the 15th International Conference on Computational Linguistics, COLING'94, pages 515-521.
1992
- (Bourigault, 1992) ⇒ Didier Bourigault. (1992). “Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases.” In: Proceedings of the Fourteenth International Conference on Computational Linguistics (COLING 1992).
1990
- (Sager, 1990) ⇒ Juan C. Sager. (1990). “A Practical Course in Terminology Processing.” John Benjamins Publishing Company.
1988
- (Ananiadou, 1988) ⇒ Sophia Ananiadou. (1988). “Towards a Methodology for Automatic Term Recognition.” PhD thesis, University of Manchester Institute of Science and Technology.
1980
- (Sager et al., 1980) ⇒ Juan C. Sager, David Dungworth, and Peter F. McDonald. (1980). “English Special Languages: principles and practice in science and technology.” Oscar Brandstetter Verlag.