Named Entity Recognition (NER) Algorithm
A Named Entity Recognition (NER) Algorithm is an entity mention recognition algorithm that can solve a named entity recognition task.
- Context:
- It can be applied by a Named Entity Recognition System.
- It can range from:
- being a Rule-based Named Entity Recognition Algorithm (such as a Heuristic NER Algorithm or a Dictionary-based Named Entity Recognition Algorithm).
- to being a Data-driven NER Algorithm (such as an Unsupervised Named Entity Recognition Algorithm or a Supervised Named Entity Recognition Algorithm).
- It can be supported by:
- It can range from being a Language-Independent Named Entity Recognition Algorithm to being a Language-Dependent Named Entity Recognition Algorithm that takes advantage of a Language's Constraints.
- ...
- Example(s):
- an Entity-Specific NER Algorithm, such as a protein NER algorithm.
- ...
- a CRF-based NER Algorithm, such as … MALLET SimpleTagger.
- a Neural NER Algorithm, such as:
- a Bidirectional Transformer Encoder-based NER Algorithm, such as BERT-MRC (Li, Feng et al., 2019) and GLiNER (Zaratiana et al., 2023).
- a Cross-domain and Cross-lingual NER Algorithm, such as T-NER.
- an LLM-based NER Algorithm, such as GPT-NER (Wang, Sun et al., 2023).
- …
- Counter-Example(s):
- See: Named Entity Classifier.
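To make the rule-based end of the range concrete, here is a minimal sketch of a Dictionary-based Named Entity Recognition Algorithm using greedy longest-match lookup. The gazetteer contents, the function name `dictionary_ner`, and the entity type labels are hypothetical, chosen only for illustration; real systems typically add normalization, ambiguity handling, and much larger lists.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: lowercase token tuples mapped to an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("new", "york"): "LOC",
    ("new", "york", "times"): "ORG",
    ("ming", "zhou"): "PER",
}


def dictionary_ner(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER,
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans via greedy longest match."""
    max_len = max((len(key) for key in gazetteer), default=0)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "New York Times" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched mention
        else:
            i += 1
    return spans


if __name__ == "__main__":
    sentence = "Ming Zhou reads the New York Times in New York".split()
    print(dictionary_ner(sentence))
```

Longest-match is one simple design choice for resolving overlapping dictionary entries; it says nothing about context, which is where data-driven NER algorithms usually do better.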
References
2011
- (Liu et al., 2011) ⇒ Xiaohua Liu, Shaodian Zhang, Furu Wei, and Ming Zhou. (2011). “Recognizing Named Entities in Tweets.” In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. … We propose a novel NER system to address these challenges. Firstly, a K-Nearest Neighbors (KNN) based classifier is adopted to conduct word level classification, leveraging the similar and recently labeled tweets. Following the two-stage prediction aggregation methods (Krishnan and Manning, 2006), such pre-labeled results, together with other conventional features used by the state-of-the-art NER systems, are fed into a linear Conditional Random Fields (CRF) (Lafferty et al., 2001) model, which conducts fine-grained tweet level NER. Furthermore, the KNN and CRF model are repeatedly retrained with an incrementally augmented training set, into which high confidently labeled tweets are added. Indeed, it is the combination of KNN and CRF under a semi-supervised learning framework that differentiates ours from the existing. Finally, following Lev Ratinov and Dan Roth (2009), 30 gazetteers are used, which cover common names, countries, locations, temporal expressions, etc. These gazetteers represent general knowledge across domains. The underlying idea of our method is to combine global evidence from KNN and the gazetteers with local contextual information, and to use common knowledge and unlabeled tweets to make up for the lack of training data.
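The first stage described in the quote, word-level classification by K-Nearest Neighbors, might be sketched as follows. This is not the authors' implementation: the feature set, the Jaccard similarity measure, and the names `features`, `jaccard`, and `knn_label` are assumptions made for illustration, and the CRF stage, the gazetteers, and the semi-supervised retraining loop are all omitted.

```python
from collections import Counter
from typing import List, Set, Tuple

Example = Tuple[Set[str], str]  # (feature set, label)


def features(tokens: List[str], i: int) -> Set[str]:
    """Hypothetical context features for the token at position i."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    word = tokens[i]
    return {
        f"w={word.lower()}",
        f"prev={prev_tok}",
        f"next={next_tok}",
        f"cap={word[:1].isupper()}",
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set-overlap similarity (an assumed choice, not taken from the paper)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def knn_label(feats: Set[str], memory: List[Example], k: int = 3) -> str:
    """Majority label among the k most similar labeled word instances."""
    if not memory:
        return "O"  # fall back to "outside" when nothing is labeled yet
    ranked = sorted(memory, key=lambda ex: jaccard(feats, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def build_memory(labeled: List[Tuple[List[str], List[str]]]) -> List[Example]:
    """Turn labeled sentences into per-word (features, label) examples."""
    return [
        (features(toks, i), labs[i])
        for toks, labs in labeled
        for i in range(len(toks))
    ]


if __name__ == "__main__":
    memory = build_memory([
        ("I met Obama yesterday".split(), ["O", "O", "PER", "O"]),
        ("She met Clinton today".split(), ["O", "O", "PER", "O"]),
        ("We like pizza today".split(), ["O", "O", "O", "O"]),
    ])
    query = "He met Biden yesterday".split()
    print([knn_label(features(query, i), memory) for i in range(len(query))])
```

In the approach the quote describes, labels like these would be passed on as additional features to a sequence model rather than used as the final output.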
2007
- (Kazama & Torisawa, 2007) ⇒ J. Kazama and K. Torisawa. (2007). “Exploiting Wikipedia as External Knowledge for Named Entity Recognition.” In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 698–707.
- (Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” Lingvisticae Investigationes. Volume 30, Issue 1.
2006
- (Kozareva, 2006) ⇒ Zornitsa Kozareva. (2006). “Bootstrapping Named Entity Recognition with Automatically Generated Gazetteer Lists.” In: Proceedings of EACL 2006.
2005
- (Cimiano & Völker, 2005) ⇒ Philipp Cimiano, and Johanna Völker. (2005). “Towards Large-scale, Open-domain and Ontology-based Named Entity Classification.” In: Proceedings of RANLP-2005.
2004
- (McDonald et al., 2004) ⇒ Ryan T. McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, and Fernando Pereira. (2004). “An entity tagger for recognizing acquired genomic variations in cancer literature.” Bioinformatics, 20(17):3249-3251. doi:10.1093/bioinformatics/bth350
2002
- (Franzén et al., 2002) ⇒ K. Franzén, G. Eriksson, F. Olsson, L. Asker, P. Lidén, and J. Coster. (2002). “Protein names and how to find them.” International Journal of Medical Informatics, Volume 67, Issues 1–3, Pages 49–61. Elsevier.
- NOTE: Investigates NER of proteins.
1997
- (Bikel et al., 1997) ⇒ Daniel Bikel, Scott Miller, Richard Schwartz, and Ralph Weischedel. (1997). “Nymble: a High-performance Learning Name-finder.” In: Proceedings of Fifth Applied Natural Language Processing Conference (ANLC 1997). doi:10.3115/974557.974586
- NOTE: One of the earlier examples where learning was competitive with manually coded systems.