Named Entity Recognition (NER) Task
A Named Entity Recognition (NER) Task is an entity mention recognition task that is restricted to detecting named entity mentions and classifying each one by entity class.
- Context:
- input: a Text Item and a Class Set.
- output: an Annotated Text Item (with labeled named entity mention).
- optional: a Named Entity Recognition Model (that can be used in future tasks without an entity type description).
- metrics: an NER Performance Metric, such as:
- It can range from being a Heuristic NER Task to being a Data-Driven NER Task (such as supervised NER).
- It can range from being an Automated NER Task to being a Manual NER Task.
- It can range from being a Domain Specific Named Entity Recognition Task (such as a protein mention recognition) to being a Domain-Free Named Entity Recognition Task.
- It can (typically) be restricted to certain Entity Types (e.g. protein NER task, person NER task).
- It can be solved by a Named Entity Recognition System (that applies an NER algorithm).
- It can range from being a Word-level Semantic Analysis Task to being a Phrase-level Semantic Analysis Task.
- It can be supported by a Named Entity Mention Detection Task and/or a Named Entity Mention Classification Task.
- It can support tasks such as: Text Understanding Task, ...
- It can (often) precede: a Named Entity Coreference Resolution Task, a Relation Recognition Task, a Question Answering Task, and others.
- Example(s):
- a Named Entity Recognition Benchmark Task, such as:
- CoNLL-2002 Benchmark Task and CoNLL-2003 Benchmark Task, which use newswire corpora in multiple languages (Spanish and Dutch in 2002; English and German in 2003) for 4 entity mention types: Person, Location, Organization, Misc.
- MUC-6 Task and MUC-7 Task using the American newswire corpus for 7 entity mention types: Person, Location, Organization, Time, Date, Percent, Money.
- ACE NER Task for 5 entity mention types: Location, Organization, Person, FAC, and GPE.
- BBN (Penn Treebank) for 22 entity mention types: Animal, Cardinal, Date, Disease, …
- Counter-Example(s):
- an Entity Mention Detection Task, such as:
[[EMDT]]("Felix is a mammal.") ⇒ "[Felix] is a [mammal]."
- an Entity Mention Recognition Task, such as:
[[EMRT]]("Alexander has a unicycle.") ⇒ "A [THING|unicycle] has one [THING|wheel]."
- a Coreference Resolution Task (which detects whether two entity mentions refer to the same entity).
- a Semantic Relation Recognition Task.
- an Information Extraction Task.
- a Person Face Recognition Task.
- See: NER Corpus, Annotate.
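The input/output contract described in the Context above (a Text Item and a Class Set in, labeled named entity mentions out) can be illustrated with a minimal sketch of a Heuristic NER Task. The gazetteer contents, the function name `heuristic_ner`, and the character-span output format are all assumptions made for illustration; they are not taken from any system cited on this page.

```python
import re

# Hypothetical gazetteer: surface form -> entity class (illustrative only).
GAZETTEER = {
    "Acme Corp.": "Organization",
    "Jim": "Person",
    "London": "Location",
}

def heuristic_ner(text, class_set):
    """Return sorted (start, end, class) spans for gazetteer names whose class is in class_set."""
    spans = []
    # Try longer names first so a long name wins over a shorter one it contains.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        label = GAZETTEER[name]
        if label not in class_set:
            continue  # the task is restricted to the requested entity types
        # Require that the match is not glued to a neighbouring word character.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            # Skip matches that overlap a span already accepted.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), label))
    return sorted(spans)

spans = heuristic_ner("Jim visited Acme Corp. in London.", {"Person", "Organization"})
print(spans)
```

Because the Class Set here omits Location, "London" is left unlabeled even though it is in the gazetteer, which mirrors the restriction to certain Entity Types noted above. A Data-Driven NER Task would replace the lookup with a trained model while keeping the same input and output.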
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Named-entity_recognition Retrieved:2020-3-1.
- Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
Most research on NER systems has been structured as taking an unannotated block of text, such as this one:
Jim bought 300 shares of Acme Corp. in 2006.
And producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.
State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored 93.39% of F-measure while human annotators scored 97.60% and 96.95%. [1] [2]
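As a rough illustration of the annotated-text format and the F-measure mentioned above, the sketch below renders character spans in the bracketed style of the example and scores a prediction against a gold annotation. The function names `render` and `f_measure` are hypothetical, and exact-match scoring of (start, end, class) triples is an assumption; the evaluations cited here may have scored partial matches differently.

```python
def render(text, spans):
    """Insert [mention]Class markup for non-overlapping (start, end, label) character spans."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"[{text[start:end]}]{label}")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

def f_measure(gold, predicted):
    """Balanced F-measure over exact-match (start, end, label) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

text = "Jim bought 300 shares of Acme Corp. in 2006."
gold = [(0, 3, "Person"), (25, 35, "Organization"), (39, 43, "Time")]
# Hypothetical system output that misses the temporal expression.
predicted = [(0, 3, "Person"), (25, 35, "Organization")]

print(render(text, gold))
print(f_measure(gold, predicted))
```

In this sketch a system that finds two of the three gold mentions, with no spurious ones, has precision 1.0 and recall 2/3, so the balanced F-measure is 0.8.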
2011
- (Liu et al., 2011) ⇒ Xiaohua Liu, Shaodian Zhang, Furu Wei, and Ming Zhou. (2011). “Recognizing Named Entities in Tweets.” In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: Named Entities Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as persons, organizations and locations (Nadeau and Sekine, 2007).
2010
- (Alias-i, 2010) ⇒ Alias-i. (2010). "Named Entity Tutorial".
- QUOTE: Named entity recognition (NER) is the process of finding mentions of specified things in running text.
2009
- Stanford Named Entity Recognizer System http://nlp.stanford.edu/software/CRF-NER.shtml
- QUOTE: Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names.
2008a
- (Olsson, 2008) ⇒ Fredrik Olsson. (2008). “Bootstrapping Named Entity Annotation by Means of Active Machine Learning." PhD thesis. University of Gothenburg.
2008b
- (Settles, 2008) ⇒ Burr Settles. (2008). “Curious Machines: Active Learning with Structured Instances." PhD.
- QUOTE: Named entity recognition (NER) is a subtask of information extraction, focused on finding mentions of various entities that belong to semantic classes of interest. In the biomedical domain, entities of interest are usually references to genes, proteins, cell types, and the like.
2008c
- (Sarawagi, 2008) ⇒ Sunita Sarawagi. (2008). “Information extraction.” In: Foundations and Trends® in Databases. DOI: 10.1561/1500000003
- QUOTE: The most popular form of entities is named entities like names of persons, locations, and companies as popularized in the MUC [3] [4], ACE [5][6], and CoNLL [7] competitions. Named entity recognition was first introduced in the sixth MUC and consisted of three subtasks: proper names and acronyms of persons, locations, and organizations (ENAMEX), absolute temporal terms (TIMEX) and monetary and other numeric expressions (NUMEX). Now the term entities is expanded to also include generics like disease names, protein names, paper titles, and journal names. The ACE competition for entity relationship extraction from natural language text lists more than 100 different entity types.
- ↑ Elaine Marsh, Dennis Perzanowski, "MUC-7 Evaluation of IE Technology: Overview of Results", 29 April 1998.
- ↑ MUC-07 Proceedings (Named Entity Tasks)
- ↑ (Chinchor, 1998) ⇒ Nancy A. Chinchor (1998). "Overview of MUC-7/MET-2". In: Science Applications International Corporation.
- ↑ (Grishman & Sundheim, 1996) ⇒ Ralph Grishman, and Beth Sundheim (1996). "Message Understanding Conference-6: A Brief History". In: Proceedings of The 16th International Conference on Computational Linguistics (COLING 1996 Volume 1).
- ↑ ACE, F. (2004). Annotation Guidelines for Entity Detection and Tracking (EDT).
- ↑ NIST (1998–2008). Automatic content extraction (ACE) program.
- ↑ (Sang & De Meulder, 2003) ⇒ Erik F. Tjong Kim Sang, and Fien De Meulder (2003). "Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition". In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL (CoNLL 2003).
2007a
- (Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” In: Lingvisticae Investigationes, 30(1).
- QUOTE: The term “Named Entity”, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information of company activities and defense related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important sub-tasks of IE and was called “Named Entity Recognition and Classification (NERC)”.
In the expression “Named Entity”, the word “Named” aims to restrict the task to only those entities for which one or many rigid designators, as defined by S. Kripke (1982), stands for the referent. For instance, the automotive company created by Henry Ford in 1903 is referred to as Ford or Ford Motor Company. Rigid designators include proper names as well as certain natural kind terms like biological species and substances. There is a general agreement in the NERC community about the inclusion of temporal expressions and some numerical expressions such as amounts of money and other types of units.
Early work formulates the NERC problem as recognizing “proper names” in general (e.g., S. Coates-Stephens 1992, C. Thielen 1995). Overall, the most studied types are three specializations of “proper names”: names of “persons”, “locations” and “organizations”. These types are collectively known as “enamex” since the MUC-6 competition.
2007b
- (Sutton & McCallum, 2007) ⇒ Charles Sutton, and Andrew McCallum. (2007). “An Introduction to Conditional Random Fields for Relational Learning.” In: (Getoor & Taskar, 2007).
- QUOTE: NER is the problem of identifying and classifying proper names in text, including locations, such as China ; people, such as George Bush ; and organizations, such as the United Nations. The named-entity recognition task is, given a sentence, first to segment which words are part of entities, and then to classify each entity by type (person, organization, location, and so on). The challenge of this problem is that many named entities are too rare to appear even in a large training set, and therefore the system must identify them based only on context.
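The two steps in this description (segment which words belong to entities, then classify each entity by type) are often encoded together as one tag per token. The sketch below assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity), which the quoted passage does not itself name; the function name `bio_to_entities` and the tag labels are illustrative.

```python
def bio_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into (entity text, entity type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then open a new one of this tag's type.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:
            # "O": the token is outside any entity, so close the open one.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

tokens = ["George", "Bush", "visited", "the", "United", "Nations", "in", "China"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(bio_to_entities(tokens, tags))
```

With this encoding, a sequence model such as a conditional random field only has to predict one tag per token, and the segmentation and the type of each entity are both recovered by the decoding step shown here.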
2005
- (Huang, 2005) ⇒ Fei Huang. (2005). “Multilingual Named Entity Extraction and Translation from Text and Speech." PhD Thesis. Carnegie Mellon University. CMU-LTI-06-001
- Named entity recognition (NER), also known as NE extraction, NE detection, NE tagging or NE identification, is to recognize structured information, such as proper names (person, location and organization), time (date and time) and numerical values (currency and percentage) from natural language text. It is one of the first IE tasks to be researched. Many NER systems based on pattern-matching rules or statistical models achieved satisfactory performances on well-formed text. Based on the 1997 MUC-7/MET-2 evaluation, NE recognition systems have achieved 94% F score on English newswire text and 85%-91% on Chinese text, 87%-93% on Japanese text.
2004
- (Cohen & Sarawagi, 2004) ⇒ William W. Cohen, and Sunita Sarawagi. (2004). “Exploiting Dictionaries in Named Entity Extraction: Combining semi-Markov extraction processes and data integration methods.” In: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).
2003
- (Grishman, 2003) ⇒ Ralph Grishman. (2003). “Information Extraction.” In: (Mitkov, 2003).
- QUOTE: In conventional treatments of language structure, little attention is paid to proper names, addresses, quantity phrases, etc. Presentations of language analysis typically begin by looking words up in a dictionary and identifying them as nouns, verbs, adjectives, etc. In fact, however, most texts include lots of names, and if a system cannot identify these as linguistic units (and, for most tasks, identify their type), it will be hard pressed to produce a linguistic analysis of the text.
2002
- (Tjong Kim Sang, 2002) ⇒ Erik F. Tjong Kim Sang. (2002). “Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition.” In: Proceedings of CoNLL-2002.
1999
- (Mikheev et al., 1999) ⇒ Andrei Mikheev, Marc Moens, and Claire Grover. (1999). “Named Entity Recognition Without Gazetteers.” In: Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics. doi:10.3115/977035.977037
- QUOTE: It is often claimed that Named Entity recognition systems need extensive gazetteers---lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition (...)
1996
- (Grishman & Sundheim, 1996) ⇒ Ralph Grishman, and Beth Sundheim. (1996). “Message Understanding Conference - 6: A Brief History.” In: Proceedings of COLING Conference (COLING 1996).
- NOTE: Reviews one of the more commonly evaluated NER datasets: MUC-6.
1980
- (Kripke, 1980) ⇒ Saul Kripke. (1980). “Naming and Necessity.” Harvard University Press.
- QUOTE: Naming and Necessity is a 1980 book with the transcript of three lectures, given by the philosopher Saul Kripke, at Princeton University in 1970, in which he dealt with the debates of proper names in the philosophy of language.[1] The transcript was brought out originally in 1972 in Semantics of Natural Language, edited by Donald Davidson and Gilbert Harman.[2] Among analytic philosophers, Naming and Necessity is widely considered one of the most important philosophical works of the twentieth century.[3]
- ↑ Kripke, Saul. 1980. Naming and Necessity. Harvard University Press: 22.
- ↑ Davidson, D.; Harman, Gilbert (2012-12-06) (in en). Semantics of Natural Language. Springer Science & Business Media. ISBN 9789401025577. https://books.google.com/books?hl=en&lr=&id=vsb-CAAAQBAJ&oi=fnd&pg=PA1#v=onepage&q=1972&f=false.
- ↑ Soames, Scott. 2005. Philosophical Analysis in the Twentieth Century: Volume 2: The Age of Meaning. Princeton University Press. Cited in Byrne, Alex and Hall, Ned. 2004. 'Necessary Truths'. Boston Review October/November 2004.