Named Entity Recognition (NER) Task

From GM-RKB

A Named Entity Recognition (NER) Task is an entity mention recognition task that is restricted to detecting named entity mentions and classifying each mention into an entity class.



References

2020

  • (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Named-entity_recognition Retrieved:2020-3-1.
    • Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

      Most research on NER systems has been structured as taking an unannotated block of text, such as this one:

      Jim bought 300 shares of Acme Corp. in 2006.

      And producing an annotated block of text that highlights the names of entities:

      [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.

      In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.

      State-of-the-art NER systems for English achieve near-human performance. For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%. [1] [2]
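
The annotation example and the F-measure figures quoted above can be made concrete with a minimal Python sketch. It assumes the inline `[text]Label` notation shown in the example, whitespace tokenization, a BIO tag encoding, and an exact-match balanced F-measure. None of these choices comes from the quoted text, and the scoring here is not the official MUC-7 scorer. The names `parse_annotated`, `to_bio` and `f_measure` are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the inline notation used in the example above: "[surface text]Label".
ANNOTATION = re.compile(r"\[([^\]]+)\]([A-Za-z]+)")

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def parse_annotated(text: str) -> Tuple[List[str], List[Span]]:
    """Split bracket-annotated text into whitespace tokens and entity spans."""
    tokens: List[str] = []
    spans: List[Span] = []
    pos = 0
    for match in ANNOTATION.finditer(text):
        # Plain text between the previous mention and this one.
        tokens.extend(text[pos:match.start()].split())
        mention = match.group(1).split()
        spans.append((len(tokens), len(tokens) + len(mention), match.group(2)))
        tokens.extend(mention)
        pos = match.end()
    tokens.extend(text[pos:].split())
    return tokens, spans


def to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode spans as per-token BIO tags (an assumed, commonly used encoding)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def f_measure(gold: List[Span], predicted: List[Span]) -> float:
    """Balanced F-measure where a span counts only on an exact boundary and label match."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tokens, gold = parse_annotated(
    "[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time."
)
tags = to_bio(len(tokens), gold)
# A hypothetical system output that misses the temporal expression.
score = f_measure(gold, gold[:2])
```

Under these assumptions, a system that finds the person and the organization but misses the temporal expression has perfect precision and a recall of two thirds.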

2008c

  1. Elaine Marsh, Dennis Perzanowski, "MUC-7 Evaluation of IE Technology: Overview of Results", 29 April 1998.
  2. MUC-07 Proceedings (Named Entity Tasks).
  3. (Chinchor, 1998) ⇒ Nancy A. Chinchor (1998). "Overview of MUC-7/MET-2". In: Science Applications International Corporation.
  4. (Grishman & Sundheim, 1996) ⇒ Ralph Grishman, and Beth Sundheim (1996). "Message Understanding Conference-6: A Brief History". In: Proceedings of The 16th International Conference on Computational Linguistics (COLING 1996 Volume 1).
  5. ACE, F. (2004). Annotation Guidelines for Entity Detection and Tracking (EDT).
  6. NIST (1998–2008). Automatic content extraction (ACE) program.
  7. (Sang & De Meulder, 2003) ⇒ Erik F. Tjong Kim Sang, and Fien De Meulder (2003). "Introduction To The Conll-2003 Shared Task: Language-Independent Named Entity Recognition". In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL (CoNLL 2003).

2007a

  • (Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” In: Lingvisticae Investigationes, 30(1).
    • QUOTE: The term “Named Entity”, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information of company activities and defense related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important sub-tasks of IE and was called “Named Entity Recognition and Classification (NERC)”.

      In the expression “Named Entity”, the word “Named” aims to restrict the task to only those entities for which one or many rigid designators, as defined by S. Kripke (1982), stands for the referent. For instance, the automotive company created by Henry Ford in 1903 is referred to as Ford or Ford Motor Company. Rigid designators include proper names as well as certain natural kind terms like biological species and substances. There is a general agreement in the NERC community about the inclusion of temporal expressions and some numerical expressions such as amounts of money and other types of units.

      Early work formulates the NERC problem as recognizing “proper names” in general (e.g., S. Coates-Stephens 1992, C. Thielen 1995). Overall, the most studied types are three specializations of “proper names”: names of “persons”, “locations” and “organizations”. These types are collectively known as “enamex” since the MUC-6 competition.
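
The entity types listed in the quote above can be grouped in a small lookup table. This is only an illustrative sketch: the quote names just the “enamex” group, so the labels `timex` and `numex` for the temporal and numeric groups are assumptions taken from MUC tagging convention, and `MUC_CATEGORIES` and `muc_category` are hypothetical names.

```python
from typing import Dict, Tuple

# Grouping of the entity types mentioned in the quote.
# "enamex" is named in the quote; "timex" and "numex" are assumed labels.
MUC_CATEGORIES: Dict[str, Tuple[str, ...]] = {
    "enamex": ("person", "organization", "location"),
    "timex": ("date", "time"),
    "numex": ("money", "percent"),
}


def muc_category(entity_type: str) -> str:
    """Return the group that an entity type belongs to (case-insensitive)."""
    wanted = entity_type.lower()
    for category, types in MUC_CATEGORIES.items():
        if wanted in types:
            return category
    raise ValueError(f"unknown entity type: {entity_type!r}")
```

For example, `muc_category("Person")` falls in the `enamex` group, while `muc_category("money")` falls in the assumed `numex` group.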

2003

  • (Grishman, 2003) ⇒ Ralph Grishman. (2003). “Information Extraction.” In: (Mitkov, 2003).
    • QUOTE: In conventional treatments of language structure, little attention is paid to proper names, addresses, quantity phrases, etc. Presentations of language analysis typically begin by looking words up in a dictionary and identifying them as nouns, verbs, adjectives, etc. In fact, however, most texts include lots of names, and if a system cannot identify these as linguistic units (and, for most tasks, identify their type), it will be hard pressed to produce a linguistic analysis of the text.

1980

  1. Kripke, Saul. 1980. Naming and Necessity. Harvard University Press: 22.
  2. Davidson, D.; Harman, Gilbert (2012-12-06). Semantics of Natural Language. Springer Science & Business Media. ISBN 9789401025577. https://books.google.com/books?hl=en&lr=&id=vsb-CAAAQBAJ&oi=fnd&pg=PA1#v=onepage&q=1972&f=false.
  3. Soames, Scott. 2005. Philosophical Analysis in the Twentieth Century: Volume 2: The Age of Meaning. Princeton University Press. Cited in Byrne, Alex and Hall, Ned. 2004. 'Necessary Truths'. Boston Review October/November 2004.