Entity Mention Coreference Resolution Task
An Entity Mention Coreference Resolution Task is a coreference resolution task that is also a mention clustering task (to produce coreferent mention clusters).
- AKA: Noun Phrase Coreference Resolution Task, Entity Mention Clustering Task, Entity Mention Disambiguation, Coreferent Entity Mention Resolution Task, Entity Reference Discrimination Task, Reference Clustering Task.
- Context:
- Input: A Document Set.
- Optionally: an Annotated Document Set (in which the entity mentions may have been recognized in advance).
- Optionally: an Entity Type Description, e.g. People.
- Output: a set of Entity Mention Clusters, each grouping the mentions in the document set that share the same Entity Referent. When only one document is provided, each cluster is a Coreference Chain.
- Optionally: Identifiers of the Artifacts that reference the entity.
- Optionally, the cluster can be mapped to a corresponding Entity Record in an Entity Database.
- It can range from being an Anaphora Resolution Task to being a Noun-Phrase Resolution Task.
- It can range from being a Cross-Document Coreference Resolution Task (if the entity mentions span multiple documents) to being a Single-Document Coreference Resolution Task (if the input is a single document).
- It can range from being a Heuristic Entity Mention Coreference Resolution Task to being a Data-Driven Entity Mention Coreference Resolution Task (such as a supervised entity mention coreference resolution task).
- It can be a subtask of an Information Fusion Task.
- It can be solved by an Entity Mention Coreference Resolution System that applies an Entity Mention Coreference Resolution Algorithm.
- It is a Word-level Semantic Analysis Task.
- It can be supported by an Anaphora Resolution Task to identify Anaphors and map them to their Referent Entity Mentions.
- It can support an Entity Mention Normalization Task.
- …
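The input/output contract above can be sketched as follows, assuming the entity mentions have already been recognized and that some upstream heuristic or learned procedure has produced pairwise coreference decisions. The sketch only shows how those links become Entity Mention Clusters (Coreference Chains when every mention comes from one document). The `Mention` type and `cluster_mentions` function are hypothetical names, and taking the transitive closure of links with union-find is one common choice, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record: document id, character offsets, surface text."""
    doc_id: str
    start: int
    end: int
    text: str


def cluster_mentions(mentions, coreferent_pairs):
    """Turn pairwise coreference links into Entity Mention Clusters.

    `coreferent_pairs` holds (i, j) index pairs that some upstream heuristic or
    learned decision procedure judged to corefer; clusters are the transitive
    closure of those links, computed with union-find.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in coreferent_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    # Order clusters by their first mention (in input order) so output is deterministic.
    return sorted(clusters.values(), key=lambda c: (c[0].doc_id, c[0].start))


# "Mary said she would help me": "Mary" and "she" corefer, "me" stands alone.
mentions = [
    Mention("d1", 0, 4, "Mary"),
    Mention("d1", 10, 13, "she"),
    Mention("d1", 25, 27, "me"),
]
chains = cluster_mentions(mentions, [(0, 1)])
print([[m.text for m in chain] for chain in chains])  # [['Mary', 'she'], ['me']]
```

Because clustering takes the transitive closure, a single wrong pairwise link can merge two otherwise separate entities; that is one reason the pairwise decision step matters so much.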
- Example(s):
- A Named Entity Coreference Resolution Task, such as person mention coreference resolution.
- An Anaphora Resolution Task.
- PPLRE Passage 8611.0-1 shows a more sophisticated requirement of unifying "class concepts", in this case "(vir) genes" with "vir genes".
- A Coreference Resolution Benchmark Task / Entity Mention Coreference Resolution Benchmark Task.
- SemEval-2007 Task-13: the challenge is to correctly estimate the number of referents and to group the documents that refer to the same individual. http://nlp.cs.swarthmore.edu/semeval/tasks/task13/summary.shtml
- Counter-Example(s):
- See: Terminology Extraction Task.
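The SemEval-2007 Task-13 example above (estimate the number of referents of an ambiguous name and group the documents by referent) can be illustrated with a deliberately simple cross-document baseline. The approach is bag-of-words cosine similarity followed by single-link threshold clustering. This is an assumption chosen for illustration and is not the task's official baseline or any participant's system. `group_documents`, the stopword list, and the default threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "on", "in", "with", "his", "her", "about"}


def bag_of_words(text, ignore=frozenset()):
    """Lower-cased token counts, minus stopwords and any tokens in `ignore`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and t not in ignore)


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_documents(docs, name, threshold=0.2):
    """Single-link clustering of documents that mention the ambiguous `name`.

    Returns clusters of document indices; the number of clusters is the
    estimated number of distinct referents of `name`.
    """
    # The name itself appears in every document, so it carries no evidence.
    ignore = frozenset(name.lower().split())
    vectors = [bag_of_words(d, ignore) for d in docs]
    label = list(range(len(docs)))  # each document starts in its own cluster
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                old, new = label[j], label[i]
                label = [new if c == old else c for c in label]  # merge clusters
    groups = {}
    for idx, c in enumerate(label):
        groups.setdefault(c, []).append(idx)
    return list(groups.values())


docs = [
    "John Smith the economist published a paper on inflation and monetary policy",
    "economist John Smith spoke about inflation and interest rates",
    "John Smith the guitarist released a new album with his band",
]
print(group_documents(docs, "John Smith"))  # [[0, 1], [2]]
```

The threshold effectively decides how many referents are reported, so in practice it would be tuned on held-out data rather than fixed by hand.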
References
2012
- http://en.wikipedia.org/wiki/Coreference
- QUOTE: In linguistics, co-reference occurs when multiple expressions in a sentence or document refer to the same thing; or in linguistic jargon, they have the same “referent.”
For example, in the sentence "Mary said she would help me", "she" and "Mary" are most likely referring to the same person or group, in which case they are coreferent. Similarly, in "I saw Scott yesterday. He was fishing by the lake," Scott and he are most likely coreferent.
The pattern of these examples is typical: when first introducing a person or other topic for discussion, an author or speaker will use a relatively long or detailed description, such as a definite description as defined by Saul Kripke. However, later mentions are briefer. Once down to mere pronouns, references are frequently ambiguous. In the "Mary said she would help me" example, although the most likely reading is that "she" refers to Mary, "she" could instead refer to someone else (most likely someone introduced earlier in a dialog).
In computational linguistics, coreference resolution is a well-studied problem in discourse. In order to derive the correct interpretation of text, or even to estimate the relative importance of various mentioned subjects, pronouns and other referring expressions need to be connected to the right individuals.
When the reader must look back to the previous context, coreference is called “anaphoric reference”. When the reader must look forward, it is termed “cataphoric reference”.
2010
- (Cheng, Lauw, & Paparizos, 2010) ⇒ Tao Cheng, Hady Lauw, and Stelios Paparizos. (2010). “Fuzzy Matching of Web Queries to Structured Data.” In: Proceedings of ICDE 2010 (ICDE 2010). doi:10.1109/ICDE.2010.5447817
- ABSTRACT: Recognizing the alternative ways people use to reference an entity, is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches with an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the queries that can be matched.
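The abstract describes mining query logs for cases where different strings lead users to the same Web pages. The following is a simplified illustration of that idea only, not the authors' actual scoring method. It assumes each entity comes with a set of known pages and that the log is a list of (query, clicked URL) pairs. A query is then proposed as an equivalent string when enough of its clicks land on the entity's pages. `expand_entity_strings`, `min_clicks`, and `min_precision` are hypothetical names and thresholds.

```python
from collections import defaultdict


def expand_entity_strings(entity_pages, click_log, min_clicks=2, min_precision=0.8):
    """Propose alternative strings for each entity from (query, clicked_url) pairs.

    entity_pages: dict mapping an entity id to the set of URLs known to describe it.
    click_log: iterable of (query_string, clicked_url) pairs.
    A query is attached to an entity when at least `min_clicks` of its clicks,
    and at least `min_precision` of all its clicks, land on that entity's pages.
    """
    clicks_by_query = defaultdict(list)
    for query, url in click_log:
        # Normalize case and whitespace so trivial variants share one entry.
        clicks_by_query[" ".join(query.lower().split())].append(url)

    expansions = defaultdict(set)
    for query, urls in clicks_by_query.items():
        for entity, pages in entity_pages.items():
            hits = sum(1 for u in urls if u in pages)
            if hits >= min_clicks and hits / len(urls) >= min_precision:
                expansions[entity].add(query)
    return dict(expansions)


entity_pages = {
    "canon_eos_5d": {"https://example.com/eos-5d", "https://example.org/wiki/Canon_EOS_5D"},
}
click_log = [
    ("Canon 5D", "https://example.com/eos-5d"),
    ("canon  5d", "https://example.org/wiki/Canon_EOS_5D"),
    ("5d camera", "https://example.com/eos-5d"),
    ("5d camera", "https://example.com/eos-5d"),
    ("5d camera", "https://example.net/other-camera"),
]
print(expand_entity_strings(entity_pages, click_log))  # {'canon_eos_5d': {'canon 5d'}}
```

Here "5d camera" is rejected because only two of its three clicks reach the entity's pages, which falls below the illustrative 0.8 precision cutoff.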
2009
- (Jurafsky & Martin, 2009) ⇒ Daniel Jurafsky, and James H. Martin. (2009). “Speech and Language Processing, 2nd edition." Pearson Education.
- QUOTE: In this passage, each of the underlined phrases is used by the speaker to denote one person named [Jane Doe]. We refer to this use of linguistic expressions like her or Jane Doe to denote an entity or individual as reference. In the next few sections of this chapter we study the problem of reference resolution. Reference resolution is the task of determining what entities are referred to by which linguistic expressions.
A natural language expression used to perform reference is called a referring expression, and the entity that is referred to is called the referent.
Two referring expressions that are used to refer to the same entity are said to corefer. … There is also a term for a referring expression that licenses the use of another, in the way that the mention of John allows John to be subsequently referred to as he. We call John the antecedent of he. Reference to an entity that has been previously introduced into the discourse is called anaphora, and the referring expression used is said to be anaphoric.
We are now ready to introduce two reference resolution tasks: coreference resolution and pronominal anaphora resolution. Coreference resolution is the task of finding referring expressions in a text that refer to the same entity, that is, finding expressions that corefer. We call the set of coreferring expressions a coreference chain.
Coreference resolution requires finding all referring expressions in a discourse and grouping them into coreference chains. By contrast, pronominal anaphora resolution is the task of finding the antecedent for a single pronoun.
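Jurafsky and Martin contrast pronominal anaphora resolution (find the antecedent of one pronoun) with coreference resolution (group all referring expressions into chains). The narrower task can be sketched with a hypothetical recency-plus-agreement heuristic: pick the closest preceding candidate that agrees with the pronoun in gender and number. The feature lexicon, the `Candidate` type, and `resolve_pronoun` are assumptions for illustration. Practical resolvers add syntactic constraints or learned models on top of such filters.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical feature lexicon: pronoun -> (gender, number); None means unspecified.
PRONOUN_FEATURES = {
    "he": ("masc", "sg"), "him": ("masc", "sg"), "his": ("masc", "sg"),
    "she": ("fem", "sg"), "her": ("fem", "sg"),
    "it": ("neut", "sg"), "its": ("neut", "sg"),
    "they": (None, "pl"), "them": (None, "pl"),
}


@dataclass
class Candidate:
    text: str
    position: int          # token offset in the discourse
    gender: Optional[str]  # "masc", "fem", "neut", or None if unknown
    number: str            # "sg" or "pl"


def resolve_pronoun(pronoun, position, candidates):
    """Return the closest preceding candidate that agrees in gender and number, or None."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    preceding = [c for c in candidates if c.position < position]
    # Recency: scan candidates from nearest to farthest.
    for cand in sorted(preceding, key=lambda c: c.position, reverse=True):
        gender_ok = gender is None or cand.gender is None or cand.gender == gender
        if gender_ok and cand.number == number:
            return cand
    return None


# "John met Mary . He thanked her ."  (token offsets 0..7)
candidates = [Candidate("John", 0, "masc", "sg"), Candidate("Mary", 2, "fem", "sg")]
print(resolve_pronoun("He", 4, candidates).text)   # John
print(resolve_pronoun("her", 6, candidates).text)  # Mary
```

Resolving each pronoun this way yields antecedent pairs (John, He) and (Mary, her). Coreference resolution in the book's sense would go further and group every referring expression, not only pronouns, into chains.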
2008
- (Clark and González-Brenes, 2008) ⇒ Jonathan H. Clark, and José P. González-Brenes. (2008). “Coreference: Current Trends and Future Directions." CMU course on Language and Statistics II Literature Review, Fall 2008.
- QUOTE: Coreference resolution seeks to find the mentions in text that refer to the same real-world entity. This task has been well-studied in NLP, but until recent years, empirical results have been disappointing. Recent research has greatly improved the state-of-the-art. In this review, we focus on five papers that represent the current state-of-the-art and discuss how they relate to each other and how these advances will influence future work in this area.
- (Artiles et al., 2008) ⇒ J. Artiles, Satoshi Sekine, and J. Gonzalo. (2008). “Web People Search: results of the first evaluation and the plan for the second.” In: Proceedings of the 17th International Conference on World Wide Web (WWW 2008).
2005
- (Bekkerman & McCallum, 2005) ⇒ Ron Bekkerman, and Andrew McCallum. (2005). “Disambiguating Web Appearance of People in a Social Network.” In: Proceedings of the 14th International World Wide Web Conference. (WWW 2005).
2004
- (Li et al., 2004) ⇒ Xin Li, Paul Morie, and Dan Roth. (2004). “Identification and Tracing of Ambiguous Names: Discriminative and Generative Approaches.” In: Proceedings of AAAI 2004.
2003
- (Mann and Yarowsky, 2003) ⇒ Gideon S. Mann, and David Yarowsky. (2003). “Unsupervised personal name disambiguation.” In: Proceedings of HLT-NAACL (2003).
2001
- (Soon et al., 2001) ⇒ Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. (2001). “A Machine Learning Approach to Coreference Resolution of Noun Phrases.” In: Computational Linguistics, 27(4). doi:10.1162/089120101753342653
- QUOTE: In this paper, we focus on the task of determining coreference relations as defined in MUC-6 (MUC-6 1995) and MUC-7 (MUC-7 1997). Specifically, a coreference relation denotes an identity of reference and holds between two textual elements known as markables, which can be definite noun phrases, demonstrative noun phrases, proper names, appositives, sub-noun phrases that act as modifiers, pronouns, and so on. Thus, our coreference task resolves general noun phrases and is not restricted to a certain type of noun phrase such as pronouns. Also, we do not place any restriction on the possible candidate markables; that is, all markables, whether they are “organization,” “person,” or other entity types, are considered. The ability to link coreferring noun phrases both within and across sentences is critical to discourse analysis and language understanding in general.
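Soon et al. (2001) classify pairs of markables with a learned classifier (a decision tree in their paper). They build chains by closest-first clustering, linking each markable to the nearest preceding markable that the classifier labels coreferent. The sketch below keeps only that closest-first decoding step. It assumes a hypothetical hand-written `pair_score` rule (string match after determiner removal, plus a crude one-word head/alias match) as a stand-in for the learned classifier, and it is not the paper's feature set.

```python
# Determiners stripped before comparing markables (illustrative list).
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}


def normalize(markable):
    tokens = markable.lower().split()
    while tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    return tokens


def pair_score(antecedent, anaphor):
    """Hypothetical hand-written stand-in for a learned pairwise classifier."""
    a, b = normalize(antecedent), normalize(anaphor)
    if not a or not b:
        return False
    if a == b:
        return True  # string match after determiner removal
    # Crude head/alias match: a one-word markable equals the last word of the other.
    return (len(b) == 1 and b[0] == a[-1]) or (len(a) == 1 and a[0] == b[-1])


def closest_first(markables, classify=pair_score):
    """Link each markable to the closest preceding markable that `classify` accepts."""
    antecedent_of = {}
    for j in range(1, len(markables)):
        for i in range(j - 1, -1, -1):  # scan right to left, nearest first
            if classify(markables[i], markables[j]):
                antecedent_of[j] = i
                break
    # Antecedents always precede their anaphors, so one left-to-right pass
    # is enough to propagate chain ids.
    chain_id = list(range(len(markables)))
    chains = {}
    for j, markable in enumerate(markables):
        if j in antecedent_of:
            chain_id[j] = chain_id[antecedent_of[j]]
        chains.setdefault(chain_id[j], []).append(markable)
    return list(chains.values())


markables = ["Bill Gates", "a new office", "Gates", "the office", "the new office"]
print(closest_first(markables))
# [['Bill Gates', 'Gates'], ['a new office', 'the office', 'the new office']]
```

Swapping in a trained classifier only requires passing a different `classify` callable; the decoding loop stays the same.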