Coreference Clustering Task

AKA: Coreference Resolution.
Context:
- input: A Referencer Set (with some reference information).
  - optional: Background Knowledge, such as a canonical entity database.
- output: A set of Coreference Clusters.
- It can range from (typically) being an Entity Reference Clustering Task to being a Relation Reference Clustering Task.
- It can be solved by a Coreference Resolution System that implements a Coreference Resolution Algorithm.
- It can range from being a Heuristic Coreference Resolution Task to being a Data-Driven Coreference Resolution Task.
- It can support a Reference Grounding Task.
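The input/output contract above (a referencer set in, a set of coreference clusters out) can be sketched as a minimal Heuristic Coreference Resolution Task. This is a toy illustration, not a standard system. The grouping rule `last_token_key` and the function `cluster_by_key` are hypothetical names invented for this example. The output is a partition: every referencer lands in exactly one cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def last_token_key(referencer: str) -> str:
    """Hypothetical heuristic: the lower-cased final token of the mention string."""
    return referencer.lower().split()[-1]


def cluster_by_key(
    referencers: List[str], key: Callable[[str], Hashable] = last_token_key
) -> List[List[str]]:
    """Partition referencers so that those sharing a key fall in one cluster."""
    clusters: Dict[Hashable, List[str]] = defaultdict(list)
    for referencer in referencers:
        clusters[key(referencer)].append(referencer)
    # dicts preserve insertion order, so clusters appear in first-mention order
    return list(clusters.values())


print(cluster_by_key(["Barack Obama", "Obama", "Michelle Obama", "Paris", "paris"]))
# [['Barack Obama', 'Obama', 'Michelle Obama'], ['Paris', 'paris']]
```

The rule wrongly places "Michelle Obama" with "Barack Obama". This shows how a crude heuristic can over-merge distinct entities. A data-driven system would replace `key` with learned coreference decisions.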
Example(s):
- a Coreferent Mention Resolution Task, such as a Person Mention Coreference Resolution Task.
- a Record Coreference Resolution Task, such as a Person Record Coreference Resolution Task.
- an Ontology Merging Task.
Counter-Example(s):
- a Referencer Classification Task.
- a Reference Grounding Task (to a Canonical Referencer).
See: Coreference Chain, Coreferential Expression Set, Ontology, Word Mention Clustering Task.

References

http://cogcomp.cs.illinois.edu/page/demos/
- QUOTE: A given entity - representing a person, a location, or an organization - may be mentioned in text in multiple, ambiguous ways. Understanding natural language and supporting intelligent access to textual information requires identifying whether different entity mentions are actually referencing the same entity. The Coreference Resolution Demo processes unannotated text, detecting mentions of entities and showing which mentions are coreferential.

(Wick et al., 2009) ⇒ Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. (2009). "An Entity Based Model for Coreference Resolution." In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
- QUOTE: Coreference resolution is the problem of clustering mentions (or records) into sets referring to the same underlying entity (e.g., person, places, organizations). Over the past several years, increasingly powerful supervised machine learning techniques have been developed to solve this problem. Initial solutions treated it as a set of independent binary classifications, one for each pair of mentions [1, 2]. Next, relational probability models were developed to capture the dependency between each of these classifications [3, 4]; however the parameterization of these methods still consists of features over pairs of mentions. Finally, methods have been developed to enable arbitrary features over entire clusters of mentions [5, 6, 7].
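The progression described in the quote, from independent pairwise decisions to features over whole clusters, can be shown with a toy sketch. This is not the model of Wick et al. The mention tuples, the gender-consistency check `compatible`, and the greedy left-to-right merge in `entity_clusters` are all hypothetical stand-ins chosen for brevity. With only pairwise evidence, "Clinton" is compatible with both "he" and "she". The transitive closure of those pairwise links therefore puts all three mentions in one cluster. A feature evaluated over the whole candidate cluster can reject that merge.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# A mention is (surface text, gender); gender None means unknown.
Mention = Tuple[str, Optional[str]]


def compatible(genders: List[Optional[str]]) -> bool:
    """Hypothetical feature: at most one distinct known gender."""
    return len({g for g in genders if g is not None}) <= 1


def pairwise_clusters(mentions: List[Mention]) -> List[List[str]]:
    """Independent pairwise decisions followed by transitive closure (union-find)."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        # Each pair is judged in isolation, seeing only its own two genders
        if compatible([mentions[i][1], mentions[j][1]]):
            parent[find(i)] = find(j)

    groups: dict = {}
    for i, (text, _) in enumerate(mentions):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


def entity_clusters(mentions: List[Mention]) -> List[List[str]]:
    """Greedy left-to-right clustering that scores the whole candidate cluster."""
    clusters: List[List[Mention]] = []
    for m in mentions:
        for c in clusters:
            # Cluster-level feature: consistency over every mention in c plus m
            if compatible([g for _, g in c] + [m[1]]):
                c.append(m)
                break
        else:
            clusters.append([m])
    return [[text for text, _ in c] for c in clusters]


mentions: List[Mention] = [("Clinton", None), ("he", "m"), ("she", "f")]
print(pairwise_clusters(mentions))  # [['Clinton', 'he', 'she']]
print(entity_clusters(mentions))    # [['Clinton', 'he'], ['she']]
```

The greedy merge stands in for the search procedures that real cluster-level models use. It is kept only because it makes the difference between pair features and cluster features visible in a few lines.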

(Pasula, 2006) ⇒ Hanna Pasula. (2006). "Approximate Inference Techniques for Identity Uncertainty." Lecture
- QUOTE: Many interesting tasks, such as vehicle tracking, data association, and mapping, involve reasoning about the objects present in a domain. However, the observations on which this reasoning is to be based frequently fail to explicitly describe these objects' identities, properties, or even their number, and may in addition be noisy or nondeterministic. When this is the case, identifying the set of objects present becomes an important aspect of the whole task.