Disambiguation to Wikipedia (D2W) Task
A Disambiguation to Wikipedia (D2W) Task is a Text Wikification Task whose input includes a Wikipedia snapshot, into whose pages the text's mentions must be disambiguated.
- AKA: Wikipedia-directed Wikification Task.
- Context:
- It can be solved by a Disambiguation to Wikipedia (D2W) System (that implements a Disambiguation to Wikipedia (D2W) Algorithm).
- Example(s):
- mapping the mention “Chicago” in “I am visiting friends in <Chicago>” to http://en.wikipedia.org/wiki/Chicago, the page for the city of Chicago, Illinois, rather than the page for the 2002 film (Ratinov et al., 2011).
- Counter-Example(s):
- a Named Entity Recognition (NER) Task, which identifies entity mentions in text but does not determine which specific entity each mention refers to.
- See: Document to Ontology Interlinking System, Natural Language Processing System, Wikimedia, Wikitext, GM-RKB WikiFixer System.
References
2019
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Entity_linking Retrieved:2019-6-15.
- In natural language processing, entity linking, named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization (NEN)[1] is the task of determining the identity of entities mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris". NED is different from named entity recognition (NER) in that NER identifies the occurrence or mention of a named entity in text but it does not identify which specific entity it is.
Entity linking requires a knowledge base containing the entities to which entity mentions can be linked. A popular choice for entity linking on open-domain text is a knowledge base based on Wikipedia, [2] in which each page is regarded as a named entity. NED using Wikipedia entities has also been called wikification (see Wikify!, an early entity linking system [3]). A knowledge base may also be induced automatically from training text [4] or built manually. (...)
- ↑ M. A. Khalid, V. Jijkoun and M. de Rijke (2008). The impact of named entity normalization on information retrieval for question answering. Proc. ECIR.
- ↑ Xianpei Han, Le Sun and Jun Zhao (2011). Collective entity linking in web text: a graph-based method. Proc. SIGIR.
- ↑ Rada Mihalcea and Andras Csomai (2007). Wikify! Linking Documents to Encyclopedic Knowledge. Proc. CIKM.
- ↑ Aaron M. Cohen (2005). Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. Proc. ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 17–24.
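The Paris example in the quoted definition hinges on a candidate-generation step that maps a surface string to the entities it might denote. Below is a minimal sketch of that step, assuming a hypothetical anchor-text dictionary harvested from Wikipedia links; the counts are invented for illustration.

```python
from collections import Counter

# Hypothetical anchor-text statistics: how often each surface string is
# used as link text pointing at each Wikipedia title. Real systems harvest
# these counts from a Wikipedia snapshot; these numbers are invented.
ANCHOR_COUNTS = {
    "Paris": Counter({"Paris": 9500, "Paris Hilton": 300, "Paris (mythology)": 120}),
}

def candidate_entities(mention: str) -> list[tuple[str, float]]:
    """Return candidate Wikipedia titles for a mention, with the fraction
    of the mention's links that point at each title."""
    counts = ANCHOR_COUNTS.get(mention, Counter())
    total = sum(counts.values())
    return [(title, n / total) for title, n in counts.most_common()]

print(candidate_entities("Paris"))
# Highest-probability candidate: ('Paris', ~0.96), i.e. the city.
```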
2016
- (Tsai & Roth, 2016) ⇒ Chen-Tse Tsai, and Dan Roth. (2016). “Cross-lingual Wikification Using Multilingual Embeddings.” In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- QUOTE: ... Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wikipedia titles. …
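A minimal sketch of the cross-lingual scoring idea described in this quote, assuming word and title vectors already live in one jointly trained multilingual space; the toy vectors and helper names below are illustrative assumptions, not Tsai & Roth's implementation.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy shared space: in the joint-training setting, vectors for non-English
# context words and English Wikipedia titles are directly comparable.
# These three-dimensional vectors are fabricated for illustration.
TITLE_VECS = {
    "Paris": np.array([0.9, 0.1, 0.0]),
    "Paris Hilton": np.array([0.1, 0.9, 0.2]),
}

def rank_titles(context_vec: np.ndarray, candidates: list[str]) -> list[str]:
    """Rank candidate English titles by cosine similarity between the
    (possibly non-English) mention-context vector and each title vector."""
    return sorted(candidates, key=lambda t: cosine(context_vec, TITLE_VECS[t]),
                  reverse=True)

# e.g. an embedded French context for the mention "Paris"
french_context = np.array([0.8, 0.2, 0.1])
print(rank_titles(french_context, ["Paris", "Paris Hilton"]))  # ['Paris', 'Paris Hilton']
```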
2014
- (Roth et al., 2014) ⇒ Dan Roth, Heng Ji, Ming-Wei Chang, and Taylor Cassidy. (2014). “Wikification and Beyond: The Challenges of Entity and Concept Grounding.” Tutorial at ACL 2014.
- QUOTE: Contextual disambiguation and grounding of concepts and entities in natural language text are essential to moving forward in many natural language understanding related tasks and are fundamental to many applications. The Wikification task (Bunescu and Pasca, 2006; Mihalcea and Csomai, 2007; Ratinov et al., 2011) aims at automatically identifying concept mentions appearing in a text document and linking them to (or “grounding them in”) a concept referent in a knowledge base (KB) (e.g., Wikipedia). For example, consider the sentence, "The Times report on Blumenthal (D) has the potential to fundamentally reshape the contest in the Nutmeg State.", a Wikifier should identify the key entities and concepts (Times, Blumenthal, D and the Nutmeg State), and disambiguate them by mapping them to an encyclopedic resource revealing, for example, that “D” here represents the Democratic Party, and that “the Nutmeg State” refers to Connecticut.
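One way to make the quoted example concrete is to represent a Wikifier's output as a mention-to-title mapping; the sketch below does so, and the exact target pages are assumptions about what the tutorial's example intends.

```python
sentence = ("The Times report on Blumenthal (D) has the potential to "
            "fundamentally reshape the contest in the Nutmeg State.")

# One possible Wikifier output: each identified mention grounded in a
# Wikipedia title. The specific target titles are illustrative assumptions.
wikification = {
    "Times": "The New York Times",
    "Blumenthal": "Richard Blumenthal",
    "D": "Democratic Party (United States)",
    "the Nutmeg State": "Connecticut",
}
```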
2013
- (Cheng & Roth, 2013) ⇒ Xiao Cheng, and Dan Roth. (2013). “Relational Inference for Wikification.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
- QUOTE: Wikification (D2W), the task of identifying concepts and entities in text and disambiguating them into their corresponding Wikipedia page(...)
Given a document D containing a set of concept and entity mentions M (referred to later as surface), the goal of Wikification is to find the most accurate mapping from mentions to Wikipedia titles T; this mapping needs to take into account our understanding of the text as well as background knowledge that is often needed to determine the most appropriate title. We also allow a special NIL title that captures all mentions that are outside Wikipedia.
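The formal setup above (a mapping from mentions M to titles T, plus a special NIL title) translates directly into a small type signature. A minimal sketch under assumed candidate lists; the trivial first-candidate choice is a placeholder for the relational inference the paper actually proposes.

```python
from typing import Optional

NIL = None  # the special NIL title for mentions outside Wikipedia

# Hypothetical per-surface candidate lists; a real system derives these
# from the Wikipedia snapshot itself.
CANDIDATES: dict[str, list[str]] = {
    "Paris": ["Paris", "Paris Hilton"],
}

def disambiguate(document: str, mentions: list[str]) -> dict[str, Optional[str]]:
    """Map each mention (surface string) in document D to a Wikipedia title
    in T, or to NIL when the entity is outside Wikipedia. The first-candidate
    choice below stands in for scoring candidates against the text and
    background knowledge."""
    return {m: (CANDIDATES[m][0] if m in CANDIDATES else NIL) for m in mentions}

print(disambiguate("Paris is the capital of France.", ["Paris", "Quux Corp"]))
# {'Paris': 'Paris', 'Quux Corp': None}   ("Quux Corp" is a made-up NIL mention)
```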
2012
- (Cassidy et al., 2012) ⇒ Taylor Cassidy, Heng Ji, Lev-Arie Ratinov, Arkaitz Zubiaga, and Hongzhao Huang. (2012). “Analysis and Enhancement of Wikification for Microblogs with Context Expansion.” In: Proceedings of COLING-2012 (COLING-2012).
- QUOTE: Disambiguation to Wikipedia (D2W) is the task of linking mentions of concepts in text to their corresponding Wikipedia entries. Most previous work has focused on linking terms in formal texts (e.g. newswire) to Wikipedia. Linking terms in short informal texts (e.g. tweets) is difficult for systems and humans alike as they lack a rich disambiguation context.
2011
- (Ratinov et al., 2011) ⇒ Lev Ratinov, Dan Roth, Doug Downey, and Mike Anderson. (2011). “Local and Global Algorithms for Disambiguation to Wikipedia.” In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. ISBN:978-1-932432-87-9.
- QUOTE: Wikification is the task of identifying and linking expressions in text to their referent Wikipedia pages(...) Previous studies on Wikification differ with respect to the corpora they address and the subset of expressions they attempt to link. For example, some studies focus on linking only named entities, whereas others attempt to link all “interesting” expressions, mimicking the link structure found in Wikipedia. Regardless, all Wikification systems are faced with a key Disambiguation to Wikipedia (D2W) task. In the D2W task, we’re given a text along with explicitly identified substrings (called mentions) to disambiguate, and the goal is to output the corresponding Wikipedia page, if any, for each mention. For example, given the input sentence “I am visiting friends in <Chicago>,” we output http://en.wikipedia.org/wiki/Chicago – the Wikipedia page for the city of Chicago, Illinois, and not (for example) the page for the 2002 film of the same name.
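A standard local baseline for the D2W decision described here picks, for each mention, the title its surface form most frequently links to in Wikipedia (the "commonness" prior). The sketch below uses invented link counts and is only that baseline, not Ratinov et al.'s local-plus-global algorithm.

```python
from typing import Optional

# Invented link counts: how often the anchor text "Chicago" points at
# each page in a Wikipedia snapshot.
LINK_COUNTS = {
    "Chicago": {"Chicago": 12000, "Chicago (2002 film)": 800, "Chicago (band)": 650},
}

def d2w_commonness_baseline(mention: str) -> Optional[str]:
    """Disambiguate with the commonness prior alone: return the title the
    surface form most often links to, or None (NIL) for unseen mentions."""
    counts = LINK_COUNTS.get(mention)
    return max(counts, key=counts.get) if counts else None

print(d2w_commonness_baseline("Chicago"))  # 'Chicago' -- the city, not the film
```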
2008
- (Csomai & Mihalcea, 2008) ⇒ Andras Csomai, and Rada Mihalcea. (2008). “Linking Documents to Encyclopedic Knowledge.” In: IEEE Intelligent Systems 23(5). doi:10.1109/MIS.2008.86
2007
- (Mihalcea & Csomai, 2007) ⇒ Rada Mihalcea, and Andras Csomai. (2007). “Wikify!: Linking documents to encyclopedic knowledge.” In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management. doi:10.1145/1321440.1321475
- QUOTE: Automatic text wikification implies solutions for the two main tasks performed by a Wikipedia contributor when adding links to an article: (1) keyword extraction, and (2) link disambiguation.
The first task consists of identifying those words and phrases that are considered important for the document at hand. These typically include technical terms, named entities, new terminology, as well as other concepts closely related to the content of the article – in general, all the words and phrases that will add to the reader’s experience. For instance, the Wikipedia page for “tree” includes the text “A tree is a large, perennial, woody plant [...] The earliest trees were tree ferns and horsetails, which grew in forests in the Carboniferous Period.”, where perennial, plant, tree ferns, horsetails, and Carboniferous are selected as keywords. This task is identified with the problem of keyword extraction, which targets the automatic identification of important words and phrases in an input natural language text.
The second task consists of finding the correct Wikipedia article that should be linked to a candidate keyword. Here, we face the problem of link ambiguity, meaning that a phrase can be usually linked to more than one Wikipedia page, and the correct interpretation of the phrase (and correspondingly the correct link) depends on the context where it occurs. For instance, the word “plant” can be linked to different articles, depending on whether it was used with its green plant or industrial plant meaning. This task is analogous to the problem of word sense disambiguation, aiming at finding the correct sense of a word according to a given sense inventory.
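The two subtasks quoted above compose into a simple pipeline. The sketch below wires them together with deliberately naive placeholder stages; neither stage reflects the Wikify! system's actual keyword-extraction or disambiguation methods.

```python
STOPWORDS = {"the", "a", "an", "which", "in"}

def extract_keywords(text: str) -> list[str]:
    """Stage 1, keyword extraction (placeholder): treat capitalized,
    non-stopword tokens as the phrases worth linking."""
    return [tok.strip(".,") for tok in text.split()
            if tok[:1].isupper() and tok.lower() not in STOPWORDS]

def disambiguate_keyword(keyword: str, context: str) -> str:
    """Stage 2, link disambiguation (placeholder): choose the Wikipedia
    title for the keyword's sense in context. Here we simply assume the
    default sense, i.e. the page sharing the keyword's name."""
    return keyword

def wikify(text: str) -> dict[str, str]:
    """Wire the two stages together: extract keywords, then disambiguate
    each one against the surrounding text."""
    return {kw: disambiguate_keyword(kw, text) for kw in extract_keywords(text)}

print(wikify("The earliest trees grew in forests in the Carboniferous Period."))
# {'Carboniferous': 'Carboniferous', 'Period': 'Period'}  -- placeholder output;
# a real disambiguator would link the phrase to the Carboniferous period page.
```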