2007 Wikify


Subject Headings: Entity Mention Normalization Algorithm, Wikipedia-based Word Mention Normalization Task, Term Identification Task, Word Sense Disambiguation Task, Wikify System.


Cited By



  • (Kulkarni et al., 2009) ⇒ Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti. (2009). “Collective Annotation of Wikipedia Entities in Web Text.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557073.
    • Wikify! [13] has two components. The first, keyword extraction, decides if a phrase should be linked to Wikipedia. This is based on how often a word or phrase is found to be in the anchor text of some link internal to Wikipedia. The second step is disambiguation. Wikify!, too, is conservative in flagging keywords, so much so that even random disambiguation results in an F1 score of 0.82. Suppose Wikify! is considering linking spot s to entity γ. Wikipedia’s page describing γ is explicitly referred from other Wikipedia pages. The context of these known citations is compared with the context of s to decide on a compatibility score. This may be regarded as generalizing SemTag, where known references to γ form part of the metadata of γ. … However, none of these systems attempt collective disambiguation across spots.
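
The two steps described in this excerpt can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy counts, and the candidate entities are hypothetical, and plain word overlap is assumed here as a stand-in for the compatibility score between the context of a spot and the contexts of known citations.

```python
from collections import Counter


def keyphraseness(phrase, link_doc_count, total_doc_count):
    """Share of documents containing the phrase in which it is link anchor text."""
    total = total_doc_count.get(phrase, 0)
    if total == 0:
        return 0.0
    return link_doc_count.get(phrase, 0) / total


def context_overlap(spot_context, citation_contexts):
    """Sum, over the distinct words of the spot context, of how often each
    word occurs in the known citation contexts of one entity."""
    spot_words = {w.lower() for w in spot_context.split()}
    cited = Counter(w.lower() for ctx in citation_contexts for w in ctx.split())
    return sum(cited[w] for w in spot_words)


def disambiguate(spot_context, candidates):
    """Return the candidate entity whose citation contexts overlap most."""
    return max(candidates, key=lambda e: context_overlap(spot_context, candidates[e]))


# Hypothetical toy statistics (not taken from Wikipedia).
total_doc_count = {"bar": 200, "the": 10000}
link_doc_count = {"bar": 50, "the": 1}

# Hypothetical candidate entities with contexts of known citations.
candidates = {
    "Bar (law)": ["admitted to the bar after law school",
                  "the bar exam for lawyers"],
    "Bar (music)": ["four beats in each bar of music",
                    "the melody repeats every bar"],
}

print(keyphraseness("bar", link_doc_count, total_doc_count))  # 0.25
print(disambiguate("she passed the bar exam and became a lawyer", candidates))  # Bar (law)
```

A phrase would be flagged as a keyword only when its score exceeds some threshold; the threshold value is a design choice that this sketch leaves open.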




This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.


  • 1. S. F. Adafre and M. de Rijke. Finding similar sentences across multiple languages in wikipedia. In: Proceedings of the EACL Workshop on New Text, Trento, Italy, 2006.
  • 2. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 1(501), May 2001.
  • 3. R. Bunescu and Marius Paşca. Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of the European Conference of the Association for Computational Linguistics, Trento, Italy, 2006.
  • 4. Sara Drenner, Max Harper, Dan Frankowski, John Riedl, Loren Terveen, Insert movie reference here: a system to bridge conversation and item-oriented web sites, Proceedings of the SIGCHI conference on Human Factors in computing systems, April 22-27, 2006, Montréal, Québec, Canada doi:10.1145/1124772.1124914.
  • 5. Alexander Faaborg, Henry Lieberman, A goal-oriented web browser, Proceedings of the SIGCHI conference on Human Factors in computing systems, April 22-27, 2006, Montréal, Québec, Canada doi:10.1145/1124772.1124883
  • 6. Evgeniy Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using wikipedia: Enhancing text categorization with encyclopedic knowledge. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), Boston, 2006.
  • 7. J. Giles. Internet encyclopaedias go head to head. Nature, 438(7070):900--901, 2005.
  • 8. Alfio Gliozzo, Claudio Giuliano, Carlo Strapparava, Domain kernels for word sense disambiguation, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, p.403-410, June 25-30, 2005, Ann Arbor, Michigan doi:10.3115/1219840.1219890
  • 9. Carl Gutwin, Gordon Paynter, Ian H. Witten, Craig Nevill-Manning, Eibe Frank, Improving browsing in digital libraries with keyphrase indexes, Decision Support Systems, v.27 n.1-2, p.81-104, Nov. 1999 doi:10.1016/S0167-9236(99)00038-X
  • 10. Anette Hulth, Improved automatic keyword extraction given more linguistic knowledge, Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, p.216-223, July 11, 2003 doi:10.3115/1119355.1119383
  • 11. C. Jacquemin and D. Bourigault. Term Extraction and Automatic Indexing. Oxford University Press, 2000.
  • 12. Yoong Keok Lee, and Hwee Tou Ng, An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, p.41-48, July 06, 2002 doi:10.3115/1118693.1118699.
  • 13. Michael Lesk, Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, Proceedings of the 5th annual International Conference on Systems documentation, p.24-26, June 1986, Toronto, Ontario, Canada doi:10.1145/318723.318728
  • 14. Henry Lieberman, Hugo Liu, Adaptive Linking between Text and Photos Using Common Sense Reasoning, Proceedings of the Second International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, p.2-11, May 29-31, 2002
  • 15. Christopher D. Manning, Hinrich Schütze, Foundations of statistical natural language processing, MIT Press, Cambridge, MA, 1999
  • 16. Rada Mihalcea, Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, p.411-418, October 06-08, 2005, Vancouver, British Columbia, Canada doi:10.3115/1220575.1220627
  • 17. R. Mihalcea. Using Wikipedia for automatic word sense disambiguation. In: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Rochester, New York, April 2007.
  • 18. R. Mihalcea and P. Edmonds, editors. Proceedings of SENSEVAL-3, Association for Computational Linguistics Workshop, Barcelona, Spain, 2004.
  • 19. R. Mihalcea and P. Tarau. TextRank - bringing order into texts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, 2004.
  • 20. George A. Miller, WordNet: a lexical database for English, Communications of the ACM, v.38 n.11, p.39-41, Nov. 1995 doi:10.1145/219717.219748
  • 21. R. Navigli and M. Lapata. Graph connectivity measures for unsupervised word sense disambiguation. In: Proceedings of the International Joint Conference on Artificial Intelligence, Hyderabad, India, 2007.
  • 22. Roberto Navigli, Paola Velardi, Structural Semantic Interconnections: A Knowledge-based Approach to Word Sense Disambiguation, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.27 n.7, p.1075-1086, July 2005 doi:10.1109/TPAMI.2005.149
  • 23. Hwee Tou Ng, Hian Beng Lee. (1996). Integrating multiple knowledge sources to disambiguate word sense: an exemplar-based approach, Proceedings of the 34th annual meeting on Association for Computational Linguistics, p.40-47, June 24-27, 1996, Santa Cruz, California doi:10.3115/981863.981869
  • 24. Ted Pedersen, A decision tree of bigrams is an accurate predictor of word sense, Second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies 2001, p.1-8, June 01-07, 2001, Pittsburgh, Pennsylvania doi:10.3115/1073336.1073347
  • 25. S. Pradhan, E. Loper, D. Dligach, and M. Palmer. Semeval-2007 task-17: English lexical sample, srl and all words. In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), Prague, Czech Republic, June 2007.
  • 26. Gerard Salton, Christopher Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management: an International Journal, v.24 n.5, p.513-523, 1988 doi:10.1016/0306-4573(88)90021-0
  • 27. M. Strube and S. P. Ponzetto. Wikirelate! computing semantic relatedness using Wikipedia. In: Proceedings of the American Association for Artificial Intelligence, Boston, MA, 2006.
  • 28. Peter D. Turney, Learning Algorithms for Keyphrase Extraction, Information Retrieval, v.2 n.4, p.303-336, May 2000 doi:10.1023/A:1009976227802

  • Author: Rada Mihalcea, Andras Csomai
  • title: Wikify!: Linking documents to encyclopedic knowledge
  • journal: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
  • titleUrl: http://www.cs.unt.edu/~rada/papers/mihalcea.cikm07.pdf
  • doi: 10.1145/1321440.1321475
  • year: 2007