2008 AnEffectiveLowCostMeasure

(Milne & Witten, 2008b) ⇒ David N. Milne, Ian H. Witten. (2008). “An Effective, Low-cost Measure of Semantic Relatedness Obtained from Wikipedia Links.” In: Proceedings of the AAAI 2008 Workshop on Wikipedia and Artificial Intelligence (WIKIAI 2008).

Subject Headings: Semantic Relatedness Measure.

Notes

- It makes use of the Topic Similarity Measure described in (Cilibrasi & Vitanyi, 2007).
- The Semantic Similarity Measure has Similarity Values that range from Zero (highest similarity) to Infinity (highest dissimilarity).

Cited By

~39 http://scholar.google.com/scholar?cites=7116897056466186599
(Milne & Witten, 2008a) ⇒ David N. Milne, Ian H. Witten. (2008). “Learning to Link with Wikipedia.” In: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008). doi:10.1145/1458082.1458150

Quotes

Abstract

This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.

Related Work

The purpose of semantic relatedness measures is to allow computers to reason about written text. They have many applications in natural language processing and artificial intelligence (Budanitsky, 1999), and have consequently received a lot of attention from the research community. Table 1 shows the performance of various semantic relatedness measures according to their correlation with a manually defined ground truth; namely Finkelstein et al.’s (2002) WordSimilarity-353 collection.

Measuring relatedness between articles

The second measure we use is modeled after the Normalized Google Distance (Cilibrasi and Vitanyi, 2007), which is based on term occurrences on web-pages. The name stems from the use of the Google search engine to obtain pages which mention the terms of interest. Pages that contain both terms indicate relatedness, while pages with only one of the terms suggest the opposite. Our measure is based on Wikipedia’s links rather than Google’s search results. Formally, the measure is: …

... where a and b are the two articles of interest, A and B are the sets of all articles that link to a and b respectively, and — as before — W is the entire Wikipedia.
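The formula itself is elided in the quote above. As a minimal sketch, assuming the measure keeps the form of the Normalized Google Distance with sets of incoming links substituted for sets of search results, it could be computed as follows. The function name `link_distance`, its parameter names, and the toy link sets are hypothetical and not taken from the paper.

```python
import math


def link_distance(links_a: set, links_b: set, total_articles: int) -> float:
    """NGD-style distance between two articles, computed from incoming links.

    links_a, links_b: sets of article ids that link to a and b (A and B).
    total_articles:   number of articles in the whole collection (|W|).
    Assumed form: (log max(|A|,|B|) - log |A ∩ B|) / (log |W| - log min(|A|,|B|)).
    """
    size_a, size_b = len(links_a), len(links_b)
    overlap = len(links_a & links_b)

    # No shared incoming links (or no links at all): log(0) is undefined,
    # so treat the pair as maximally distant.
    if size_a == 0 or size_b == 0 or overlap == 0:
        return math.inf

    larger, smaller = max(size_a, size_b), min(size_a, size_b)
    denominator = math.log(total_articles) - math.log(smaller)

    # Degenerate case: every article links to both a and b.
    if denominator == 0:
        return 0.0

    return (math.log(larger) - math.log(overlap)) / denominator


if __name__ == "__main__":
    # Toy data: 8 articles link to a, 4 of those also link to b, |W| = 1000.
    A = set(range(8))
    B = set(range(4, 8))
    print(link_distance(A, B, total_articles=1000))
```

Under this form, identical link sets give a distance of zero and disjoint link sets give infinity, which matches the value range described in the Notes.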

References

(Cilibrasi & Vitanyi, 2007) ⇒ R. L. Cilibrasi, P. M. B. Vitanyi. (2007). “The Google Similarity Distance.” In: IEEE Transactions on Knowledge and Data Engineering 19(3). doi:10.1109/TKDE.2007.48

2008 AnEffectiveLowCostMeasure
Author: David N. Milne, Ian H. Witten
title: An Effective, Low-cost Measure of Semantic Relatedness Obtained from Wikipedia Links
titleUrl: http://www.aaai.org/Papers/Workshops/2008/WS-08-15/WS08-15-005.pdf