Leacock Chodorow Similarity Measure
A Leacock Chodorow Similarity Measure is a lexical semantic similarity measure that finds the shortest path between two concepts and scales that value by the maximum path length.
- Context:
- It can be an extended Path-based Similarity Measure that also incorporates the maximum depth of the taxonomy.
- Example(s):
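- [math]\displaystyle{ Sim(W_i,W_j) = \max_{c_i,\,c_j} \Bigl[ \log 2D - \log Dist(c_i,c_j) \Bigr] }[/math], where [math]\displaystyle{ Dist(c_i,c_j) }[/math] is the shortest distance between concepts [math]\displaystyle{ c_i }[/math] and [math]\displaystyle{ c_j }[/math], the maximum is taken over all senses [math]\displaystyle{ c_i }[/math] of word [math]\displaystyle{ W_i }[/math] and [math]\displaystyle{ c_j }[/math] of word [math]\displaystyle{ W_j }[/math], and [math]\displaystyle{ D }[/math] is the maximum depth of the taxonomy.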
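- [math]\displaystyle{ LCH\;Similarity = -\log \frac{spath(synset_1, synset_2)}{2 \cdot D} }[/math], where [math]\displaystyle{ spath }[/math] is the length of the shortest path between two concepts ([math]\displaystyle{ synset_1 }[/math] and [math]\displaystyle{ synset_2 }[/math]) and [math]\displaystyle{ D }[/math] is the total depth of the taxonomy, so the path length is divided by twice the depth.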
- [math]\displaystyle{ LCH\;Similarity = -\log \left(\frac{length}{2 \cdot D}\right) }[/math], where [math]\displaystyle{ length }[/math] is the length of the shortest path between two synsets (using node-counting) and [math]\displaystyle{ D }[/math] is the maximum depth of the taxonomy.
- …
- Counter-Example(s):
See: Semantic Similarity Measure, Semantic Similarity Score.
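The example formulas above compute the same quantity, since [math]\displaystyle{ \log 2D - \log Dist = -\log \frac{Dist}{2D} }[/math]. A minimal Python sketch, assuming the path length is already known as a node count, that [math]\displaystyle{ D }[/math] is supplied by the caller, and that the natural logarithm is used (the base only rescales all scores). The helper names `lch`, `lch_log_difference`, and `word_lch`, and the caller-supplied `path_length` function, are hypothetical and chosen for illustration.

```python
import math
from itertools import product
from typing import Callable, Hashable, Iterable


def lch(length: int, depth: int) -> float:
    """Leacock-Chodorow score -log(length / (2 * depth)).

    `length` is the shortest-path length counted in nodes (identical
    concepts have length 1); `depth` is the maximum taxonomy depth D.
    """
    if length < 1 or depth < 1:
        raise ValueError("length and depth must be positive")
    return -math.log(length / (2 * depth))


def lch_log_difference(length: int, depth: int) -> float:
    """Equivalent form log(2D) - log(length)."""
    return math.log(2 * depth) - math.log(length)


def word_lch(
    senses_i: Iterable[Hashable],
    senses_j: Iterable[Hashable],
    path_length: Callable[[Hashable, Hashable], int],
    depth: int,
) -> float:
    """Word-level score: the maximum LCH over all sense pairs (c_i, c_j).

    `path_length` is a hypothetical caller-supplied function returning the
    node-counted shortest path between two concepts.
    """
    return max(lch(path_length(ci, cj), depth) for ci, cj in product(senses_i, senses_j))


# Hypothetical node-counted path lengths between the senses of two words.
toy_lengths = {("bank.1", "shore.1"): 3, ("bank.2", "shore.1"): 9}
print(word_lch(["bank.1", "bank.2"], ["shore.1"], lambda a, b: toy_lengths[(a, b)], depth=18))
# log(12), about 2.485: the closer sense pair (length 3) determines the word score
```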
References
2021
- (Pedersen, 2021) ⇒ http://maraca.d.umn.edu/similarity/measures.html Retrieved:2021-03-06.
- QUOTE: The relatedness measure proposed by Leacock and Chodorow (lch) is -log (length / (2 * D)), where length is the length of the shortest path between the two synsets (using node-counting) and D is the maximum depth of the taxonomy.
The fact that the lch measure takes into account the depth of the taxonomy in which the synsets are found means that the behavior of the measure is profoundly affected by the presence or absence of a unique root node. If there is a unique root node, then there are only two taxonomies: one for nouns and one for verbs. All nouns, then, will be in the same taxonomy and all verbs will be in the same taxonomy. D for the noun taxonomy will be somewhere around 18, depending upon the version of WordNet, and for verbs, it will be 14. If the root node is not being used, however, then there are nine different noun taxonomies and over 560 different verb taxonomies, each with a different value for D.
If the root node is not being used, then it is possible for synsets to belong to more than one taxonomy. For example, the synset containing turtledove#n#2 belongs to two taxonomies: one rooted at group#n#1 and one rooted at entity#n#1. In such a case, the relatedness is computed by finding the LCS that results in the shortest path between the synsets. The value of D, then, is the maximum depth of the taxonomy in which the LCS is found. If the LCS belongs to more than one taxonomy, then the taxonomy with the greatest maximum depth is selected (i.e., the largest value for D).
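Two points in the quote, that the score depends on D and that D is taken from the deepest taxonomy containing the LCS, can be illustrated with a short sketch. The depths 18 and 14 are the approximate noun and verb figures from the quote, used here only as illustrative inputs; the function `lch_with_lcs_taxonomies` and the depth list it receives are hypothetical.

```python
import math


def lch(length: int, depth: int) -> float:
    """-log(length / (2 * depth)) with a node-counted path length."""
    return -math.log(length / (2 * depth))


def lch_with_lcs_taxonomies(length: int, lcs_taxonomy_depths: list[int]) -> float:
    """Score a pair whose LCS may lie in several taxonomies.

    Following the quoted rule, D is the largest maximum depth among the
    taxonomies that contain the LCS.
    """
    return lch(length, max(lcs_taxonomy_depths))


# The same node-counted path length scores differently under different D.
print("nouns, D~18:", round(lch(4, 18), 3))  # log(9), about 2.197
print("verbs, D=14:", round(lch(4, 14), 3))  # log(7), about 1.946

# Identical synsets (length 1) reach the ceiling log(2D).
print(round(lch(1, 18), 3), round(math.log(36), 3))

# Hypothetical LCS found in taxonomies of depth 9 and 18: D = 18 is used.
print(round(lch_with_lcs_taxonomies(4, [9, 18]), 3))
```

Because the ceiling log(2D) changes with D, scores computed in taxonomies of different depth are not on exactly the same scale.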
2019
- (GeeksforGeeks, 2019) ⇒ https://www.geeksforgeeks.org/nlp-leacock-chordorow-lch-and-path-similarity-for-synset/
- QUOTE: Path-based Similarity: It is a similarity measure that finds the distance that is the length of the shortest path between two synsets.
Leacock Chodorow (LCH): It is a similarity measure which is an extended version of Path-based similarity as it incorporates the depth of the taxonomy. Therefore, it is the negative log of the shortest path ([math]\displaystyle{ spath }[/math]) between two concepts ([math]\displaystyle{ synset_1 }[/math] and [math]\displaystyle{ synset_2 }[/math]) divided by twice the total depth of the taxonomy ([math]\displaystyle{ D }[/math]) as defined in fig below.
[math]\displaystyle{ LCH\;Similarity = -\log \frac{spath(synset_1, synset_2)}{2 \cdot Depth} }[/math]
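Because the path-based measure and LCH both start from the same shortest path length, for a fixed D the LCH score is a monotone transform of the path score: [math]\displaystyle{ -\log \frac{spath}{2D} = \log 2D + \log \frac{1}{spath} }[/math]. A minimal Python sketch, assuming node-counted path lengths, path similarity taken as the inverse of that length, and an illustrative D = 18 (both function names are hypothetical), checks this numerically.

```python
import math


def path_similarity(length: int) -> float:
    """Path measure: inverse of the node-counted shortest path length."""
    return 1.0 / length


def lch_similarity(length: int, depth: int) -> float:
    """LCH measure: -log(length / (2 * depth))."""
    return -math.log(length / (2 * depth))


depth = 18  # illustrative maximum taxonomy depth D
for length in (1, 2, 5, 10):
    p = path_similarity(length)
    score = lch_similarity(length, depth)
    # For a fixed D, LCH = log(2D) + log(path), so the last two columns match.
    print(length, round(p, 3), round(score, 3), round(math.log(2 * depth) + math.log(p), 3))
```

So within a single taxonomy the two measures rank concept pairs identically; they differ in scale, and LCH also shifts with D when pairs come from taxonomies of different depth.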
2011
- (NLTK - WordNetCorpusReader Module, 2011-Jun-19) ⇒ http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet.WordNetCorpusReader-class.html
- QUOTE: Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur.
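The NLTK description combines two ingredients: the shortest path connecting the senses and the maximum depth of their taxonomy. The self-contained sketch below computes both on a tiny hand-made is-a taxonomy. The `TOY_HYPERNYMS` data and all function names are hypothetical and are not NLTK's API. It finds the edge distance by breadth-first search and adds 1 to get a node count, and it counts depth in nodes with the root at depth 1 (an assumption; implementations differ on node versus edge counting for D). In NLTK itself the measure is available as the `lch_similarity` method of a WordNet `Synset`, whose internals may differ from this sketch.

```python
import math
from collections import deque

# A tiny hypothetical is-a taxonomy: child -> list of parents (hypernyms).
TOY_HYPERNYMS = {
    "entity": [],
    "animal": ["entity"],
    "canine": ["animal"],
    "feline": ["animal"],
    "dog": ["canine"],
    "wolf": ["canine"],
    "cat": ["feline"],
}


def undirected_neighbors(taxonomy):
    """Build an undirected adjacency map from child -> parents links."""
    graph = {node: set() for node in taxonomy}
    for child, parents in taxonomy.items():
        for parent in parents:
            graph[child].add(parent)
            graph[parent].add(child)
    return graph


def edge_distance(taxonomy, a, b):
    """Breadth-first search for the shortest path length in edges."""
    graph = undirected_neighbors(taxonomy)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two concepts are not connected


def max_depth(taxonomy):
    """Maximum depth in nodes, counting a root as depth 1."""
    def depth(node):
        parents = taxonomy[node]
        return 1 if not parents else 1 + max(depth(p) for p in parents)
    return max(depth(node) for node in taxonomy)


def lch_similarity(taxonomy, a, b):
    """-log(length / (2 * D)) with a node-counted shortest path."""
    distance = edge_distance(taxonomy, a, b)
    if distance is None:
        return None
    length = distance + 1  # convert edge count to node count
    return -math.log(length / (2 * max_depth(taxonomy)))


for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "dog")]:
    print(pair, round(lch_similarity(TOY_HYPERNYMS, *pair), 3))
```

With this toy data D = 4, so dog/wolf (3 nodes apart) scores log(8/3) and outranks dog/cat (5 nodes apart, log(8/5)), while any concept compared with itself reaches the ceiling log 8.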
2004
- (Pedersen et al., 2004) ⇒ Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi (2004, July). "WordNet::Similarity - Measuring the Relatedness of Concepts". In: AAAI (Vol. 4, pp. 25-29).
- QUOTE: Three similarity measures are based on path lengths between concepts: lch (Leacock & Chodorow 1998), wup (Wu & Palmer 1994), and path. The lch measure finds the shortest path between two concepts, and scales that value by the maximum path length in the is-a hierarchy in which they occur. wup finds the path length to the root node from the least common subsumer (LCS) of the two concepts, which is the most specific concept they share as an ancestor. This value is scaled by the sum of the path lengths from the individual concepts to the root. The measure path is equal to the inverse of the shortest path length between two concepts.
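The three path-based measures described in the quote can be contrasted on the same inputs. This is a minimal sketch, assuming depths counted in nodes from the root and the commonly used Wu-Palmer form [math]\displaystyle{ \frac{2 \cdot depth(LCS)}{depth(c_1) + depth(c_2)} }[/math] for wup (the quote describes it only in words). All depths and path lengths below come from a hypothetical toy hierarchy, not from WordNet.

```python
import math


def lch(length: int, max_depth: int) -> float:
    """lch: negative log of the shortest path length over twice the maximum depth."""
    return -math.log(length / (2 * max_depth))


def wup(depth_lcs: int, depth_a: int, depth_b: int) -> float:
    """wup (assumed form): depth of the LCS scaled by the two concepts' depths."""
    return 2 * depth_lcs / (depth_a + depth_b)


def path(length: int) -> float:
    """path: inverse of the shortest path length."""
    return 1 / length


# Hypothetical toy hierarchy with node-counted depths:
# entity(1) > animal(2) > canine(3) > dog(4), and animal(2) > feline(3) > cat(4).
# dog/cat: LCS = animal (depth 2), shortest path dog..cat = 5 nodes, max depth 4.
print("lch ", lch(5, 4))
print("wup ", wup(2, 4, 4))
print("path", path(5))

# Two hypothetical pairs, both 3 nodes apart, with the LCS at depth 9 vs depth 2:
# lch and path give both pairs the same score, but wup does not.
print(wup(9, 10, 10), wup(2, 3, 3))
```

Because lch and path use only the path length, two pairs that are equally far apart score the same under both, whereas wup rewards pairs whose LCS lies deeper in the hierarchy.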
1998
- (Leacock & Chodorow, 1998) ⇒ Claudia Leacock, and Martin Chodorow. (1998). "Combining local context and WordNet similarity for word sense identification". In: WordNet: An electronic lexical database, 49(2). DOI:10.7551/mitpress/7287.003.0018.