Lesk Algorithm
A Lesk algorithm is an unsupervised word sense disambiguation algorithm that resembles the one proposed in (Lesk, 1986).
- Context:
- It requires a sense inventory with word sense definitions.
- It computes the overlap between the words in a target word's context window and the words in each candidate word sense definition.
- It selects the word sense with the largest overlap (a minimal sketch follows this list).
- It can be extended by expanding the set of words being compared, for example by including the definitions of related WordNet senses (Banerjee & Pedersen, 2002).
- See: Supervised Word Sense Disambiguation Algorithm.
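A minimal sketch of this overlap-and-select step, assuming the sense inventory is given as a plain mapping from sense identifiers to definition strings (the function names and the two-sense "bank" inventory below are illustrative, not from any particular resource):

```python
def overlap(context_words, definition):
    """Count distinct words shared by the context window and a sense definition."""
    return len(set(context_words) & set(definition.lower().split()))

def disambiguate(context_words, sense_inventory):
    """Return the sense whose definition overlaps most with the context window."""
    context = [w.lower() for w in context_words]
    return max(sense_inventory, key=lambda s: overlap(context, sense_inventory[s]))

# Hypothetical two-sense inventory for "bank":
sense_inventory = {
    "bank/finance": "a financial institution that accepts deposits of money",
    "bank/river": "sloping land beside a body of water such as a river",
}
print(disambiguate("I deposit money at the bank".split(), sense_inventory))
# -> "bank/finance" (its definition shares "money"; the river sense shares nothing)
```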
References
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Lesk_algorithm
- The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986. The Lesk algorithm is based on the assumption that words in a given neighbourhood will tend to share a common topic. A naive implementation of the Lesk algorithm would (see the sketch after the list):
- 1. choose pairs of ambiguous words within a neighbourhood
- 2. check their definitions in a dictionary
- 3. choose the senses that maximise the number of common terms in the definitions of the chosen words.
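A minimal sketch of this pairwise formulation for a single pair of ambiguous words, with sense definitions paraphrased from Lesk's well-known pine cone example (the function and dictionary names are illustrative):

```python
from itertools import product

def definition_words(definition):
    """Word set of a dictionary definition."""
    return set(definition.lower().split())

def best_sense_pair(senses_a, senses_b):
    """Jointly pick the sense pair whose definitions share the most terms.

    senses_a, senses_b: dicts mapping sense ids to definition strings.
    """
    return max(
        product(senses_a, senses_b),
        key=lambda pair: len(
            definition_words(senses_a[pair[0]]) & definition_words(senses_b[pair[1]])
        ),
    )

pine = {
    "pine/tree": "a kind of evergreen tree with needle shaped leaves",
    "pine/yearn": "to waste away through sorrow or illness",
}
cone = {
    "cone/shape": "a solid body which narrows to a point",
    "cone/fruit": "fruit of certain evergreen trees",
}
print(best_sense_pair(pine, cone))
# -> ('pine/tree', 'cone/fruit'): those two definitions share "evergreen" (and "of")
```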
- Accuracy on Pride and Prejudice and selected papers of the Associated Press was found to be in the 50% to 70% range.
- A simplified version of the Lesk algorithm compares the dictionary definition of an ambiguous word with the terms contained in its neighbourhood.
- Versions have been adapted to WordNet.
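One such WordNet adaptation of the simplified variant ships with NLTK as nltk.wsd.lesk. A usage sketch (assumes the WordNet data has been fetched with nltk.download('wordnet'); the exact Synset returned depends on gloss overlaps in the installed data):

```python
from nltk.wsd import lesk

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank")  # WordNet Synset whose gloss best overlaps the context
if sense is not None:
    print(sense.name(), "-", sense.definition())
```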
- (Gentile et al., 2009) ⇒ Anna L. Gentile, Pierpaolo Basile, and Giovanni Semeraro. (2009). “WibNED Wikipedia Based Named Entity Disambiguation.” In: Proceedings of the 5th Italian Research Conference on Digital Libraries (IRCDL 2009).
- Keywords: Entity Mention Normalization Algorithm, Lesk Algorithm, (Wikipedia, 2009), Yamcha System.
2004
- (Ekedahl & Golub, 2004) ⇒ Jonas Ekedahl, and Koraljka Golub. (2004). “Word Sense Disambiguation Using WordNet and the Lesk Algorithm.” Class Report. Language Processing and Computational Linguistics. Institutionen för datavetenskap.
2003
- (Patwardhan et al., 2003) ⇒ Siddharth Patwardhan, Satanjeev Banerjee, and Ted Pedersen. (2003). “Using Measures of Semantic Relatedness for Word Sense Disambiguation.” In: Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2003).
- QUOTE: The original Lesk algorithm [9] disambiguates a target word by comparing its gloss with those of its surrounding words. The target word is assigned the sense whose gloss has the most overlapping or shared words with the glosses of its neighboring words.
- There are two hypotheses that underlie this approach. The first is that words that appear together in a sentence can be disambiguated by assigning to them the senses that are most closely related to their neighboring words. This follows from the intuition that words that appear together in a sentence must inevitably be related in some way, since they are normally working together to communicate some idea. The second hypothesis is that related senses can be identified by finding overlapping words in their definitions. The intuition here is equally reasonable, in that words that are related will often be defined using the same words, and in fact may refer to each other in their definitions.
- The main limitation to this approach is that dictionary glosses are often quite brief, and may not include sufficient vocabulary to identify related senses. Banerjee and Pedersen suggest an adaptation based on the use of WordNet.
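A compact sketch of this target-word formulation, in which only the target is disambiguated and each of its candidate glosses is scored against the pooled glosses of all senses of the neighbouring words (the names and data layout are illustrative, as above):

```python
def pooled_gloss_words(neighbour_senses):
    """Union of the words in every gloss of every sense of the neighbouring words."""
    return {word
            for senses in neighbour_senses  # one dict of sense -> gloss per neighbour
            for gloss in senses.values()
            for word in gloss.lower().split()}

def disambiguate_target(target_senses, neighbour_senses):
    """Assign the target the sense whose gloss shares most words with neighbour glosses."""
    neighbour_words = pooled_gloss_words(neighbour_senses)
    return max(target_senses,
               key=lambda s: len(set(target_senses[s].lower().split()) & neighbour_words))
```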
2002
- (Banerjee & Pedersen, 2002) ⇒ Satanjeev Banerjee, and Ted Pedersen. (2002). “An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet.” In: Proceedings of the Third Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2002). doi:10.1007/3-540-45715-1_11
- The original Lesk algorithm [3] disambiguates words in short phrases. The definition, or gloss, of each sense of a word in a phrase is compared to the glosses of every other word in the phrase. A word is assigned the sense whose gloss shares the largest number of words in common with the glosses of the other words. For example, in time flies like an arrow, the algorithm compares the glosses of time to all the glosses of fly and arrow. Next it compares the glosses of fly with those of time and arrow, and so on. The algorithm begins anew for each word and does not utilize the senses it previously assigned.
- The original Lesk algorithm relies on glosses found in traditional dictionaries such as Oxford Advanced Learner’s. We modify Lesk’s basic approach to take advantage of the highly inter-connected set of relations among synonyms that WordNet offers. While Lesk’s algorithm restricts its comparisons to the glosses of the words being disambiguated, our approach is able to compare the glosses of words that are related to the words to be disambiguated. This provides a richer source of information and improves overall disambiguation accuracy. We also introduce a novel scoring mechanism that weighs longer sequences of matches more heavily than single words.
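A sketch of that adaptation under two simplifying assumptions: the related glosses are approximated by pulling in hypernym and hyponym glosses through NLTK's WordNet interface, and the sequence-weighted scoring is approximated by counting a run of n consecutive gloss words that each occur in the context as n² (Banerjee and Pedersen's actual measure matches multi-word phrases between glosses). It requires the WordNet data and assumes the target word has WordNet senses:

```python
from nltk.corpus import wordnet as wn

def extended_gloss(synset):
    """Gloss of a synset concatenated with the glosses of its hypernyms and hyponyms."""
    related = synset.hypernyms() + synset.hyponyms()
    return " ".join([synset.definition()] + [r.definition() for r in related])

def run_weighted_overlap(gloss_words, context_words):
    """Score a run of n consecutive gloss words that all appear in the context as n**2."""
    context, score, run = set(context_words), 0, 0
    for word in gloss_words:
        if word in context:
            run += 1
        else:
            score, run = score + run ** 2, 0
    return score + run ** 2

def adapted_lesk(target_word, context_words):
    """Pick the synset of target_word whose extended gloss best overlaps the context."""
    context = [w.lower() for w in context_words]
    return max(wn.synsets(target_word),
               key=lambda s: run_weighted_overlap(extended_gloss(s).lower().split(), context))

print(adapted_lesk("bank", "I deposit money at the bank".split()))
```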
1986
- (Lesk, 1986) ⇒ Michael E. Lesk. (1986). “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone.” In: Proceedings of SIGDOC 1986.