Yarowsky Algorithm
A Yarowsky Algorithm is a Bootstrapped Self-Trained Learning Algorithm that replicates the one proposed in (Yarowsky, 1995).
- Context:
- It is one of the first Bootstrapped Weakly-Supervised Learning Algorithms applied in Natural Language Processing Research.
- It can be applied to the Word Sense Disambiguation Task.
- It uses the One Sense Per Collocation Hypothesis.
- It uses the One Sense Per Discourse Hypothesis.
- See: David Yarowsky.
References
2010
- http://en.wikipedia.org/wiki/Yarowsky_algorithm
- In computational linguistics, the Yarowsky algorithm is an unsupervised learning algorithm for word sense disambiguation that exploits the "one sense per collocation" and "one sense per discourse" properties of human languages. Empirically, words tend to exhibit only one sense within a given discourse and within a given collocation.
- The algorithm starts with a large, untagged corpus, in which it identifies examples of the given polysemous word, and stores all the relevant sentences as lines. For instance, Yarowsky uses the word "plant" in his 1995 paper to demonstrate the algorithm. If it is assumed that there are two possible senses of the word, the next step is to identify a small number of seed collocations representative of each sense, give each sense a label (i.e. sense A and B), then assign the appropriate label to all training examples containing the seed collocations. In this case, the words "life" and "manufacturing" are chosen as initial seed collocations for senses A and B respectively. The residual examples (85%–98% according to Yarowsky) remain untagged.
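The seeding step described above can be sketched as follows. This is a minimal illustration with a hypothetical toy corpus and the seed words from the 1995 paper; the sentence data is invented for demonstration, not taken from Yarowsky's corpus.

```python
# Toy corpus of sentences containing the polysemous word "plant"
# (hypothetical examples, one sentence per line as in the paper's setup).
corpus = [
    "the plant workers assembled the parts",
    "the plant absorbed sunlight through its leaves",
    "manufacturing output at the plant rose sharply",
    "plant life flourished near the river",
    "the plant closed its gates at noon",
]

# Seed collocations: "life" marks sense A (living organism),
# "manufacturing" marks sense B (factory).
seeds = {"A": "life", "B": "manufacturing"}

labeled, residual = {}, []
for sentence in corpus:
    for label, seed in seeds.items():
        if seed in sentence.split():
            labeled[sentence] = label
            break
    else:
        residual.append(sentence)

# Most examples remain untagged after seeding (85%-98% in Yarowsky's data);
# here 3 of 5 sentences end up in the residual set.
```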
- …
2004
- (Abney, 2004) ⇒ Steven P. Abney. (2004). “Understanding the Yarowsky Algorithm.” In: Computational Linguistics, 30(3).
- QUOTE: The Yarowsky algorithm (Yarowsky, 1995) was one of the first bootstrapping algorithms to become widely known in computational linguistics. The Yarowsky algorithm, in brief, consists of two loops. The “inner loop” or base learner is a supervised learning algorithm. Specifically, Yarowsky uses a simple decision list learner that considers rules of the form, “If instance x contains feature f, then predict label j,” and selects those rules whose precision on the training data is highest.
The “outer loop” is given a seed set of rules to start with. In each iteration, it uses the current set of rules to assign labels to unlabeled data. It selects those instances on which the base learner’s predictions are most confident, and constructs a labeled training set from them. It then calls the inner loop to construct a new classifier (that is, a new set of rules), and the cycle repeats.
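The two-loop structure Abney describes can be sketched as below: an inner decision-list learner that ranks "feature implies label" rules by smoothed precision, and an outer loop that repeatedly relabels unlabeled instances and retrains. This is an illustrative simplification, not Yarowsky's exact implementation; instances are bags of context words, and the smoothing constant and confidence threshold are assumed values.

```python
from collections import defaultdict

def train_decision_list(labeled, smoothing=0.1):
    """Inner loop (base learner): build rules "if instance contains feature f,
    predict label j", ranked by smoothed precision on the training data."""
    counts = defaultdict(lambda: defaultdict(float))
    for words, label in labeled:
        for f in set(words):
            counts[f][label] += 1
    rules = []
    for f, by_label in counts.items():
        total = sum(by_label.values())
        for label, c in by_label.items():
            precision = (c + smoothing) / (total + 2 * smoothing)
            rules.append((precision, f, label))
    rules.sort(reverse=True)  # most precise rules first
    return rules

def classify(rules, words, threshold=0.8):
    """Apply the first (highest-precision) matching rule; abstain below threshold."""
    for precision, f, label in rules:
        if f in words:
            return (label, precision) if precision >= threshold else (None, precision)
    return (None, 0.0)

def yarowsky(seed_labeled, unlabeled, threshold=0.8, max_iter=10):
    """Outer loop: label unlabeled data with the current rules, keep only
    confident predictions, retrain, and repeat until nothing new is labeled."""
    labeled, unlabeled = list(seed_labeled), list(unlabeled)
    for _ in range(max_iter):
        rules = train_decision_list(labeled)
        newly, still = [], []
        for words in unlabeled:
            label, _ = classify(rules, words, threshold)
            (newly if label is not None else still).append(
                (words, label) if label is not None else words)
        if not newly:
            break
        labeled += newly
        unlabeled = still
    return train_decision_list(labeled), unlabeled
```

On the toy data, an instance sharing only the ambiguous word "plant" with both senses never clears the confidence threshold and stays unlabeled, which mirrors the algorithm's cautious growth of the labeled set.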
1995
- (Yarowsky, 1995) ⇒ David Yarowsky. (1995). “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods.” In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL 1995). doi:10.3115/981658.981684