Text Token Window

From GM-RKB

A text token window is a contiguous substring of a text token string that contains a target text token, typically together with a fixed number of neighboring tokens on each side.
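The definition can be illustrated with a minimal sketch. It assumes the token string is already available as a list of tokens and that the window is symmetric around the target; the function name `token_window` and the `radius` parameter are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def token_window(tokens: Sequence[str], target_index: int, radius: int) -> List[str]:
    """Return the tokens within `radius` positions of the target token.

    The window is clipped at the boundaries of the token string, so it
    always contains the target token but may hold fewer than
    2 * radius + 1 tokens near the start or end.
    """
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token string")
    if radius < 0:
        raise ValueError("radius must be non-negative")
    start = max(0, target_index - radius)
    end = min(len(tokens), target_index + radius + 1)
    return list(tokens[start:end])


tokens = "he sat on the bank of the river".split()
window = token_window(tokens, tokens.index("bank"), 2)
print(window)  # ['on', 'the', 'bank', 'of', 'the']
```

With `radius=2` this yields a 5-token window; a much larger radius (for example 50) would give the kind of wide context described in the references below.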



References

2008

  • (Koehn, 2008) ⇒ Philipp Koehn. (2008). “Statistical Machine Translation.” Cambridge University Press. ISBN:0521874157
    • … The task of determining the right word sense for a word in a given context is called word sense disambiguation. Research in this area has shown that the word context such as closely neighboring words and content words in a larger window are good indicators for word sense.

1992

  • (Gale et al., 1992) ⇒ William A. Gale, Kenneth W. Church, and David Yarowsky. (1992). “One Sense per Discourse.” In: Proceedings of the DARPA Speech and Natural Language Workshop.
    • Our word-sense disambiguation algorithm uses the words in a 100-word context surrounding the polysemous word very much like the other two applications use the words in a test document. ... It is common to use very small contexts (e.g., 5-words) based on the observation that people do not need very much context in order to perform the disambiguation task. In contrast, we use much larger contexts (e.g., 100 words). Although people may be able to make do with much less context, we believe the machine needs all the help it can get, and we have found that the larger context makes the task much easier. In fact, we have been able to measure information at extremely large distances (10,000 words away from the polysemous word in question), though obviously most of the useful information appears relatively near the polysemous word (e.g., within the first 100 words or so). Needless to say, our 100-word contexts are considerably larger than the smaller 5-word windows that one normally finds in the literature.
