Text Token Window
A text token window is a contiguous subsequence of a text token string that contains a target text token.
- AKA: Textual Context.
- Context:
- It can (typically) have a predefined Text Window Size.
- It can range from being a Word Mention Context Window to being an Orthographic Text Token Window.
- It can contain Reference Information.
- It can be used to create n-Grams (and skip-grams); see the sketch after the examples below.
- Example(s):
- an Orthographic Text Window, such as:
- [math]\displaystyle{ f }[/math]((This/DT; sentence/NN; has/VBZ; seven/CD; orthographic/JJ; tokens/NNS; ./.), target=5, radius=1) ⇒ (seven/CD; orthographic/JJ; tokens/NNS).
- [math]\displaystyle{ f }[/math]((Zimbabwe; Air; kicked; the; bucket; twenty; four; years; ago; .), target=3, radius=1) ⇒ (Air; kicked; the).
- a Lexical Text Window, such as:
- [math]\displaystyle{ f }[/math]((Zimbabwe_Air; kicked_the_bucket; twenty_four; years; ago; .), target=2, radius=1) ⇒ (Zimbabwe_Air; kicked_the_bucket; twenty_four).
- …
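Read as code, the function [math]\displaystyle{ f }[/math] in the examples above selects the tokens within a fixed radius of a target position. A minimal sketch in Python, assuming the 1-based target indexing that the examples use (the function name is illustrative, not standard):

```python
def text_token_window(tokens, target, radius):
    """Return the sub-sequence of tokens within `radius` positions
    of the 1-based `target` token, target included."""
    lo = max(target - radius - 1, 0)        # convert to 0-based; clamp at start
    hi = min(target + radius, len(tokens))  # slice end is exclusive
    return tokens[lo:hi]

# Reproduces the second orthographic example above:
tokens = ["Zimbabwe", "Air", "kicked", "the", "bucket",
          "twenty", "four", "years", "ago", "."]
print(text_token_window(tokens, target=3, radius=1))
# ['Air', 'kicked', 'the']
```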
- Counter-Example(s):
- See: Word-Text Window Co-Occurrence Relation, Lexical Distributional Semantic Heuristic, Text Token-based Predictor Feature, Bag-of-Words Vector.
References
2010
- (Momtazi et al., 2010) ⇒ Saeedeh Momtazi, Sanjeev Khudanpur, and Dietrich Klakow. (2010). “A Comparative Study of Word Co-occurrence for Term Clustering in Language Model-based Sentence Retrieval.” In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. ISBN:1-932432-65-5
- QUOTE: The window-wise co-occurrence statistic is an even narrower notion of context, considering only terms in a window surrounding [math]\displaystyle{ w' }[/math]. Specifically, a window of a fixed size is moved along the text, and [math]\displaystyle{ f_{ww'} }[/math] is set as the number of times both [math]\displaystyle{ w }[/math] and [math]\displaystyle{ w' }[/math] appear in the window. Since the window size is a free parameter, different sizes may be applied. In our experiments we use two window sizes, 2 and 5, that have been studied in related research (Church and Hanks, 1990).
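A minimal sketch of this statistic, assuming the window slides one token at a time and each co-occurring pair is counted once per window (the quote leaves both details open):

```python
import itertools
from collections import Counter

def window_cooccurrence(tokens, window_size):
    """Slide a fixed-size window along the token sequence and count,
    for each unordered pair of distinct terms (w, w'), the number of
    window positions in which both appear."""
    counts = Counter()
    for start in range(len(tokens) - window_size + 1):
        window = sorted(set(tokens[start:start + window_size]))
        for pair in itertools.combinations(window, 2):
            counts[pair] += 1
    return counts

# Window sizes 2 and 5, as in the paper's experiments:
tokens = "the cat sat on the mat near the cat".split()
for size in (2, 5):
    print(size, window_cooccurrence(tokens, size).most_common(3))
```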
2008
- (Koehn, 2008) ⇒ Philipp Koehn. (2008). “Statistical Machine Translation.” Cambridge University Press. ISBN:0521874157
- QUOTE: … The task of determining the right word sense for a word in a given context is called word sense disambiguation. Research in this area has shown that the word context such as closely neighboring words and content words in a larger window are good indicators for word sense.
2001
- (Jacquemin, 2001) ⇒ Christian Jacquemin. (2001). “Spotting and Discovering Terms Through Natural Language Processing.” MIT Press. ISBN:0262100851
- QUOTE: Text window: A text window is a sequence of [math]\displaystyle{ n }[/math] consecutive words in a document. For instance, “in this sentence” is a 3-word window in this sentence.
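This definition maps directly onto a sliding-window generator; a minimal sketch (names illustrative, not from the book):

```python
def text_windows(words, n):
    """Yield every sequence of n consecutive words in a document."""
    for i in range(len(words) - n + 1):
        yield words[i:i + n]

# The 3-word windows of Jacquemin's example sentence:
words = "for instance in this sentence is a 3-word window".split()
print(list(text_windows(words, 3))[2])  # ['in', 'this', 'sentence']
```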
1992
- (Gale et al., 1992) ⇒ William A. Gale, Kenneth W. Church, and David Yarowsky. (1992). “One Sense per Discourse.” In: Proceedings of the DARPA Speech and Natural Language Workshop.
- QUOTE: Our word-sense disambiguation algorithm uses the words in a 100-word context surrounding the polysemous word very much like the other two applications use the words in a test document. … It is common to use very small contexts (e.g., 5-words) based on the observation that people do not need very much context in order to perform the disambiguation task. In contrast, we use much larger contexts (e.g., 100 words). Although people may be able to make do with much less context, we believe the machine needs all the help it can get, and we have found that the larger context makes the task much easier. In fact, we have been able to measure information at extremely large distances (10,000 words away from the polysemous word in question), though obviously most of the useful information appears relatively near the polysemous word (e.g., within the first 100 words or so). Needless to say, our 100-word contexts are considerably larger than the smaller 5-word windows that one normally finds in the literature.
1986
- (Lesk, 1986) ⇒ Michael E. Lesk. (1986). “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone.” In: Proceedings of the Fifth International Conference on Systems Documentation (SIGDOC 1986). doi:10.1145/318723.318728
- QUOTE: How wide a span of words should be counted? The program uses ten words as its default window; changing this to 4, 6 or 8 seems to make little difference. Should the span be syntactic (sentence or phrase rather than count of words)? Should the effect of a word on a decision be weighted inversely by its distance? I haven't coded such choices yet.