n-Gram Tuple

Context:
- It can range from being a Unigram, to a Bigram, to a Trigram, ..., based on its n-Gram Length.
- It can range from (typically) being a Contiguous n-Gram to being a Noncontiguous n-Gram.
- It can range from being an Unordered n-Gram Tuple to (typically) being an Ordered n-Gram Tuple.
- It can range from being a Text Window-based n-Gram to being a Sentence-based n-Gram to being a Document-based n-Gram.
- It can be a k-Skip n-Gram, such as a 0-Skip n-Gram, a 1-Skip n-Gram, or a 2-Skip n-Gram.
- It can be the output of an n-Gram Generation System.
- It can be a member of an n-Gram Dataset (which might represent an n-Gram Model).
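The following Python sketch illustrates how an n-Gram Generation System might produce contiguous n-grams and k-skip n-grams as ordered tuples. The function names `ngrams` and `skip_ngrams` are hypothetical, and "k-skip" is assumed here to mean that at most k items are skipped in total across the n-gram (one common definition; some sources instead bound the distance between adjacent components, which gives the same result for bigrams).

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def ngrams(items: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous (0-skip) n-grams, returned as ordered tuples."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def skip_ngrams(items: Sequence[str], n: int, k: int) -> List[Tuple[str, ...]]:
    """k-skip n-grams (hypothetical helper): ordered length-n subsequences
    that skip at most k items in total (assumed definition)."""
    if n < 1 or k < 0:
        raise ValueError("need n >= 1 and k >= 0")
    result = []
    for start in range(len(items)):
        # Starting at `start`, the n-gram may span at most n + k positions,
        # so the remaining n - 1 items are chosen from the next n + k - 1 slots.
        window = range(start + 1, min(start + n + k, len(items)))
        for rest in combinations(window, n - 1):
            result.append((items[start],) + tuple(items[j] for j in rest))
    return result


words = "the rain in Spain falls".split()
print(ngrams(words, 2))
# [('the', 'rain'), ('rain', 'in'), ('in', 'Spain'), ('Spain', 'falls')]
print(len(skip_ngrams(words, 2, 1)))
# 7: the 4 bigrams plus ('the', 'in'), ('rain', 'Spain'), ('in', 'falls')
```

With `k = 0` the skip version reduces to the contiguous one. Returning tuples keeps the items in order (an Ordered n-Gram Tuple); an unordered variant could, for example, sort each tuple before counting it in an n-Gram Dataset.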
Example(s):
- a Text-Item n-Gram, such as:
  - a Word n-Gram, which represents Adjacent Words in a string.
  - a Character n-Gram.
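The difference between the two Text-Item n-Gram examples is only the item granularity. A minimal Python sketch, assuming naive whitespace tokenization for words; the helper names `word_ngrams` and `char_ngrams` are hypothetical:

```python
from typing import List, Tuple


def word_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Word n-grams: tuples of n adjacent whitespace-separated words."""
    words = text.split()  # naive tokenization (an assumption)
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Character n-grams: every run of n adjacent characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(word_ngrams("to be or not to be", 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
print(char_ngrams("gram", 3))
# ['gra', 'ram']
```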
Counter-Example(s):
- a Substring.
See: N-tuple, Co-occurrence Statistic, Base Pairs, Text Corpus.

References

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/n-gram Retrieved:2015-2-6.
- QUOTE: In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.
  An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram". Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on.

http://lucene.apache.org/java/3_5_0/api/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
- QUOTE: A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it creates combinations of tokens as a single token.
  For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".
  This filter handles position increments > 1 by inserting filler tokens (tokens with termtext "_"). It does not handle a position increment of 0.
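The quoted shingling behavior can be mimicked in a few lines of Python. This is a hedged sketch, not Lucene's implementation or API: tokens are modeled as hypothetical `(term, position_increment)` pairs, gaps are filled with `"_"` tokens as the quote describes, and only fixed-size shingles are emitted (the real filter has additional configuration options).

```python
from typing import List, Tuple

FILLER = "_"  # filler term text, as described in the quote


def fill_gaps(tokens: List[Tuple[str, int]]) -> List[str]:
    """Insert (increment - 1) filler tokens before each token whose
    position increment is greater than 1."""
    out: List[str] = []
    for term, increment in tokens:
        if increment < 1:
            # Mirrors the quoted limitation: an increment of 0 is not handled.
            raise ValueError("position increment of 0 is not handled")
        out.extend([FILLER] * (increment - 1))
        out.append(term)
    return out


def shingles(terms: List[str], size: int = 2, sep: str = " ") -> List[str]:
    """Join each run of `size` adjacent terms into a single token."""
    return [sep.join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


tokens = [(w, 1) for w in "please divide this sentence into shingles".split()]
print(shingles(fill_gaps(tokens)))
# ['please divide', 'divide this', 'this sentence', 'sentence into', 'into shingles']
```

If, say, "this" had been removed upstream so that "sentence" arrives with a position increment of 2, the sketch would yield `'divide _'` and `'_ sentence'` in place of the shingles containing "this".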