N-gram Index
An N-gram Index is an Index that maps each N-gram in a Corpus to the Corpus items (e.g. words or documents) that contain it.
- Context:
- It can be
- a Word N-gram Index, e.g. to aid a Text Classification Algorithm.
- a Character N-gram Index, e.g. to improve the Performance of a String Edit Distance Function.
- Example(s):
- The words that contain the Character N-gram (Bigram) "qu".
- See: Language Model.
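The "qu" example above can be sketched as a small Character N-gram Index. This is a minimal illustration, not a reference implementation: the function names `char_ngrams` and `build_ngram_index` and the five-word corpus are hypothetical, and the index is assumed to map each character n-gram to the sorted list of words containing it.

```python
from collections import defaultdict


def char_ngrams(word, n):
    """Return the contiguous character n-grams of `word`, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_ngram_index(words, n=2):
    """Map each character n-gram to the sorted list of words containing it."""
    index = defaultdict(set)
    for word in words:
        # Lowercase only for matching; the original word is what gets stored.
        for gram in char_ngrams(word.lower(), n):
            index[gram].add(word)
    return {gram: sorted(matches) for gram, matches in index.items()}


# Hypothetical toy corpus, chosen only to illustrate the "qu" lookup.
corpus = ["quick", "queue", "equal", "brown", "fox"]
index = build_ngram_index(corpus, n=2)

print(index["qu"])          # ['equal', 'queue', 'quick']
print(index.get("zz", []))  # [] (no word contains "zz")
```

A lookup then costs one dictionary access instead of a scan over the whole corpus, which is why such an index can serve as a cheap pre-filter before a more expensive string comparison.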
References
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/N-gram
- N-gram models are a type of probabilistic model for predicting the next item in a sequence. N-grams are used in various areas of statistical natural language processing and genetic sequence analysis.
- An n-gram is a sub-sequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application.
- An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; and size 4 or more is simply called an "n-gram". Some language models built from n-grams are "(n − 1)-order Markov models".
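The definitions quoted above can be made concrete with a short sketch. It extracts unigrams, bigrams and trigrams from a word sequence, then counts which item follows each (n − 1)-item context, which is the counting step behind a simple n-gram model. The helper names `ngrams` and `next_item_counts` and the example sentence are illustrative assumptions, and the probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict


def ngrams(items, n):
    """Return all contiguous n-item sub-sequences of `items` as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def next_item_counts(items, n):
    """Count which item follows each (n - 1)-item context."""
    counts = defaultdict(Counter)
    for gram in ngrams(items, n):
        context, following = gram[:-1], gram[-1]
        counts[context][following] += 1
    return counts


# Hypothetical example sequence; the items could equally be letters or base pairs.
tokens = "to be or not to be".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams[:2])  # [('to', 'be'), ('be', 'or')]

# Bigram counts give a first-order Markov model: the next word depends
# only on the single preceding word.
counts = next_item_counts(tokens, 2)
context = ("to",)
total = sum(counts[context].values())
probs = {item: c / total for item, c in counts[context].items()}
print(probs)  # {'be': 1.0}
```

Passing a string instead of a list of words yields character n-grams, since the slicing works on any sequence.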