N-gram Index

From GM-RKB

An N-gram Index is an Index of the N-grams in a Corpus.
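
The definition can be illustrated with a minimal sketch. It assumes word-level tokens obtained by lowercasing and splitting on whitespace, and an index that maps each n-gram to the (document id, start position) pairs where it occurs. The function name `build_ngram_index` and the toy corpus are hypothetical, chosen only for illustration; other index layouts (for example, n-gram to count) would fit the definition equally well.

```python
from collections import defaultdict

def build_ngram_index(corpus, n):
    """Map each word n-gram to a list of (doc_id, start_position) pairs.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(corpus):
        tokens = text.lower().split()
        # Slide a window of n tokens across the document.
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].append((doc_id, i))
    return dict(index)

# Hypothetical two-document corpus.
corpus = ["the cat sat on the mat", "the cat ate"]
bigram_index = build_ngram_index(corpus, 2)
print(bigram_index[("the", "cat")])  # [(0, 0), (1, 0)]
```

Storing positions rather than bare counts lets the same index answer both "how often does this n-gram occur" and "where does it occur", at the cost of more memory.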



References

  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/N-gram
    • N-gram models are a type of probabilistic model for predicting the next item in a sequence. N-grams are used in various areas of statistical natural language processing and genetic sequence analysis.
    • An n-gram is a sub-sequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application.
    • An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; and size 4 or more is simply called an "n-gram". Some language models built from n-grams are "(n − 1)-order Markov models".
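
The excerpts above describe n-grams as sub-sequences of n items and note that n-gram language models correspond to (n − 1)-order Markov models. The following is a rough sketch of both ideas, assuming a simple most-frequent-successor rule rather than a full probabilistic model with smoothing. The helper names `ngrams`, `train_bigram_model`, and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def ngrams(items, n):
    """Return all contiguous sub-sequences of n items as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

def train_bigram_model(tokens):
    """Count successors for each item: a bigram (first-order Markov) model."""
    counts = defaultdict(Counter)
    for prev, nxt in ngrams(tokens, 2):
        counts[prev][nxt] += 1
    return counts

def predict_next(model, prev):
    """Return the most frequent successor of prev, or None if prev is unseen."""
    if prev not in model:
        return None
    return model[prev].most_common(1)[0][0]

# Items can be words, letters, base pairs, and so on.
print(ngrams(list("ACGT"), 3))    # [('A', 'C', 'G'), ('C', 'G', 'T')]

tokens = "a b a b a c".split()
model = train_bigram_model(tokens)
print(predict_next(model, "a"))   # b
```

With n = 2 the prediction depends only on the single preceding item, which is what makes the bigram model first-order; a trigram model would condition on the two preceding items.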