N-Gram Language Model


An N-Gram Language Model is a Language Model that estimates n-gram probabilities based on the Markov Assumption, i.e. the assumption that each word depends only on the preceding n − 1 words.



References

2019a

  • (Wikipedia, 2019) ⇒ https://www.wikiwand.com/en/Language_model#/n-gram Retrieved:2019-12-22.
    • In an n-gram model, the probability [math]\displaystyle{ P(w_1,\ldots,w_m) }[/math] of observing the sentence [math]\displaystyle{ w_1,\ldots,w_m }[/math] is approximated as

      [math]\displaystyle{ P(w_1,\ldots,w_m) = \prod^m_{i=1} P(w_i\mid w_1,\ldots,w_{i-1})\approx \prod^m_{i=1} P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1}) }[/math]

      It is assumed that the probability of observing the i-th word [math]\displaystyle{ w_i }[/math] in the context history of the preceding i − 1 words can be approximated by the probability of observing it in the shortened context history of the preceding n − 1 words (an (n − 1)-th order Markov property).

      The conditional probability can be calculated from n-gram model frequency counts:

      [math]\displaystyle{ P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1}) = \frac{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1})} }[/math]

      The terms bigram and trigram language models denote n-gram models with n = 2 and n = 3, respectively.[1]

      Typically, the n-gram model probabilities are not derived directly from frequency counts, because models derived this way have severe problems when confronted with any n-grams that have not been explicitly seen before. Instead, some form of smoothing is necessary, assigning some of the total probability mass to unseen words or n-grams. Various methods are used, from simple "add-one" (Laplace) smoothing (add one to every n-gram count, so unseen n-grams are treated as if seen once, as an uninformative prior) to more sophisticated models, such as Good–Turing discounting or back-off models.

  1. Craig Trim, What is Language Modeling?, April 26th, 2013.
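
The Markov factorization, the count-based estimate, and the add-one smoothing quoted above can be made concrete with a small sketch. The Python snippet below is a minimal illustration and is not taken from the cited source: the toy corpus, the NGramLM class and method names, and the use of the training-vocabulary size in the add-one denominator are assumptions made for this example, and sentence-boundary padding is ignored.

import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all length-n token windows (n-grams) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

class NGramLM:
    """Count-based n-gram model (n >= 2) with optional add-one (Laplace) smoothing."""

    def __init__(self, tokens, n=2, smoothing="add-one"):
        self.n = n
        self.smoothing = smoothing
        self.vocab = set(tokens)
        self.ngrams = ngram_counts(tokens, n)        # count(w_{i-(n-1)}, ..., w_i)
        self.contexts = ngram_counts(tokens, n - 1)  # count(w_{i-(n-1)}, ..., w_{i-1})

    def prob(self, word, context):
        """P(word | context): conditional probability given the preceding n - 1 words."""
        context = tuple(context)[-(self.n - 1):]
        num = self.ngrams[context + (word,)]
        den = self.contexts[context]
        if self.smoothing == "add-one":
            # Add one to every n-gram count so unseen n-grams receive nonzero probability.
            return (num + 1) / (den + len(self.vocab))
        return num / den if den else 0.0

    def sentence_logprob(self, tokens):
        """log P(w_1, ..., w_m) under the order-(n-1) Markov approximation."""
        return sum(math.log(self.prob(tokens[i], tokens[i - self.n + 1:i]))
                   for i in range(self.n - 1, len(tokens)))

tokens = "the cat sat on the mat the cat ate".split()
lm = NGramLM(tokens, n=2)
print(lm.prob("sat", ("cat",)))    # add-one: (1 + 1) / (2 + 6) = 0.25
print(lm.prob("dog", ("the",)))    # unseen bigram still gets (0 + 1) / (3 + 6) = 1/9
print(lm.sentence_logprob("the cat sat".split()))

On this toy corpus, the unsmoothed estimate P(sat | cat) = count(cat sat) / count(cat) = 1/2, while add-one smoothing yields (1 + 1) / (2 + 6) = 0.25 and still assigns probability 1/9 to the unseen bigram "the dog".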
