Unsmoothed Maximum Likelihood-based Training Algorithm
An Unsmoothed Maximum Likelihood-based Training Algorithm is a maximum likelihood-based training algorithm that estimates probabilities by relative frequency (count and divide) without applying any smoothing, so that events unseen in the training data receive a probability of zero.
- …
- Counter-Example(s):
- See: Dirichlet Smoothed Document Language Model, Unsmoothed Maximum-Likelihood Character-level LM.
References
2016
- (Raviv et al., 2016) ⇒ Hadas Raviv, Oren Kurland, and David Carmel. (2016). “Document Retrieval Using Entity-based Language Models.” In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 65-74. ACM.
- QUOTE: … Following common practice [48], we use an unsmoothed maximum likelihood estimate for the query language model (Equation 2) and a Dirichlet smoothed document language model (Equation 3). We obtain four retrieval methods: HT3, HTOEnt, ST and STOEnt4, which utilize …
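The two estimators contrasted in this quote have standard forms that can be written out directly. Below is a minimal Python sketch of both (the function names and the choice of the Dirichlet parameter mu are illustrative assumptions, not taken from the paper):

```python
from collections import Counter

def unsmoothed_query_lm(query_terms):
    """Unsmoothed MLE (Equation 2-style): p(w|q) = count(w, q) / |q|.
    Any term not occurring in the query gets probability zero."""
    counts = Counter(query_terms)
    total = len(query_terms)
    return {w: c / total for w, c in counts.items()}

def dirichlet_doc_lm(doc_terms, collection_lm, mu=1000.0):
    """Dirichlet-smoothed document model (Equation 3-style):
    p(w|d) = (count(w, d) + mu * p(w|C)) / (|d| + mu),
    where p(w|C) is the collection (background) language model."""
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    def prob(w):
        return (counts[w] + mu * collection_lm.get(w, 0.0)) / (doc_len + mu)
    return prob

# Illustrative usage (hypothetical terms):
q_lm = unsmoothed_query_lm(["entity", "retrieval"])  # {'entity': 0.5, 'retrieval': 0.5}
```

Note the asymmetry the quote relies on: the unsmoothed query model assigns zero probability to any term outside the query, while the Dirichlet-smoothed document model never assigns zero, because every term is interpolated with the collection model.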
2015
- (Goldberg, 2015) ⇒ Yoav Goldberg. (2015). “The Unreasonable Effectiveness of Character-level Language Models (and Why RNNs Are Still Cool).” In: Blog Post.
- QUOTE: Mathematically, we would like to learn a function [math]\displaystyle{ P(c|h) }[/math]. Here, [math]\displaystyle{ c }[/math] is a character, [math]\displaystyle{ h }[/math] is an n-letter history, and [math]\displaystyle{ P(c|h) }[/math] stands for how likely it is to see [math]\displaystyle{ c }[/math] after we've seen [math]\displaystyle{ h }[/math].
Perhaps the simplest approach would be to just count and divide (a.k.a. maximum likelihood estimation). We will count the number of times each letter [math]\displaystyle{ c′ }[/math] appeared after [math]\displaystyle{ h }[/math], and divide by the total number of letters appearing after [math]\displaystyle{ h }[/math]. The unsmoothed part means that if we did not see a given letter following [math]\displaystyle{ h }[/math], we will just give it a probability of zero.
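The count-and-divide procedure described in this quote translates almost directly into code. The following is a minimal Python sketch of such an unsmoothed character-level trainer (the function name and the "~" padding symbol are illustrative assumptions, not part of the source):

```python
from collections import Counter, defaultdict

def train_unsmoothed_char_lm(text, order=4):
    """Count-and-divide (unsmoothed MLE): for each n-letter history h,
    P(c|h) = count(h followed by c) / count(h followed by anything).
    Characters never seen after h get no entry, i.e. probability zero."""
    counts = defaultdict(Counter)
    padded = "~" * order + text  # pad so the first characters have a history
    for i in range(len(padded) - order):
        history, char = padded[i:i + order], padded[i + order]
        counts[history][char] += 1
    # Normalize each history's counts into a probability distribution.
    return {h: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for h, ctr in counts.items()}

lm = train_unsmoothed_char_lm("hello hello help", order=2)
print(lm["he"])  # {'l': 1.0} -- 'l' is the only letter ever seen after "he"
```

Because no smoothing is applied, looking up a character that never followed a given history simply finds no entry in that history's distribution, which is the zero-probability behavior the quote describes.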