Unsmoothed Maximum Likelihood-based Training Algorithm
An Unsmoothed Maximum Likelihood-based Training Algorithm is a maximum likelihood-based training algorithm that estimates probabilities by relative frequency (count and divide) without applying any smoothing, so that events unseen in the training data receive a probability of zero.
- …
- Counter-Example(s):
- See: Dirichlet Smoothed Document Language Model, Unsmoothed Maximum-Likelihood Character-level LM.
References
2016
- (Raviv et al., 2016) ⇒ Hadas Raviv, Oren Kurland, and David Carmel. (2016). “Document Retrieval Using Entity-based Language Models.” In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 65-74. ACM.
- QUOTE: … Following common practice [48], we use an unsmoothed maximum likelihood estimate for the query language model (Equation 2) and a Dirichlet smoothed document language model (Equation 3). We obtain four retrieval methods: HT3, HTOEnt, ST and STOEnt4, which utilize …
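The two estimators contrasted in this quote have standard forms that can be written out directly. Below is a minimal Python sketch of both (the function names and the choice of the Dirichlet parameter mu are illustrative assumptions, not taken from the paper):

```python
from collections import Counter

def unsmoothed_query_lm(query_terms):
    """Unsmoothed MLE (Equation 2-style): p(w|q) = count(w, q) / |q|.
    Any term not occurring in the query gets probability zero."""
    counts = Counter(query_terms)
    total = len(query_terms)
    return {w: c / total for w, c in counts.items()}

def dirichlet_doc_lm(doc_terms, collection_lm, mu=1000.0):
    """Dirichlet-smoothed document model (Equation 3-style):
    p(w|d) = (count(w, d) + mu * p(w|C)) / (|d| + mu),
    where p(w|C) is the collection (background) language model."""
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    def prob(w):
        return (counts[w] + mu * collection_lm.get(w, 0.0)) / (doc_len + mu)
    return prob

# Illustrative usage (hypothetical terms):
q_lm = unsmoothed_query_lm(["entity", "retrieval"])  # {'entity': 0.5, 'retrieval': 0.5}
```

Note the asymmetry the quote relies on: the unsmoothed query model assigns zero probability to any term outside the query, while the Dirichlet-smoothed document model never assigns zero, because every term is interpolated with the collection model.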
2015
- (Goldberg, 2015) ⇒ Yoav Goldberg. (2015). “The Unreasonable Effectiveness of Character-level Language Models (and Why RNNs Are Still Cool).” In: Blog Post.
- QUOTE: Mathematically, we would like to learn a function [math]\displaystyle{ P(c|h) }[/math]. Here, [math]\displaystyle{ c }[/math] is a character, [math]\displaystyle{ h }[/math] is an n-letter history, and [math]\displaystyle{ P(c|h) }[/math] stands for how likely it is to see [math]\displaystyle{ c }[/math] after we've seen [math]\displaystyle{ h }[/math].
Perhaps the simplest approach would be to just count and divide (a.k.a. maximum likelihood estimation). We will count the number of times each letter [math]\displaystyle{ c′ }[/math] appeared after [math]\displaystyle{ h }[/math], and divide by the total number of letters appearing after [math]\displaystyle{ h }[/math]. The unsmoothed part means that if we did not see a given letter following [math]\displaystyle{ h }[/math], we will just give it a probability of zero.
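The count-and-divide procedure described in this quote translates almost directly into code. The following is a minimal Python sketch of such an unsmoothed character-level trainer (the function name and the "~" padding symbol are illustrative assumptions, not part of the source):

```python
from collections import Counter, defaultdict

def train_unsmoothed_char_lm(text, order=4):
    """Count-and-divide (unsmoothed MLE): for each n-letter history h,
    P(c|h) = count(h followed by c) / count(h followed by anything).
    Characters never seen after h get no entry, i.e. probability zero."""
    counts = defaultdict(Counter)
    padded = "~" * order + text  # pad so the first characters have a history
    for i in range(len(padded) - order):
        history, char = padded[i:i + order], padded[i + order]
        counts[history][char] += 1
    # Normalize each history's counts into a probability distribution.
    return {h: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for h, ctr in counts.items()}

lm = train_unsmoothed_char_lm("hello hello help", order=2)
print(lm["he"])  # {'l': 1.0} -- 'l' is the only letter ever seen after "he"
```

Because no smoothing is applied, looking up a character that never followed a given history simply finds no entry in that history's distribution, which is the zero-probability behavior the quote describes.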