Unsmoothed Maximum-Likelihood Character-Level n-Gram Language Model
An Unsmoothed Maximum-Likelihood Character-Level n-Gram Language Model is a character-level language model that estimates each character's probability from its preceding n-character history by unsmoothed maximum-likelihood (count-and-divide) estimation.
- Context:
- It can range from being a Forward Unsmoothed Character-Level Language Model, to being a Backward Unsmoothed Character-Level Language Model, to being a Bi-Directional Unsmoothed Character-Level Language Model.
- It can be produced by an Unsmoothed Character-Level Language Model Training System (that implements an unsmoothed character-level MLE-based LM training algorithm to solve an unsmoothed MLE-based character-level LM training task).
- It can range from being a Unigram Unsmoothed Character-Level Language Model, to being a Bigram Unsmoothed Character-Level Language Model, to being a Trigram Unsmoothed Character-Level Language Model, to being an n-Gram Unsmoothed Character-Level Language Model.
- Example(s):
- …
- Counter-Example(s):
- See: Text Character.
References
2015b
- (Goldberg, 2015) ⇒ Yoav Goldberg. (2015). “The Unreasonable Effectiveness of Character-level Language Models (and Why RNNs Are Still Cool).” In: Blog Post.
- QUOTE: ... However, it feels to me that most readers of the post are impressed by the wrong reasons. This is because they are not familiar with unsmoothed maximum-likelihood character level language models and their unreasonable effectiveness at generating rather convincing natural language outputs. In what follows I will briefly describe these character-level maximum-likelihood language models, which are much less magical than RNNs and LSTMs, and show that they too can produce a rather convincing Shakespearean prose. ...
... Mathematically, we would like to learn a function [math]\displaystyle{ P(c|h) }[/math]. Here, [math]\displaystyle{ c }[/math] is a character, [math]\displaystyle{ h }[/math] is a n-letters history, and [math]\displaystyle{ P(c|h) }[/math] stands for how likely is it to see [math]\displaystyle{ c }[/math] after we've seen [math]\displaystyle{ h }[/math]. Perhaps the simplest approach would be to just count and divide (a.k.a maximum likelihood estimates). We will count the number of times each letter [math]\displaystyle{ c′ }[/math] appeared after [math]\displaystyle{ h }[/math], and divide by the total numbers of letters appearing after [math]\displaystyle{ h }[/math]. The unsmoothed part means that if we did not see a given letter following [math]\displaystyle{ h }[/math], we will just give it a probability of zero. ...
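The count-and-divide procedure described in the quote above can be sketched in a few lines of Python. This is an illustrative sketch, not Goldberg's original code: the function names train_char_lm and generate_text, the "~" padding symbol, and the corpus variable in the usage note below are assumptions made for illustration.
```python
from collections import Counter, defaultdict
import random

def train_char_lm(text, order=4):
    # Maximum-likelihood (count-and-divide) estimation of P(c | h):
    # count how often each character c follows each order-length history h,
    # then divide by the total number of characters observed after h.
    pad = "~" * order                      # assumed padding for the first characters
    data = pad + text
    counts = defaultdict(Counter)
    for i in range(len(data) - order):
        history, char = data[i:i + order], data[i + order]
        counts[history][char] += 1
    lm = {}
    for history, ctr in counts.items():
        total = sum(ctr.values())
        # Unsmoothed: any character never seen after this history is simply
        # absent from the distribution, i.e. it gets probability zero.
        lm[history] = [(c, n / total) for c, n in ctr.items()]
    return lm

def generate_text(lm, order=4, n_chars=500):
    # Sample one character at a time from P(c | h), sliding the history window.
    history = "~" * order
    out = []
    for _ in range(n_chars):
        if history not in lm:              # can happen only at an end-of-corpus history
            break
        chars, probs = zip(*lm[history])
        char = random.choices(chars, weights=probs)[0]
        out.append(char)
        history = history[1:] + char
    return "".join(out)
```
With these definitions, lm = train_char_lm(corpus_text, order=4) followed by print(generate_text(lm, order=4)) would sample text in the style of the training corpus, assuming corpus_text holds training data such as the Shakespeare text discussed in Goldberg's post.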