Unsmoothed Maximum-Likelihood Character-Level Language Modeling Algorithm
An Unsmoothed Maximum-Likelihood Character-Level Language Modeling Algorithm is a maximum-likelihood character-level language modeling algorithm that is also an unsmoothed maximum-likelihood language modeling algorithm.
- Context:
- It can be implemented by an Unsmoothed Maximum-Likelihood Character-level Language Model System (to produce an unsmoothed MLE-based character-level LM).
- Example(s):
- train_char_lm(), as in (Goldberg, 2015).
- …
- Counter-Example(s):
- See: Python-based Character-level Language Modeling System.
References
2015
- (Goldberg, 2015) ⇒ Yoav Goldberg. (2015). “The Unreasonable Effectiveness of Character-level Language Models (and Why RNNs Are Still Cool).” In: Blog Post.
- QUOTE: This is because they are not familiar with unsmoothed maximum-likelihood character level language models and their unreasonable effectiveness at generating rather convincing natural language outputs. ...
RNNs and LSTMs can potentially learn an infinite-order language model (they guess the next character based on a "state" which supposedly encodes all the previous history). We here will restrict ourselves to a fixed-order language model. So, we are seeing [math]\displaystyle{ n }[/math] letters, and need to guess the [math]\displaystyle{ n+1 }[/math]th one. We are also given a large-ish amount of text (say, all of Shakespeare's works) that we can use. How would we go about solving this task?
Mathematically, we would like to learn a function [math]\displaystyle{ P(c|h) }[/math]. Here, [math]\displaystyle{ c }[/math] is a character, [math]\displaystyle{ h }[/math] is an [math]\displaystyle{ n }[/math]-letters history, and [math]\displaystyle{ P(c|h) }[/math] stands for how likely it is to see [math]\displaystyle{ c }[/math] after we've seen [math]\displaystyle{ h }[/math]. Perhaps the simplest approach would be to just count and divide (a.k.a. maximum likelihood estimates). We will count the number of times each letter [math]\displaystyle{ c' }[/math] appeared after [math]\displaystyle{ h }[/math], and divide by the total number of letters appearing after [math]\displaystyle{ h }[/math]. The unsmoothed part means that if we did not see a given letter following [math]\displaystyle{ h }[/math], we will just give it a probability of zero (...)
Here is the code for training the model. fname is a file to read the characters from. order is the history size to consult. Note that we pad the data with leading ~ so that we also learn how to start.
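The train_char_lm code itself is not reproduced in the excerpt above. The following is a minimal sketch of such a trainer, following the count-and-divide description just quoted; the function name train_char_lm, the fname and order parameters, and the leading-~ padding follow Goldberg's post, while the remaining implementation details are an assumption:

```python
from collections import Counter, defaultdict

def train_char_lm(fname, order=4):
    # Read the training text and pad it with leading '~' characters,
    # so the model also learns how a text starts.
    data = open(fname).read()
    data = "~" * order + data
    # Count how often each character follows each length-`order` history.
    counts = defaultdict(Counter)
    for i in range(len(data) - order):
        history, char = data[i:i + order], data[i + order]
        counts[history][char] += 1
    # Turn counts into probabilities P(c|h) by dividing each count by the
    # total number of characters seen after that history. Characters never
    # seen after a history implicitly get probability zero -- the
    # "unsmoothed" part.
    def normalize(counter):
        total = sum(counter.values())
        return [(char, cnt / total) for char, cnt in counter.items()]
    return {history: normalize(counter) for history, counter in counts.items()}
```

Under these assumptions, lm = train_char_lm("shakespeare.txt", order=4) (the filename here is hypothetical) would return a dict mapping each 4-character history to a list of (character, probability) pairs.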