MLE Language Model-based WikiText Error Correction System
An MLE Language Model-based WikiText Error Correction System is a WikiText Error Correction System that is based on training a Maximum Likelihood Estimation (MLE)-based Language Model.
- AKA: MLE WikiFixer.
- Context:
- It can solve a MLE Language Model-based WikiText Error Correction Task by implementing MLE Language Model-based WikiText Error Correction Algorithms.
- Example(s):
- Counter-Example(s):
- See: Misspelling Correction System, Parsing System, Wikification System, Wiki Markup Language, Text Error Correction System, Natural Language Processing System, MLE Language Model, Character-Level MLE Language Model.
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines”. In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: WikiFixer is a tool that automatically repairs simple typos in WikiText based on patterns found in a related corpus. Currently, this system is focused on solving the GM-RKB WikiText Error Correction Task. The Character-level MLE Language Model-based WikiFixer is a data-driven implementation of this tool built on a simple function, based on a character-level language model (in the form of a lookup table), that predicts the likelihood score for short sub-strings of characters. Currently, the function is trained on the GM-RKB dataset. This statistical approach assumes that an N-gram is a sequence of N characters. The trained language model can predict both the probability of a certain N-gram appearing in the text and the probability of a certain character appearing after a certain N-gram (Norvig, 2007). The MLE WikiFixer uses a number of similarity probabilities to detect noise and selects the error correction candidates with the highest probability. The model is controlled by setting thresholds for detecting noise and accepting an error correction action. This system is being used as a baseline for more sophisticated approaches such as the use of Neural Network methods. Additional information about this baseline method can be found in an online document[1].
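The sketch below is a minimal, illustrative reading of the approach described in the quote above (not the authors' released code): a character-level MLE n-gram lookup table trained by counting, a likelihood score for a character given its preceding context, and two thresholds, one for flagging noise and one for accepting a correction. The class and function names, the candidate character set, and the threshold values are all assumptions made for illustration.

```python
from collections import defaultdict

class CharMLEModel:
    """Character-level MLE n-gram model stored as a lookup table (illustrative sketch)."""

    def __init__(self, n=5):
        self.n = n
        self.context_counts = defaultdict(int)  # counts of (n-1)-character contexts
        self.ngram_counts = defaultdict(int)    # counts of full n-character n-grams

    def train(self, corpus_text):
        # Count every n-gram and its (n-1)-character context in the training corpus.
        for i in range(len(corpus_text) - self.n + 1):
            ngram = corpus_text[i:i + self.n]
            self.ngram_counts[ngram] += 1
            self.context_counts[ngram[:-1]] += 1

    def prob(self, context, char):
        # MLE estimate: count(context + char) / count(context).
        ctx_count = self.context_counts.get(context, 0)
        if ctx_count == 0:
            return 0.0
        return self.ngram_counts.get(context + char, 0) / ctx_count

def fix_text(model, text, noise_threshold=1e-4, accept_threshold=1e-2,
             candidates="[]{}|<> "):
    """Flag low-likelihood characters as noise and replace each with the
    highest-probability candidate, if that candidate clears accept_threshold.
    The thresholds and candidate set are hypothetical."""
    out = list(text)
    k = model.n - 1
    for i in range(k, len(text)):
        context = text[i - k:i]
        if model.prob(context, text[i]) < noise_threshold:
            best_char, best_p = max(
                ((c, model.prob(context, c)) for c in candidates),
                key=lambda pair: pair[1])
            if best_p >= accept_threshold:
                out[i] = best_char
    return "".join(out)
```

In practice the candidate set would be derived from the error patterns observed in the corpus rather than a fixed list of WikiText markup characters, and the thresholds would be tuned on held-out data.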
2019
- (Jurafsky & Martin, 2019) ⇒ Dan Jurafsky and James H. Martin (2019). "N-gram Language Models". In: Speech and Language Processing (3rd ed. draft)
- QUOTE: Thus, the general equation for this n-gram approximation to the conditional probability of the next word in a sequence is $P(w_n|w^{n-1}_1) \approx P(w_n|w^{n-1}_{n-N+1})$ (3.8)
Given the bigram assumption for the probability of an individual word, we can compute the probability of a complete word sequence by substituting Eq. 3.7 into Eq. 3.4:
$P(w^n_1) \approx \displaystyle\prod_{k=1}^{n} P(w_k|w_{k-1})$ (3.9)
How do we estimate these bigram or n-gram probabilities? An intuitive way to estimate probabilities is called maximum likelihood estimation or MLE. We get the MLE estimate for the parameters of an n-gram model by getting counts from a corpus, and normalizing the counts so that they lie between 0 and 1[1].
For example, to compute a particular bigram probability of a word $y$ given a previous word $x$, we’ll compute the count of the bigram $C(xy)$ and normalize by the sum of all the bigrams that share the same first word $x$:
$P(w_n|w_{n-1}) = \dfrac{C(w_{n-1}w_n)}{C(w_{n-1})}$
- ↑ For probabilistic models, normalizing means dividing by some total count so that the resulting probabilities fall legally between 0 and 1.
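As a worked example of the MLE bigram estimate described in the Jurafsky & Martin quote above, the sketch below counts bigrams in a toy token sequence and normalizes each count by the count of its first word. The toy corpus and the function name are chosen for illustration only and are not taken from the cited text.

```python
from collections import Counter

def mle_bigram_probs(tokens):
    """MLE bigram estimate: P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigram_counts[w1]
            for (w1, w2), c in bigram_counts.items()}

# Toy corpus with sentence-boundary markers.
tokens = "<s> i am sam </s> <s> sam i am </s>".split()
probs = mle_bigram_probs(tokens)
print(probs[("i", "am")])   # C(i am) / C(i)   = 2 / 2 = 1.0
print(probs[("sam", "i")])  # C(sam i) / C(sam) = 1 / 2 = 0.5
```

Each estimate is simply a count divided by a total count, which is why the resulting values fall between 0 and 1, as noted in the footnote above.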