GM-RKB Character-Level MLE Language Model-based WikiText Error Correction (WTEC) System
(Redirected from GM-RKB Character-level MLE Language Model-based WikiFixer)
A GM-RKB Character-Level MLE Language Model-based WikiText Error Correction (WTEC) System is a GM-RKB WikiText Error Correction (WTEC) System that is based on a Maximum Likelihood-based Character-Level Language Model (LM) Training System.
- Example(s):
- …
- Counter-Example(s):
- See: WikiText Error Correction System, Misspelling Correction System, Parsing System, GM-RKB Wikification System, Wiki Markup Language, Text Error Correction System, Natural Language Processing System, Seq2Seq Encoder-Decoder Neural Network, GM-RKB Seq2Seq Encoder-Decoder Neural Network, GM-RKB Character-Level MLE Language Model, Language Model, Character-Level MLE Language Model.
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: WikiFixer is a tool that automatically repairs simple typos in WikiText based on patterns found in a related corpus. Currently, this system is focused on solving the GM-RKB WikiText Error Correction Task. The Character-level MLE Language Model-based WikiFixer is a data-driven implementation of this tool that is based on a simple function built on a character-level language model (in the form of a lookup table) that predicts the likelihood score for short sub-strings of characters. Currently, the function is trained using the GM-RKB dataset. This statistical approach assumes that an N-gram is a sequence of N characters. The trained language model can predict either the probability of a certain N-gram appearing in the text or the probability of a certain character appearing after a certain N-gram (Norvig, 2007). The MLE WikiFixer uses a number of similarity probabilities to detect noise and selects the error correction candidates with the highest probability. The model is controlled by setting thresholds for detecting noise and for accepting an error correction action. This system is being used as a baseline for more sophisticated approaches such as the use of Neural Network methods. Additional information on this baseline method can be found in an online document[1].
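The lookup-table idea described in the quote can be illustrated with a minimal Python sketch. This is not the WikiFixer implementation: the function names (`train_char_mle`, `next_char_prob`, `correct`), the context length `n=3`, and the two threshold values are hypothetical choices made for illustration, and the sketch only handles single-character substitutions. It estimates P(character | preceding N-gram) by maximum likelihood from counts, flags a character as noise when that probability falls below a detection threshold, and replaces it with the most probable candidate only if that candidate clears an acceptance threshold.

```python
from collections import Counter, defaultdict


def train_char_mle(text, n=3):
    """Build a lookup table: n-character context -> counts of the next character."""
    table = defaultdict(Counter)
    for i in range(len(text) - n):
        table[text[i:i + n]][text[i + n]] += 1
    return table


def next_char_prob(table, context, char):
    """MLE estimate: count(context followed by char) / count(context)."""
    counts = table.get(context)
    if not counts:
        return 0.0  # unseen context: no evidence at all
    return counts[char] / sum(counts.values())


def correct(text, table, n=3, detect_threshold=0.05, accept_threshold=0.5):
    """Substitute characters that look like noise (assumed thresholds).

    A character is flagged when P(char | context) < detect_threshold and is
    replaced by the most likely character for that context only when that
    candidate's probability is >= accept_threshold.
    """
    out = list(text[:n])  # the first n characters have no full context
    for i in range(n, len(text)):
        context = "".join(out[-n:])  # context uses already-corrected output
        char = text[i]
        if next_char_prob(table, context, char) < detect_threshold:
            counts = table.get(context)
            if counts:
                best, best_count = counts.most_common(1)[0]
                if best_count / sum(counts.values()) >= accept_threshold:
                    char = best
        out.append(char)
    return "".join(out)


# Toy corpus standing in for the GM-RKB training data (an assumption).
corpus = "[[Language Model]] " * 20
table = train_char_mle(corpus, n=3)
fixed = correct("[[Langusge Model]]", table, n=3)
```

On this toy corpus the context `ngu` is always followed by `a`, so the `s` in `Langusge` is flagged and replaced. Raising `detect_threshold` makes the sketch flag more characters, while raising `accept_threshold` makes it more conservative about applying a correction.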