GM-RKB WikiText Error Correction (WTEC) System

Context:
- It can solve a GM-RKB WikiText Error Correction (WTEC) Task by implementing a GM-RKB WikiText Error Correction (WTEC) Algorithm.
- It can range from being a GM-RKB Seq2Seq WikiText Error Correction (WTEC) System to being a GM-RKB Character-Level MLE Language Model-based WikiText Error Correction (WTEC) System.
- It has been evaluated by a GM-RKB WikiText Error Correction (WTEC) Benchmark Task.
- …
Example(s):
- a Shallow GM-RKB WikiText Error Correction System.
- …
Counter-Example(s):
- an HTML Error Correction System,
- a JSON Error Correction System,
- a Python Program Error Correction System;
- a WikiText Markup Parser;
- a Software Code Parser.
See: WikiFixer, Misspelling Correction System, Parsing System, GM-RKB Wikification System, Wiki Markup Language, Text Error Correction System, Natural Language Processing System, Seq2Seq Encoder-Decoder Neural Network, GM-RKB Seq2Seq Encoder-Decoder Neural Network, GM-RKB Character-Level MLE Language Model, Language Model, Character-Level MLE Language Model.

References

(Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: We designed and implemented the GM-RKB WikiText Error Correction (WEC) Task to benchmark systems that attempt to automatically recognize and repair simple typographical errors in WikiText based on frequent patterns observed in the corpus. The task consisted of conducting a series of experiments on benchmark datasets to find the best performing WEC system. We adopted a precision-based performance metric because we were interested in measuring of the balance between the welcome benefit a WEC system succeeding in repairing an error correctly against the significant cost of it introducing an error which requires to be repaired manually. We compared the relative performance of a character MLE Language Model-based and a sequence-to-sequence (seq2seq) neural network-based WEC, as well as two spelling error correction systems trained on GM-RKB and Wikipedia corpora datasets. Because of the difficulty in logging real wikitext errors introduced by human editors, we developed a sub-system that artificially can add human-like editing errors to the original text and convert it to training data.
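
The quoted abstract mentions a sub-system that artificially adds human-like editing errors to clean text to produce training data. The abstract does not describe that noise model, so the following is only a minimal sketch of the general idea under assumed choices: the function names (`inject_errors`, `make_training_pairs`), the three character-level operations (delete, duplicate, swap), and the `error_rate` parameter are all hypothetical and not taken from the paper.

```python
import random


def inject_errors(text, error_rate=0.05, seed=None):
    """Return a copy of `text` with random character-level edits.

    Assumed noise model (not from the paper): at each position, with
    probability `error_rate`, delete the character, duplicate it, or
    swap it with the following character.
    """
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.random() < error_rate:
            op = rng.choice(("delete", "duplicate", "swap"))
            if op == "delete":
                i += 1
                continue
            if op == "duplicate":
                out.extend((ch, ch))
                i += 1
                continue
            if op == "swap" and i + 1 < len(text):
                out.extend((text[i + 1], ch))
                i += 2
                continue
            # A swap requested at the last character falls through unchanged.
        out.append(ch)
        i += 1
    return "".join(out)


def make_training_pairs(lines, error_rate=0.05, seed=0):
    """Build (noisy_source, clean_target) pairs from clean WikiText lines."""
    rng = random.Random(seed)
    return [
        (inject_errors(line, error_rate, rng.randrange(2**32)), line)
        for line in lines
    ]
```

Pairs produced this way could serve as supervised input for a correction model, with the noisy string as the source and the original line as the target, for example `make_training_pairs(["* a [[Language Model]]."], error_rate=0.1)`.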