GM-RKB WikiText Error Correction (WTEC) Task
A GM-RKB WikiText Error Correction (WTEC) Task is a WikiText Error Correction (WTEC) Task for detecting and correcting GM-RKB WikiText Errors.
- AKA: GM-RKB WikiText Error Correction (WEC) Task.
- Context:
- Task Input: GM-RKB Page containing WikiText Errors.
- Task Output: a GM-RKB Page without WikiText Errors.
- Task Requirement(s):
- It can be solved by a GM-RKB WikiText Error Correction (WTEC) System that implements a GM-RKB WikiText Error Correction (WTEC) Algorithm.
- It can range from being a GM-RKB Seq2Seq WikiText Error Correction (WTEC) Task to being a GM-RKB Character-Level MLE Language Model-based WikiText Error Correction (WTEC) Task.
- …
- Example(s):
- Counter-Example(s):
- See: GM-RKB WikiText Error Correction (WTEC) Benchmark Task, Misspelling Correction System, Parsing System, GM-RKB Wikification System, Wiki Markup Language, Text Error Correction System, Natural Language Processing System, Seq2Seq Encoder-Decoder Neural Network, Language Model, Character-Level MLE Language Model.
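The character-level MLE language model-based approach mentioned in the Context above can be illustrated with a minimal sketch: train character n-gram counts on clean WikiText, then prefer whichever candidate string has the higher smoothed per-character likelihood. All function names, the smoothing scheme, and the toy corpus below are illustrative assumptions, not the GM-RKB system's actual implementation.

```python
import math
from collections import defaultdict

def train_char_model(corpus, n=3):
    # MLE character n-gram model: counts of the next char given an (n-1)-char context.
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        padded = "^" * (n - 1) + text
        for i in range(len(text)):
            counts[padded[i:i + n - 1]][padded[i + n - 1]] += 1
    return counts

def avg_logprob(model, text, n=3, vocab=128):
    # Add-one-smoothed average log-probability per character,
    # so strings of different lengths remain comparable.
    padded = "^" * (n - 1) + text
    total = 0.0
    for i in range(len(text)):
        ctx = padded[i:i + n - 1]
        seen = sum(model[ctx].values())
        total += math.log((model[ctx][padded[i + n - 1]] + 1) / (seen + vocab))
    return total / len(text)

def pick_best(model, noisy, candidates, n=3):
    # Keep the noisy text unless a candidate repair scores strictly higher.
    return max([noisy] + candidates, key=lambda t: avg_logprob(model, t, n))

corpus = ["[[Machine Learning]]", "[[Language Model]]", "[[Neural Network]]"]
model = train_char_model(corpus)
print(pick_best(model, "[[Machine Learing]]", ["[[Machine Learning]]"]))
```

The per-character averaging matters: a raw product of probabilities would always favor shorter strings, biasing the corrector toward deleting characters rather than repairing them.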
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira. (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of LREC 2020 (LREC-2020).
- QUOTE: We designed and implemented the GM-RKB WikiText Error Correction (WEC) Task to benchmark systems that attempt to automatically recognize and repair simple typographical errors in WikiText based on frequent patterns observed in the corpus. The task consisted of conducting a series of experiments on benchmark datasets to find the best performing WEC system. We adopted a precision-based performance metric because we were interested in measuring the balance between the welcome benefit of a WEC system repairing an error correctly and the significant cost of it introducing an error that must be repaired manually. We compared the relative performance of a character MLE Language Model-based and a sequence-to-sequence (seq2seq) neural network-based WEC, as well as two spelling error correction systems trained on GM-RKB and Wikipedia corpora datasets. Because of the difficulty of logging real WikiText errors introduced by human editors, we developed a sub-system that can artificially add human-like editing errors to the original text and convert it into training data.
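The error-injection sub-system described in the quote can be sketched as follows. The specific error types, function names, and seeding below are hypothetical illustrations of "human-like editing errors" (the paper does not enumerate its error catalogue here): each clean line is paired with an artificially corrupted copy, yielding (noisy input, clean target) training pairs for a WEC system.

```python
import random

def drop_bracket(text, rng):
    # Remove one bracket character, e.g. "[[Page]]" -> "[[Page]".
    idx = [i for i, c in enumerate(text) if c in "[]"]
    if not idx:
        return text
    i = rng.choice(idx)
    return text[:i] + text[i + 1:]

def swap_adjacent(text, rng):
    # Transpose two adjacent characters, a common typing slip.
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def make_training_pairs(clean_lines, seed=0):
    # Each pair is (artificially corrupted line, original line):
    # the corrupted side is the WEC system's input, the clean side its target.
    rng = random.Random(seed)
    injectors = [drop_bracket, swap_adjacent]
    return [(rng.choice(injectors)(line, rng), line) for line in clean_lines]

pairs = make_training_pairs(["[[Neural Network]] models learn features."])
print(pairs)
```

Fixing the random seed makes the generated benchmark reproducible, which matters when comparing systems on the same artificially corrupted corpus.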