WikiText Error Correction (WTEC) Task
A WikiText Error Correction (WTEC) Task is a Text Error Correction Task that automatically detects and repairs wikitext errors in WikiText documents.
- AKA: Automatic WikiText Error Detection and Correction Task.
- Context:
- Task Input: a noisy wikitext document (with wikitext errors).
- Task Output: a clean wikitext document (without wikitext errors).
- Task Requirement(s):
- a Language Model,
- a WikiText Model.
- It can be solved by a WTEC System that implements a WTEC Algorithm.
- Example(s):
- Counter-Example(s):
- See: Misspelling Correction Task, Parsing Task, Text Wikification Task, Wiki Markup Language, Natural Language Processing Task.
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: We designed and implemented the GM-RKB WikiText Error Correction (WEC) Task to benchmark systems that attempt to automatically recognize and repair simple typographical errors in WikiText based on frequent patterns observed in the corpus. The task consisted in conducting a series of experiments on benchmark datasets to find the best performing WEC system. We adopted a precision-based performance metric because we were interested in measuring of the balance between the welcome benefit a WEC system succeeding in repairing an error correctly against the significant cost of it introducing an error which requires to be repaired manually. We compared the relative performance of a character MLE Language Model-based and a sequence-to-sequence (seq2seq) neural network-based WEC, as well as two spelling error correction systems trained on GM-RKB and Wikipedia corpora datasets. Because of the difficulty in logging real wikitext errors introduced by human editors, we developed a sub-system that artificially can add human-like editing errors to the original text and convert it to training data.
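The quote above mentions a sub-system that artificially adds human-like editing errors to clean text in order to produce training data. The following is a minimal sketch of that general idea, not the authors' implementation: the function names `add_noise` and `make_training_pairs`, the four edit operations, the `error_rate` parameter, and the character alphabet are all assumptions chosen for illustration.

```python
import random

# Hypothetical alphabet: lowercase letters plus common wikitext markup characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz[]{}|='"


def add_noise(text, error_rate=0.05, seed=None):
    """Return a copy of `text` with random character-level typos.

    Each character is corrupted with probability `error_rate` using one of
    four assumed edit operations: delete, insert, substitute, transpose.
    """
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.random() < error_rate:
            op = rng.choice(["delete", "insert", "substitute", "transpose"])
            if op == "delete":
                pass  # drop the character entirely
            elif op == "insert":
                out.append(ch)
                out.append(rng.choice(ALPHABET))  # stray extra character
            elif op == "substitute":
                # pick a replacement that differs from the original character
                out.append(rng.choice([c for c in ALPHABET if c != ch]))
            elif i + 1 < len(text):
                # transpose: swap this character with the next one
                out.append(text[i + 1])
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end of the text
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def make_training_pairs(clean_lines, error_rate=0.05, seed=0):
    """Build (noisy, clean) pairs, e.g. as input/target for a seq2seq model."""
    rng = random.Random(seed)
    return [
        (add_noise(line, error_rate, seed=rng.randrange(2**32)), line)
        for line in clean_lines
    ]


if __name__ == "__main__":
    lines = ["A [[Parsing Task]] is a [[Text Processing Task]].",
             "* See: [[Wiki Markup Language]]."]
    for noisy, clean in make_training_pairs(lines, error_rate=0.1, seed=42):
        print(repr(noisy), "<-", repr(clean))
```

Seeding the random generator keeps the generated dataset reproducible across runs. Uniform random edits are only a rough stand-in for real editor mistakes; a more realistic generator would presumably weight the operations by error patterns observed in actual wikitext edits.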