WikiText Error Correction (WikiErrCorr) System
A WikiText Error Correction (WikiErrCorr) System is a marked-up text error correction system that implements a WTEC Algorithm to solve a WTEC task (to automatically detect and repair wikitext errors in WikiText documents).
- AKA: Automatic WikiText Error Detection and Correction System.
- Context:
- It can range from being a Shallow WikiErrCorr System (to solve a shallow WikiErrCorr task) to being a Deep WTEC System.
- It can be supported by a WikiText Error Detection Task.
- It can include:
- a Natural Language Text Error Correction System such as:
- a Grammatical Error Correction System for correcting grammar errors;
- a Spelling Error Correction System for correcting orthographical errors;
- an HTML Error Correction System for correcting misspelled HTML codes;
- a WikiText Error Detection System for detecting mismatched WikiText Hyperlinking and Wiki Markup Language Errors.
- It can be evaluated by a WikiText Error Correction (WTEC) Evaluation System on a WikiText Error Correction (WTEC) System Benchmark Task.
- …
- Example(s):
- WikiFixer, such as the one which supports a GM-RKB WikiText Error Correction (WTEC) System.
- …
- Counter-Example(s):
- See: Misspelling Correction System, Parsing System, Text Wikification System, Wiki Markup Language, Text Error Detection System, Natural Language Processing System.
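The detection of mismatched WikiText hyperlinking mentioned in the context above can be illustrated with a minimal sketch. It is not taken from WikiFixer or any GM-RKB system; the function name `find_unbalanced_links` is hypothetical, and the sketch assumes that only the `[[` and `]]` link delimiters matter, ignoring templates, external links, and `nowiki` sections.

```python
import re

def find_unbalanced_links(wikitext):
    """Return sorted (position, token) pairs for '[[' or ']]' tokens without a partner.

    Hypothetical helper for illustration only; real wikitext parsing also has
    to handle templates, external links, and nowiki sections.
    """
    open_positions = []  # positions of '[[' still waiting for a closing ']]'
    unmatched = []
    for match in re.finditer(r"\[\[|\]\]", wikitext):
        if match.group() == "[[":
            open_positions.append(match.start())
        elif open_positions:
            open_positions.pop()  # this ']]' closes the most recent '[['
        else:
            unmatched.append((match.start(), "]]"))  # close with no open
    # Anything left on the stack was opened but never closed.
    unmatched.extend((pos, "[[") for pos in open_positions)
    return sorted(unmatched)
```

A shallow system could use the reported positions only to flag lines for review, while a deeper system would also have to decide where the missing delimiter belongs.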
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: We designed and implemented the GM-RKB WikiText Error Correction (WEC) Task to benchmark systems that attempt to automatically recognize and repair simple typographical errors in WikiText based on frequent patterns observed in the corpus. The task consisted in conducting a series of experiments on benchmark datasets to find the best performing WEC system. We adopted a precision-based performance metric because we were interested in measuring of the balance between the welcome benefit a WEC system succeeding in repairing an error correctly against the significant cost of it introducing an error which requires to be repaired manually. We compared the relative performance of a character MLE Language Model-based and a sequence-to-sequence (seq2seq) neural network-based WEC, as well as two spelling error correction systems trained on GM-RKB and Wikipedia corpora datasets. Because of the difficulty in logging real wikitext errors introduced by human editors, we developed a sub-system that artificially can add human-like editing errors to the original text and convert it to training data.
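The idea of artificially adding human-like editing errors to clean text in order to obtain training data can be sketched as follows. This is only an illustration under stated assumptions, not the sub-system described by Melli et al.: the function names `inject_error` and `make_training_pairs` and the choice of three character-level edit types (delete, duplicate, swap) are hypothetical.

```python
import random

def inject_error(clean, rng):
    """Return a copy of `clean` with one random character-level edit applied.

    The edit types are an assumption made for illustration; they are not the
    error model used in the cited paper.
    """
    if not clean:
        return clean
    i = rng.randrange(len(clean))
    op = rng.choice(["delete", "duplicate", "swap"])
    if op == "delete":
        return clean[:i] + clean[i + 1:]
    if op == "duplicate":
        return clean[:i] + clean[i] + clean[i:]
    # Swap with the following character; at the last position, or when both
    # characters are equal, this leaves the text unchanged.
    j = min(i + 1, len(clean) - 1)
    chars = list(clean)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

def make_training_pairs(lines, seed=0):
    """Build (noisy, clean) pairs, one per input line, reproducibly from `seed`."""
    rng = random.Random(seed)
    return [(inject_error(line, rng), line) for line in lines]
```

For example, `make_training_pairs(["[[Error Correction System]]"])` yields a single pair whose second element is the untouched line and whose first element differs from it by at most one character edit.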