WikiFixer System
A WikiFixer System is a shallow WikiText error correction system that implements a WikiFixer algorithm to solve a WikiFixer task.
- Context:
- It can (typically) have a Basic Usage of:
    from WikiFixer import WikiFixer
    fixer = WikiFixer()
    # Load the trained model file(s) by path.
    fixer.load_models(["models/model5-0.json"])
    # Run the fixer on an input string.
    fixer.fix_text("this is a test text")
- …
- Example(s):
- the one published at https://github.com/GM-RKB/LREC-2020.
- the one that supports GM-RKB's WikiText Error Correction System.
- an MLE Language Model-based WikiFixer.
- a Seq2Seq NNet-based WikiFixer.
- …
- Counter-Example(s):
- …
- See: WikiGenerator.
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: We designed and implemented the GM-RKB WikiText Error Correction (WEC) Task to benchmark systems that attempt to automatically recognize and repair simple typographical errors in WikiText based on frequent patterns observed in the corpus. The task consisted in conducting a series experiments on benchmark datasets to find the best performing WEC system. We adopted a precision-based performance metric because we were interested in measuring of the balance between the welcome benefit of a WEC system succeeding in repairing an error correctly against the significant cost of it introducing an error which requires to be repaired manually. We compared the relative performance of a character MLE Language Model-based and a sequence-to-sequence (seq2seq) neural network-based WEC, as well as two spelling error correction systems trained on GM-RKB and Wikipedia corpora datasets. Because of the difficulty in logging real wikitext errors introduced by human editors, we developed a sub-system that artificially can add human-like editing errors to the original text and convert it to training data.
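- NOTE: The artificial error-injection idea mentioned in the quote can be illustrated with a minimal Python sketch. It is not the authors' sub-system: the function names (`corrupt`, `make_training_pairs`), the set of edit operations, the `ALPHABET` string, and the `error_rate` default are all hypothetical choices made for illustration. The sketch applies random character-level edits to clean text and pairs each noisy line with its original, which is the general shape of training data that a correction model could learn from.

```python
import random

# Hypothetical character inventory for inserted/substituted characters;
# the real generator may draw from a different distribution.
ALPHABET = "abcdefghijklmnopqrstuvwxyz[]{}|=' "


def corrupt(text, rng, error_rate=0.05):
    """Return a copy of `text` with random character-level edits applied."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.random() < error_rate:
            op = rng.choice(["delete", "insert", "substitute", "transpose"])
            if op == "delete":
                pass  # drop the current character
            elif op == "insert":
                out.append(rng.choice(ALPHABET))  # stray character before ch
                out.append(ch)
            elif op == "substitute":
                out.append(rng.choice(ALPHABET))  # replace ch
            elif op == "transpose" and i + 1 < len(text):
                out.append(text[i + 1])  # swap ch with its right neighbour
                out.append(ch)
                i += 1  # the neighbour has been consumed
            else:
                out.append(ch)  # transpose at the last position: keep as-is
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def make_training_pairs(lines, seed=0, error_rate=0.05):
    """Build (noisy, clean) pairs from clean lines, reproducibly for a seed."""
    rng = random.Random(seed)
    return [(corrupt(line, rng, error_rate), line) for line in lines]


if __name__ == "__main__":
    clean = ["[[WikiText]] is a markup language.", "* See: [[WikiGenerator]]."]
    for noisy, original in make_training_pairs(clean, seed=42, error_rate=0.1):
        print(repr(noisy), "<-", repr(original))
```

Seeding the generator keeps the synthetic dataset reproducible across runs, and `error_rate` controls how heavily each line is corrupted. Uniform random edits are a simplification; errors that resemble real human editing would need operation probabilities estimated from observed edits.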