GM-RKB WikiText Error Correction (WTEC) Benchmark Task

A GM-RKB WikiText Error Correction (WTEC) Benchmark Task is a WikiText Error Correction (WTEC) System Benchmark Task that evaluates the performance of GM-RKB WikiText Error Correction (WTEC) Systems.

Context:
- Task Input: GM-RKB Benchmark Datasets (GM-RKB Datasets, and Wikipedia Datasets).
- Task Output: GM-RKB WTEC System Evaluation Score.
- Task Requirement(s):
  - Baseline Models: a Character-level MLE Language Model-based WikiFixer and a Seq2Seq NNet-based WikiFixer,
  - a GM-RKB Data Pre-Processing and Noise Generation System that adds human-like editing errors to WikiText,
  - a GM-RKB WTEC Model Training System,
  - a GM-RKB WikiText Error Correction (WTEC) Benchmark Precision-based Performance Metric:
    $Evaluation\,Score = 1 \times TPcount - 5 \times FPcount$
    where $TPcount$ is the GM-RKB WTEC System's True Positive count and $FPcount$ is its False Positive count.
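The performance metric above can be sketched in a few lines of Python. The function name `evaluation_score` and the `fp_penalty` parameter are illustrative assumptions and are not taken from the GM-RKB code base; only the formula itself comes from the task definition.

```python
def evaluation_score(tp_count, fp_count, fp_penalty=5):
    """Precision-based score: +1 per true positive, -fp_penalty per false positive.

    `fp_penalty=5` mirrors the weight in the task's formula; exposing it as a
    parameter is an assumption made here for illustration only.
    """
    return 1 * tp_count - fp_penalty * fp_count


# Worked example using the "WikiFixer MLE" row of the GM-RKB results table:
# TP = 9,838 and FP = 449.
print(evaluation_score(9838, 449))  # 7593
```

The asymmetric weights encode the idea that one wrongly introduced error costs more than one correct repair gains, so a system that makes many speculative edits is penalized heavily.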
Example(s):
- GM-RKB WTEC System Benchmark Task output (GM-RKB WTEC Models training on GM-RKB Datasets):

| Model | TP | FP | Score |
|---|---|---|---|
| JamSpell | 18,324 | 460,916 | -2,286,256 |
| Pyenchant | 18,630 | 4,717,170 | -23,567,220 |
| WikiFixer MLE | 9,838 | 449 | 7,593 |
| WikiFixer NNet GM-RKB | 16,061 | 696 | 12,581 |
| WikiFixer NNet Wikipedia | 8,678 | 524 | 6,058 |
| WikiFixer NNet Wikipedia pretrained + GM-RKB | 13,841 | 490 | 11,391 |
| WikiFixer NNet Wikipedia 7,000 pages + GM-RKB | 16,003 | 652 | 12,743 |

- GM-RKB WTEC System Benchmark Task output (GM-RKB WTEC Models training on Wikipedia Datasets):

| Model | TP | FP | Score |
|---|---|---|---|
| JamSpell | 11,479 | 312,809 | -1,552,566 |
| Pyenchant | 9,656 | 8,351,825 | -41,749,469 |
| WikiFixer MLE | 252 | 166 | -578 |
| WikiFixer NNet GM-RKB | 3,954 | 287 | 2,519 |
| WikiFixer NNet Wikipedia | 6,385 | 211 | 5,330 |
| WikiFixer NNet Wikipedia pretrained + GM-RKB | 3,284 | 160 | 2,484 |
| WikiFixer NNet Wikipedia 7,000 pages + GM-RKB | 6,056 | 277 | 4,671 |
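One way to read these scores: because each false positive costs five true positives, a score is positive only when $TPcount > 5 \times FPcount$, which is equivalent to a precision $TPcount / (TPcount + FPcount)$ above $5/6 \approx 0.833$. The sketch below checks this against two rows of the Wikipedia table; the helper names are assumptions made for illustration.

```python
BREAK_EVEN_PRECISION = 5 / 6  # score > 0 exactly when precision exceeds this


def precision(tp, fp):
    """Fraction of the system's edits that were correct."""
    return tp / (tp + fp)


def score(tp, fp):
    """Benchmark score as defined by the task: TP - 5 * FP."""
    return tp - 5 * fp


# (TP, FP) pairs copied from the Wikipedia results table.
rows = {
    "WikiFixer MLE": (252, 166),
    "WikiFixer NNet GM-RKB": (3954, 287),
}

for name, (tp, fp) in rows.items():
    p = precision(tp, fp)
    side = "above" if p > BREAK_EVEN_PRECISION else "below"
    print(f"{name}: precision={p:.3f} ({side} break-even), score={score(tp, fp)}")
```

This explains why WikiFixer MLE, with a precision of roughly 0.60 on this table, ends up with a negative score even though it makes comparatively few edits.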

References

2020

(Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: We designed and implemented the GM-RKB WikiText Error Correction (WEC) Task to benchmark systems that attempt to automatically recognize and repair simple typographical errors in WikiText based on frequent patterns observed in the corpus. The task consisted in conducting a series of experiments on benchmark datasets to find the best performing WEC system. We adopted a precision-based performance metric because we were interested in measuring of the balance between the welcome benefit a WEC system succeeding in repairing an error correctly against the significant cost of it introducing an error which requires to be repaired manually. We compared the relative performance of a character MLE Language Model-based and a sequence-to-sequence (seq2seq) neural network-based WEC, as well as two spelling error correction systems trained on GM-RKB and Wikipedia corpora datasets. Because of the difficulty in logging real wikitext errors introduced by human editors, we developed a sub-system that artificially can add human-like editing errors to the original text and convert it to training data.
  (...)
  
  The GM-RKB WTEC Benchmark Task can be divided in 3 main sub-tasks: (1) the creation and preparation of the training dataset; (2) the training of WEC models on the datasets; and (3) the analysis of the relative performance of WEC systems. Fig.1 shows a schematic view of our benchmarking process.

**Figure 1:** GM-RKB WikiText Error Correction System Benchmark Task
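The quoted passage mentions a sub-system that artificially adds human-like editing errors to clean text in order to produce training data. The following is a minimal sketch of that general idea, not the authors' implementation: the four character-level operations, the `error_rate` parameter, and the function names are all assumptions chosen for illustration.

```python
import random
import string


def add_noise(text, error_rate=0.05, seed=0):
    """Return a copy of `text` with random character-level typos.

    Hypothetical noise model: each position is corrupted with probability
    `error_rate` by a deletion, insertion, substitution, or transposition.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.random() >= error_rate:
            out.append(ch)  # keep the character unchanged
            i += 1
            continue
        op = rng.choice(["delete", "insert", "substitute", "transpose"])
        if op == "delete":
            pass  # drop the character
        elif op == "insert":
            out.append(rng.choice(string.ascii_lowercase))
            out.append(ch)
        elif op == "substitute":
            out.append(rng.choice(string.ascii_lowercase))
        elif op == "transpose" and i + 1 < len(text):
            out.append(text[i + 1])
            out.append(ch)
            i += 1  # the next character was consumed by the swap
        else:
            out.append(ch)  # transpose at the last position: keep as-is
        i += 1
    return "".join(out)


def make_training_pair(clean_text, error_rate=0.05, seed=0):
    """Pair a noised copy (model input) with the clean text (target)."""
    return add_noise(clean_text, error_rate, seed), clean_text


noisy, clean = make_training_pair("[[WikiText Error Correction]]", error_rate=0.2)
print(noisy)
print(clean)
```

A realistic noise generator would likely weight the operations by how often human editors make each kind of mistake and would take care around WikiText markup such as `[[` and `]]`; this sketch treats every character uniformly.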
