GM-RKB Seq2Seq WikiText Error Correction (WTEC) System
A GM-RKB Seq2Seq WikiText Error Correction (WTEC) System is a GM-RKB WikiText Error Correction (WTEC) System that is based on a Sequence-to-Sequence (seq2seq) Neural Network Training System.
- AKA: GM-RKB Seq2Seq NNet-based WikiFixer.
- Context:
- Its performance has been evaluated by a GM-RKB WikiText Error Correction (WTEC) Benchmark Task.
- …
- Example(s):
- Counter-Example(s):
- See: WikiText Error Correction System, Misspelling Correction System, Parsing System, GM-RKB Wikification System, Wiki Markup Language, Text Error Correction System, Natural Language Processing System, Seq2Seq Encoder-Decoder Neural Network, GM-RKB Seq2Seq Encoder-Decoder Neural Network, GM-RKB Character-Level MLE Language Model, Language Model, Character-Level MLE Language Model.
References
2020
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira (2020). “GM-RKB WikiText Error Correction Task and Baselines”. In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020).
- QUOTE: The final baseline WikiFixer included in this task was based on the sequence-to-sequence (seq2seq) neural network model, which 'translates' partially noisy sequences into corrected ones. Our seq2seq model can map a fixed-length input sequence to a fixed-length output sequence; the input and output text lengths may differ. We were motivated by the approach proposed by (Chollampatt & Ng, 2018) for grammatical error correction, but at the character level as described in (Weiss, 2016). The model consists of 3 main components: Encoder, Encoder Vector, and Decoder, as illustrated in Fig. 2. The Encoder is a stack of several recurrent layers, each of which accepts a single element of the input sequence, collects information for that element, and propagates it forward. In the WikiFixer problem, the input sequence is the collection of all characters of the noisy WikiText. The final hidden state produced by the encoder component is called the Encoder Vector. This vector is constructed to represent the information from all input elements in order to help the decoder make accurate predictions; it acts as the initial hidden state of the decoder component of the model. The Decoder is a stack of several recurrent layers, each of which predicts an output element at a time step. Each recurrent unit accepts a hidden state from the previous unit and produces an output as well as its own hidden state. In the WikiFixer problem, the output sequence is the collection of all characters of the fixed text. The recurrent units used are LSTMs, an extension of recurrent neural networks built to enhance their memory capacity.
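The quoted passage describes a standard character-level LSTM encoder-decoder. The following is a minimal sketch of that architecture in Keras, assuming a character vocabulary of size num_chars and a latent dimension of 256; the layer sizes, variable names, and training configuration are illustrative assumptions, not the exact settings used by the GM-RKB WikiFixer.

<pre>
# Minimal sketch of a character-level seq2seq encoder-decoder with LSTM units.
# All sizes and names here are illustrative assumptions, not the GM-RKB
# WikiFixer's actual configuration.
from tensorflow import keras
from tensorflow.keras import layers

num_chars = 128   # assumed character vocabulary size (one-hot encoded)
latent_dim = 256  # assumed size of the Encoder Vector (final hidden state)

# Encoder: reads the noisy WikiText one character per time step and keeps
# only its final hidden and cell states -- the "Encoder Vector".
encoder_inputs = keras.Input(shape=(None, num_chars))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: initialized with the Encoder Vector, emits the corrected text
# one character per time step.
decoder_inputs = keras.Input(shape=(None, num_chars))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = layers.Dense(num_chars, activation="softmax")(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
</pre>

In this sketch, training would pair one-hot-encoded noisy WikiText sequences (encoder input) with the corresponding fixed sequences shifted by one time step (decoder input and target), the usual teacher-forcing setup for seq2seq models.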