GECToR Sequence Tagging System
A GECToR Sequence Tagging System is a GEC Sequence Tagging System that predicts token-level edit transformations over an input sentence and applies them to correct grammatical errors in text items.
- AKA: GECToR.
- Context:
- Source code available at: https://github.com/grammarly/gector
- It was developed by Omelianchuk et al. (2020).
- It incorporates a pre-trained Transformer encoder for error detection and token-level edit tagging (see the sketch after this list).
- Example(s):
- the original GECToR system proposed in Omelianchuk et al. (2020), available at https://github.com/grammarly/gector.
- …
- Counter-Example(s):
- See: Sequence Tagging System, Grammatical Error Correction System, Transformer Network, Parallel Iterative Edit (PIE) Sequence Tagging GEC System, Transformer-based Seq2Seq GEC System.
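The following minimal sketch is not taken from the GECToR repository; it only illustrates the core idea of token-level edit transformations. Each source token is assigned one tag such as $KEEP, $DELETE, $REPLACE_t, or $APPEND_t (tag names as in Omelianchuk et al., 2020), and applying the tags produces the corrected sentence. The apply_edit_tags function, the example sentence, and the tag assignment are illustrative assumptions; the real system predicts tags with a Transformer encoder and applies them iteratively over several correction rounds.

```python
# Simplified, hypothetical application of GECToR-style token-level edit tags.
# The real system has a much larger tag inventory (including g-transformations
# such as case or verb-form changes) and runs several tagging iterations.

def apply_edit_tags(tokens, tags):
    """Apply one round of token-level edit tags to a token sequence."""
    corrected = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            corrected.append(token)
        elif tag == "$DELETE":
            continue                                   # drop the token
        elif tag.startswith("$REPLACE_"):
            corrected.append(tag[len("$REPLACE_"):])   # substitute the token
        elif tag.startswith("$APPEND_"):
            corrected.append(token)
            corrected.append(tag[len("$APPEND_"):])    # insert a new token after
        else:
            corrected.append(token)                    # unknown tag: keep as-is
    return corrected

# Illustrative example (not from the paper): "He go to the the school"
tokens = ["$START", "He", "go", "to", "the", "the", "school"]
tags   = ["$KEEP", "$KEEP", "$REPLACE_goes", "$KEEP", "$KEEP", "$DELETE", "$APPEND_."]
print(" ".join(apply_edit_tags(tokens, tags)[1:]))
# -> "He goes to the school ."
```

Because each token receives at most one tag per pass, edits that depend on one another are resolved by re-tagging the partially corrected sentence in later iterations.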
References
2020
- (Omelianchuk et al., 2020) ⇒ Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem N. Chernodub, and Oleksandr Skurzhanskyi. (2020). “GECToR - Grammatical Error Correction: Tag, Not Rewrite.” In: Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA@ACL 2020).
- QUOTE: PIE (Awasthi et al., 2019) is an iterative sequence tagging GEC system that predicts token-level edit operations. While their approach is the most similar to ours, our work differs from theirs as described in our contributions below:
- 1. We develop custom g-transformations: token-level edits to perform (g)rammatical error corrections. Predicting g-transformations instead of regular tokens improves the generalization of our GEC sequence tagging system.
- 2. We decompose the fine-tuning stage into two stages: fine-tuning on errorful-only sentences and further fine-tuning on a small, high-quality dataset containing both errorful and error-free sentences.
- 3. We achieve superior performance by incorporating a pre-trained Transformer encoder in our GEC sequence tagging system. In our experiments, encoders from XLNet and RoBERTa outperform three other cutting-edge Transformer encoders (ALBERT, BERT, and GPT-2).
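To make contribution 3 concrete, the sketch below shows the general architectural idea as a hypothetical illustration, not the repository's implementation: a pre-trained Transformer encoder such as RoBERTa, here loaded through the Hugging Face transformers library, topped with a token-classification head that predicts one edit tag per token position. The actual GECToR system is built on AllenNLP, adds a separate error-detection head, applies the predicted tags iteratively, and is trained with the staged fine-tuning described in contribution 2; none of that is shown here. The tag-vocabulary size is set to 5000 for illustration, roughly the size reported in the paper; all other names and values are assumptions.

```python
# Hypothetical sketch: a pre-trained Transformer encoder (RoBERTa) with a
# token-classification head that predicts one edit tag per subword position.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

NUM_EDIT_TAGS = 5000  # assumed edit-tag vocabulary size (illustrative)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=NUM_EDIT_TAGS
)

sentence = "A ten years old boy go school"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, NUM_EDIT_TAGS)

predicted_tag_ids = logits.argmax(dim=-1)    # one edit-tag id per subword position
print(predicted_tag_ids.shape)
```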