Language Model (LM)-based Text Error Correction (TEC) Algorithm
A Language Model (LM)-based Text Error Correction (TEC) Algorithm is a TEC algorithm that is an LM-based algorithm (one that uses a language model to score and select candidate corrections).
- Context:
- It can range from being a Character-level LM-based TEC Algorithm (using a character LM) to being a Word/Token-level Language Model-based TEC Algorithm (using a word/token LM).
- It can range from being an MLE Language Model-based TEC Algorithm (using an MLE LM) to being a Neural Language Model-based TEC Algorithm (using a Neural LM); a minimal word/token-level MLE sketch is given below, before the References.
- It can be implemented by an LM-based TEC System (that can solve an LM-based TEC task).
- Example(s):
- LM-based Grammatical Error Correction Algorithm, such as the one described in (Bryant & Briscoe, 2018).
- …
- Counter-Example(s):
- See: Neural-based TEC Algorithm.
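The following is a minimal, illustrative sketch of a word/token-level MLE LM-based TEC algorithm, in the spirit of the threshold-based LM approach quoted from (Bryant & Briscoe, 2018) below: a bigram language model is estimated from (toy) native text, and a token is replaced by a confusion-set alternative only when the alternative raises the sentence's LM score by more than a margin. The class and function names, the toy corpus, and the confusion sets are hypothetical and chosen purely for illustration; they are not taken from the cited paper.

```python
import math
from collections import defaultdict


class BigramMLELanguageModel:
    """A toy word/token-level MLE language model with add-one smoothing."""

    def __init__(self):
        self.unigram_counts = defaultdict(int)
        self.bigram_counts = defaultdict(int)
        self.vocab = set()

    def train(self, sentences):
        # Estimate counts from plain "native" text; no error annotations needed.
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.vocab.update(tokens)
            for prev, curr in zip(tokens, tokens[1:]):
                self.unigram_counts[prev] += 1
                self.bigram_counts[(prev, curr)] += 1

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        v = len(self.vocab)
        score = 0.0
        for prev, curr in zip(tokens, tokens[1:]):
            # Add-one (Laplace) smoothed MLE estimate of P(curr | prev).
            num = self.bigram_counts[(prev, curr)] + 1
            den = self.unigram_counts[prev] + v
            score += math.log(num / den)
        return score


def correct_sentence(sentence, lm, confusion_sets, threshold=0.5):
    """Replace a token with a confusion-set alternative only when the
    alternative improves the LM score by more than `threshold`."""
    tokens = sentence.split()
    best = tokens[:]
    for i, tok in enumerate(tokens):
        for alt in confusion_sets.get(tok.lower(), []):
            candidate = best[:i] + [alt] + best[i + 1:]
            if lm.log_prob(" ".join(candidate)) > lm.log_prob(" ".join(best)) + threshold:
                best = candidate
    return " ".join(best)


if __name__ == "__main__":
    # Hypothetical native text standing in for annotated training data.
    native_text = [
        "he went to the store",
        "she went to the park",
        "they went to the beach",
    ]
    lm = BigramMLELanguageModel()
    lm.train(native_text)

    # Hypothetical confusion sets mapping a token to plausible alternatives.
    confusion_sets = {"too": ["to", "two"], "teh": ["the"]}

    print(correct_sentence("he went too teh store", lm, confusion_sets))
    # Expected output on this toy data: "he went to the store"
```

In this sketch the only supervision is the acceptance threshold; as the quote below notes for real systems, such a parameter can be tuned on a small annotated sample (~1000 sentences) while the LM itself is trained on native text alone.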
References
2018
- (Bryant & Briscoe, 2018) ⇒ Christopher Bryant, and Ted Briscoe. (2018). “Language Model Based Grammatical Error Correction Without Annotated Training Data.” In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications.
- QUOTE: ... Since the end of the CoNLL-2014 shared task on grammatical error correction (GEC), research into language model (LM) based approaches to GEC has largely stagnated. In this paper, we re-examine LMs in GEC and show that it is entirely possible to build a simple system that not only requires minimal annotated data (~1000 sentences), but is also fairly competitive with several state-of-the-art systems. ...
... Despite coming a fairly competitive fourth in the shared task however (Lee and Lee, 2014), research into language model (LM) based approaches to GEC has largely stagnated. The main aim of this paper is hence to re-examine language modelling in the context of GEC and show that it is still possible to achieve competitive results even with very simple systems. In fact, a notable strength of LM-based approaches is that they rely on very little annotated data (purely for tuning purposes), and so it is entirely possible to build a reasonable correction system for any language given enough native text. In contrast, this is simply not possible with SMT and other popular approaches which always require (lots of) labelled data. …