Text Error Correction (TEC) System
A Text Error Correction (TEC) System is a string error correction system that solves a text error correction task, i.e. that automatically repairs text errors in electronic text documents.
- AKA: Automatic Text Detection and Correction System.
- Context:
- It implements a Text Error Correction Algorithm to solve a Text Error Correction Task.
- It can be built using a TEC Platform, such as Grammarly.
- It can range from being a Statistical Text Error Correction (TEC) Modelling System to being a Dictionary-based Text Error Correction (TEC) System.
- It can range from being a Natural Language TEC System to being a Wiki TEC System.
- It can range from being a Character-Level TEC System, to being a Word/Token-level TEC System, to being a Context-Level TEC System.
- It can range from (typically) being a Data-Driven TEC System (such as a Semi-Supervised TEC System) to being a Heuristic TEC System.
- It can be supported by a Text Error Detection System.
- Example(s):
- a Natural Language TEC System such as:
- a Wiki TEC System such as: GM-RKB WikiFixer System,
- an MLE-based TEC System,
- a Sequence Modelling TEC System,
- an Automaton TEC System,
- a Neural Natural Language Modelling TEC System.
- …
- Counter-Example(s):
- an Automatic Text Retrieval System,
- an Automatic Word Recognition System,
- an Automatic Sequence Recognition System,
- an Automatic String Recognition System,
- an Error-Correcting Output Coding (ECOC) System,
- an OCR Error Correction System,
- a Speech Error Correction System,
- a Software Code Syntax Correction System,
- a DNA Error Correction System.
- See: Text Error, Text Editing, Parsing System, Text Wikification System, Error Detection System, Natural Language Processing System, Neural Language Modelling System, Sequence Modelling System, Sequence Tagging System, Text Generation System.
References
2019
- (Neill & Bollegala, 2019) ⇒ James O'Neill, and Danushka Bollegala. (2019). “Error-Correcting Neural Sequence Prediction.”
- QUOTE: This work proposed an error-correcting neural language model and a novel Latent Variable Mixture Sampling method for latent variable models. We find that performance is maintained compared to using the full conditional and related approximate methods, given a sufficient code-word size to account for correlations among classes. This corresponds to 40 bits for PTB and 100 bits for WikiText-2 and WikiText-103. Furthermore, we find that performance is improved when rank-ordering the codebook via embedding similarity where the query is the embedding of the most frequent word.
2018
- (Chollampatt & Ng, 2018) ⇒ Shamil Chollampatt, and Hwee Tou Ng. (2018). “A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction.” In: Proceedings of the Thirty-Second Conference on Artificial Intelligence (AAAI-2018).
- QUOTE: With the increasing number of non-native learners and writers of the English language around the globe, the necessity to improve authoring tools such as error correction systems is increasing. Grammatical error correction (GEC) is a well-established natural language processing (NLP) task that deals with building systems for automatically correcting errors in written text, particularly in non-native written text. The errors that a GEC system attempts to correct are not limited to grammatical errors, but also include spelling and collocation errors.
2013
- (Yang & Xiaobing, 2013) ⇒ Zhang Yang, and Zhao Xiaobing. (2013). “Automatic Error Detection and Correction of Text: The State of the Art.” In: Proceedings of the 2013 6th International Conference on Intelligent Networks and Intelligent Systems. ISBN:978-1-4799-2809-5 doi:10.1109/ICINIS.2013.77 ACM.
- QUOTE: Automatic error detection and correction of text is an important research area of natural language processing. Studies abroad originated in 1960s on automatic text error detection and correction. Since the 1990s, many academic studies on automatic text error detection and correction have been made in China. This paper analyzes the state of the art of the automatic check and correct approaches for English and Chinese text.
2008
- (Awadallah et al., 2008) ⇒ Ahmed Hassan Awadallah, Sara Noeman, and Hany Hassan. (2008). “Language Independent Text Correction Using Finite State Automata.” In: Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008).
- QUOTE: The proposed approach uses techniques from finite state theory to detect misspelled words and to generate a set of candidate corrections for each misspelled word. It also uses a language model to select the best correction from the set of candidate corrections using the context of the misspelled word. Using techniques from finite state theory, and avoiding calculating edit distances, makes the approach very fast and efficient. The approach is completely language independent, and can be used with any language that has a dictionary and text data for building a language model.
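The detect-then-rank pipeline described above can be sketched in a few lines. This is an illustrative simplification, not the paper's method: it enumerates single-edit candidates directly instead of using true finite-state automata, and ranks them with unigram frequencies as a stand-in for a context-sensitive language model; all names are hypothetical.

```python
# Minimal sketch: dictionary-based detection plus candidate ranking.
# Candidate generation enumerates single insert/delete/substitute edits
# (a simplification of automaton-based generation); a unigram count
# table stands in for the language model.
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one insert, delete, or substitution away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    subs = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + subs + inserts)

def correct(word, lexicon, counts):
    """Return `word` if known, else the highest-scoring in-lexicon candidate."""
    if word in lexicon:
        return word
    candidates = edits1(word) & lexicon
    if not candidates:
        return word  # no correction found; leave the token unchanged
    return max(candidates, key=lambda w: counts[w])
```

Because detection is a set-membership test and generation touches only single-edit neighbours, no pairwise edit distances are computed, which is the efficiency point the quote makes.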
2004
- (Reynaert, 2004) ⇒ Martin Reynaert. (2004). “Text Induced Spelling Correction.” In: Proceedings of the 20th International Conference on Computational Linguistics. doi:10.3115/1220355.1220475
- QUOTE: We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigrams and word bigrams. It is stored in a novel representation based on a purpose-built hashing function, which provides a fast and computationally tractable way of checking whether a particular word form likely constitutes a spelling error and of retrieving correction candidates. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates when insufficient information for an unambiguous decision on a single correction is available. We describe the implemented prototype and evaluate it on English and Dutch text, containing real-world errors in more or less limited contexts. The results are compared with those of the isolated word spelling checking programs ISPELL and the Microsoft Proofing Tools (MPT).
2000
- (Zhang et al., 2000) ⇒ Lei Zhang, Changning Huang, Ming Zhou, and Haihua Pan. (2000). “Automatic Detecting/ correcting Errors in Chinese Text by An Approximate Word-matching Algorithm.” In: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. doi:10.3115/1075218.1075250
- QUOTE: A fast approximate Chinese word-matching algorithm is presented; based on this algorithm, a new automatic error detection and correction approach using confusing word substitution is implemented. Compared with the approach of (Chang, 94), its distinguished feature is that not only character substitution error, but also character insertion or deletion error and string substitution error could be handled.
- Chang, C. H. (1994). “A Pilot Study on Automatic Chinese Spelling Error Correction.” Communications of COLIPS, 4(2), 143-149.
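Correction by confusing-word substitution, as described in the quote above, can be sketched as follows. This is a hedged toy illustration, not the paper's algorithm: it uses English tokens rather than Chinese text, hand-made confusion sets, and a bigram count table as the scoring model; all names are illustrative.

```python
# Toy sketch of confusing-word substitution: each token that has a
# confusion set is tentatively replaced by its confusable alternatives,
# and a variant is kept whenever it raises the sentence's bigram score.
def bigram_score(tokens, bigram_counts):
    """Sum of counts for each adjacent token pair in the sentence."""
    return sum(bigram_counts.get((a, b), 0) for a, b in zip(tokens, tokens[1:]))

def correct_sentence(tokens, confusion, bigram_counts):
    """Greedy left-to-right substitution of confusable words."""
    best = list(tokens)
    for i, tok in enumerate(tokens):
        for alt in confusion.get(tok, ()):
            cand = best[:i] + [alt] + best[i + 1:]
            if bigram_score(cand, bigram_counts) > bigram_score(best, bigram_counts):
                best = cand
    return best
```

Because substitution operates on whole words from a confusion set (rather than only on single characters), insertion, deletion, and string-substitution errors can all surface as a confusable word-level variant, which is the generality the quote claims over character-only approaches.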
1999
- (Echanobe et al., 1999) ⇒ Javier Echanobe, Jose Ramon Garitagoitia, and Jose Ramon Gonzalez de Mendivil. (1999). “Deformed Fuzzy Automata for the Text Error Correction Problem.” In: Proceedings of EUSFLAT-ESTYLF Joint Conf. 1999.
- QUOTE: A fuzzy method for the text error correction problem is introduced. The method is able to handle insert, delete and substitution errors. Moreover, it uses the measurement level output that an Isolated Character Classifier can provide. The method is based on a Deformed System, in particular, a deformed fuzzy automaton is defined to model the possible errors in the words of the texts. Experimental results show good performance in correcting the three types of errors ...
The automatic detection and correction of errors is an important problem in the recognition of texts. Textual errors are mainly caused during the recognition process, and they are known as edition errors: insert, delete or change errors. In text recognition systems, the error correction is in part provided by a Contextual Postprocessing (CP). Let [math]\displaystyle{ w = a_1\;a_2 \cdots a_m }[/math] be an observed word which is obtained from a previous stage of the system, where the characters [math]\displaystyle{ a_i (1 \leq i \leq m) }[/math] belong to an alphabet [math]\displaystyle{ \Sigma }[/math]. The objective of the CP is to estimate a word [math]\displaystyle{ \hat{w} }[/math] in a set of words [math]\displaystyle{ D }[/math] (a dictionary) that is the best selection for [math]\displaystyle{ w }[/math], e.g., one that minimizes a certain distance function [math]\displaystyle{ d(\hat{w}, w) }[/math] or maximizes the a posteriori probability [math]\displaystyle{ P(\hat{w} | w) }[/math]. This problem is referred to as one of text error correction.
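The contextual-postprocessing formulation quoted above, selecting the dictionary word that minimizes a distance to the observed word, can be sketched directly. Plain Levenshtein distance stands in here for the paper's deformed-fuzzy-automaton machinery; the function names are illustrative.

```python
# Direct sketch of the CP objective: pick the dictionary entry w_hat
# minimizing an edit distance d(w_hat, w) to the observed word w.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete from a
                           cur[j - 1] + 1,               # insert into a
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def postprocess(word, dictionary):
    """Return the dictionary entry closest to `word` under edit distance."""
    return min(dictionary, key=lambda v: levenshtein(v, word))
```

Note that this handles exactly the three edition-error types named in the quote (insert, delete, change), while the fuzzy-automaton version additionally exploits the character classifier's measurement-level scores.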
1992
- (Kukich, 1992) ⇒ Karen Kukich. (1992). “Techniques for Automatically Correcting Words in Text.” In: ACM Computing Surveys (CSUR) Journal, 24(4). doi:10.1145/146370.146380
- QUOTE: A distinction must be made between the tasks of error detection and error correction. Efficient techniques have been devised for detecting strings that do not appear in a given word list, dictionary, or lexicon [1]. But correcting a misspelled string is a much harder problem. Not only is the task of locating and ranking candidate words a challenge, but as Bentley (1985) points out: given the morphological productivity of the English language (e.g., almost any noun can be verbified) and the rate at which words enter and leave the lexicon (e.g., catwomanhood, balkanization), some even question the wisdom of attempts at automatic correction.
Many existing spelling correctors exploit task-specific constraints. For example, interactive command line spelling correctors exploit the small size of a command language lexicon to achieve quick response times. Alternatively, longer response times are tolerated for noninteractive-mode manuscript preparation applications. Both of the foregoing spelling correction applications tolerate lower first-guess accuracy by returning multiple guesses and allowing the user to make the final choice of intended word. In contrast, some future applications, such as text-to-speech synthesis, will require a system to perform fully automatic, real-time word recognition and error correction for vocabularies of many thousands of words and names. The contrast between the first two examples and this last one highlights the distinction between interactive spelling checkers and automatic correction. The latter task is much more demanding, and it is not clear how far existing spelling correction techniques can go toward fully automatic word correction.
- ↑ The terms “word list,” “dictionary,” and “lexicon” are used interchangeably in the literature. We prefer the use of the term lexicon because its connotation of “a list of words relevant to a particular subject, field, or class” seems best suited to spelling correction applications, but we adopt the terms “dictionary” and “word list” to describe research in which other authors have used them exclusively.