Text Error Correction (TEC) System
A Text Error Correction (TEC) System is a string error correction system that solves a text error correction task, i.e. that automatically repairs text errors in electronic text documents.
- AKA: Automatic Text Detection and Correction System.
- Context:
- It implements a Text Error Correction Algorithm to solve a Text Error Correction Task.
- It can be built using a TEC Platform, such as Grammarly.
- It can range from being a Statistical Text Error Correction (TEC) Modelling System to being a Dictionary-based Text Error Correction (TEC) System.
- It can range from being a Natural Language TEC System to being a Wiki TEC System.
- It can range from being a Character-Level TEC System, to being a Word/Token-level TEC System, to being a Context-Level TEC System.
- It can range from (typically) being a Data-Driven TEC System (such as a Semi-Supervised TEC System) to being a Heuristic TEC System.
- It can be supported by a Text Error Detection System.
- Example(s):
- a Natural Language TEC System such as:
- a Wiki TEC System such as: GM-RKB WikiFixer System,
- an MLE-based TEC System,
- a Sequence Modelling TEC System,
- an Automaton TEC System,
- a Neural Natural Language Modelling TEC System.
- …
- Counter-Example(s):
- an Automatic Text Retrieval System,
- an Automatic Word Recognition System,
- an Automatic Sequence Recognition System,
- an Automatic String Recognition System,
- an Error-Correcting Output Coding (ECOC) System,
- an OCR Error Correction System,
- a Speech Error Correction System,
- a Software Code Syntax Correction System,
- a DNA Error Correction System.
- See: Text Error, Text Editing, Parsing System, Text Wikification System, Error Detection System, Natural Language Processing System, Neural Language Modelling System, Sequence Modelling System, Sequence Tagging System, Text Generation System.
References
2019
- (Neill & Bollegala, 2019) ⇒ James O'Neill, and Danushka Bollegala. (2019). “Error-Correcting Neural Sequence Prediction.”
- QUOTE: This work proposed an error-correcting neural language model and a novel Latent Variable Mixture Sampling method for latent variable models. We find that performance is maintained compared to using the full conditional and related approximate methods, given a sufficient code-word size to account for correlations among classes. This corresponds to 40 bits for PTB and 100 bits for WikiText-2 and WikiText-103. Furthermore, we find that performance is improved when rank-ordering the codebook via embedding similarity where the query is the embedding of the most frequent word.
2018
- (Chollampatt & Ng, 2018) ⇒ Shamil Chollampatt, and Hwee Tou Ng. (2018). “A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction.” In: Proceedings of the Thirty-Second Conference on Artificial Intelligence (AAAI-2018).
- QUOTE: With the increasing number of non-native learners and writers of the English language around the globe, the necessity to improve authoring tools such as error correction systems is increasing. Grammatical error correction (GEC) is a well-established natural language processing (NLP) task that deals with building systems for automatically correcting errors in written text, particularly in non-native written text. The errors that a GEC system attempts to correct are not limited to grammatical errors, but also include spelling and collocation errors.
2013
- (Yang & Xiaobing, 2013) ⇒ Zhang Yang, and Zhao Xiaobing. (2013). “Automatic Error Detection and Correction of Text: The State of the Art.” In: Proceedings of the 2013 6th International Conference on Intelligent Networks and Intelligent Systems. ISBN:978-1-4799-2809-5 doi:10.1109/ICINIS.2013.77 ACM.
- QUOTE: Automatic error detection and correction of text is an important research area of natural language processing. Studies abroad originated in 1960s on automatic text error detection and correction. Since the 1990s, many academic studies on automatic text error detection and correction have been made in China. This paper analyzes the state of the art of the automatic check and correct approaches for English and Chinese text.
2008
- (Awadallah et al., 2008) ⇒ Ahmed Hassan Awadallah, Sara Noeman, and Hany Hassan. (2008). “Language Independent Text Correction Using Finite State Automata.” In: Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008).
- QUOTE: The proposed approach uses techniques from finite state theory to detect misspelled words and to generate a set of candidate corrections for each misspelled word. It also uses a language model to select the best correction from the set of candidate corrections using the context of the misspelled word. Using techniques from finite state theory, and avoiding calculating edit distances, makes the approach very fast and efficient. The approach is completely language independent, and can be used with any language that has a dictionary and text data for building a language model.
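The detect-then-rank pipeline described above can be sketched in a few lines. This is an illustrative simplification, not the paper's method: it enumerates single-edit candidates directly instead of using true finite-state automata, and ranks them with unigram frequencies as a stand-in for a context-sensitive language model; all names are hypothetical.

```python
# Minimal sketch: dictionary-based detection plus candidate ranking.
# Candidate generation enumerates single insert/delete/substitute edits
# (a simplification of automaton-based generation); a unigram count
# table stands in for the language model.
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one insert, delete, or substitution away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    subs = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + subs + inserts)

def correct(word, lexicon, counts):
    """Return `word` if known, else the highest-scoring in-lexicon candidate."""
    if word in lexicon:
        return word
    candidates = edits1(word) & lexicon
    if not candidates:
        return word  # no correction found; leave the token unchanged
    return max(candidates, key=lambda w: counts[w])
```

Because detection is a set-membership test and generation touches only single-edit neighbours, no pairwise edit distances are computed, which is the efficiency point the quote makes.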
2004
- (Reynaert, 2004) ⇒ Martin Reynaert. (2004). “Text Induced Spelling Correction.” In: Proceedings of the 20th International Conference on Computational Linguistics. doi:10.3115/1220355.1220475
- QUOTE: We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigrams and word bigrams. It is stored in a novel representation based on a purpose-built hashing function, which provides a fast and computationally tractable way of checking whether a particular word form likely constitutes a spelling error and of retrieving correction candidates. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates when insufficient information for an unambiguous decision on a single correction is available. We describe the implemented prototype and evaluate it on English and Dutch text, containing real-world errors in more or less limited contexts. The results are compared with those of the isolated word spelling checking programs ISPELL and the Microsoft Proofing Tools (MPT).
2000
- (Zhang et al., 2000) ⇒ Lei Zhang, Changning Huang, Ming Zhou, and Haihua Pan. (2000). “Automatic Detecting/ correcting Errors in Chinese Text by An Approximate Word-matching Algorithm.” In: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. doi:10.3115/1075218.1075250
- QUOTE: A fast approximate Chinese word-matching algorithm is presented; based on this algorithm, a new automatic error detection and correction approach using confusing word substitution is implemented. Compared with the approach of (Chang, 94), its distinguished feature is that not only character substitution error, but also character insertion or deletion error and string substitution error could be handled.
- Chang, C. H. (1994). “A Pilot Study on Automatic Chinese Spelling Error Correction.” Communications of COLIPS, 4(2), 143-149.
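Correction by confusing-word substitution, as described in the quote above, can be sketched as follows. This is a hedged toy illustration, not the paper's algorithm: it uses English tokens rather than Chinese text, hand-made confusion sets, and a bigram count table as the scoring model; all names are illustrative.

```python
# Toy sketch of confusing-word substitution: each token that has a
# confusion set is tentatively replaced by its confusable alternatives,
# and a variant is kept whenever it raises the sentence's bigram score.
def bigram_score(tokens, bigram_counts):
    """Sum of counts for each adjacent token pair in the sentence."""
    return sum(bigram_counts.get((a, b), 0) for a, b in zip(tokens, tokens[1:]))

def correct_sentence(tokens, confusion, bigram_counts):
    """Greedy left-to-right substitution of confusable words."""
    best = list(tokens)
    for i, tok in enumerate(tokens):
        for alt in confusion.get(tok, ()):
            cand = best[:i] + [alt] + best[i + 1:]
            if bigram_score(cand, bigram_counts) > bigram_score(best, bigram_counts):
                best = cand
    return best
```

Because substitution operates on whole words from a confusion set (rather than only on single characters), insertion, deletion, and string-substitution errors can all surface as a confusable word-level variant, which is the generality the quote claims over character-only approaches.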
1999
- (Echanobe et al., 1999) ⇒ Javier Echanobe, Jose Ramon Garitagoitia, and Jose Ramon Gonzalez de Mendivil. (1999). “Deformed Fuzzy Automata for the Text Error Correction Problem.” In: Proceedings of EUSFLAT-ESTYLF Joint Conf. 1999.
- QUOTE: A fuzzy method for the text error correction problem is introduced. The method is able to handle insert, delete and substitution errors. Moreover, it uses the measurement level output that an Isolated Character Classifier can provide. The method is based on a Deformed System, in particular, a deformed fuzzy automaton is defined to model the possible errors in the words of the texts. Experimental results show good performance in correcting the three types of errors ...
The automatic detection and correction of errors is an important problem in the recognition of texts. Textual errors are mainly caused during the recognition process, and they are known as edition errors: insert, delete or change errors. In text recognition systems, the error correction is in part provided by a Contextual Postprocessing (CP). Let [math]\displaystyle{ w = a_1\;a_2 \cdots a_m }[/math] be an observed word which is obtained from a previous stage of the system, where the characters [math]\displaystyle{ a_i (1 \leq i \leq m) }[/math] belong to an alphabet [math]\displaystyle{ \Sigma }[/math]. The objective of the CP is to estimate a word [math]\displaystyle{ \hat{w} }[/math] in a set of words [math]\displaystyle{ D }[/math] (a dictionary) that is the best selection for [math]\displaystyle{ w }[/math], e.g., one that minimizes a certain distance function [math]\displaystyle{ d(\hat{w}, w) }[/math] or maximizes the a posteriori probability [math]\displaystyle{ P(\hat{w} | w) }[/math]. This problem is referred to as one of text error correction.
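The contextual-postprocessing formulation quoted above, selecting the dictionary word that minimizes a distance to the observed word, can be sketched directly. Plain Levenshtein distance stands in here for the paper's deformed-fuzzy-automaton machinery; the function names are illustrative.

```python
# Direct sketch of the CP objective: pick the dictionary entry w_hat
# minimizing an edit distance d(w_hat, w) to the observed word w.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete from a
                           cur[j - 1] + 1,               # insert into a
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def postprocess(word, dictionary):
    """Return the dictionary entry closest to `word` under edit distance."""
    return min(dictionary, key=lambda v: levenshtein(v, word))
```

Note that this handles exactly the three edition-error types named in the quote (insert, delete, change), while the fuzzy-automaton version additionally exploits the character classifier's measurement-level scores.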
1992
- (Kukich, 1992) ⇒ Karen Kukich. (1992). “Techniques for Automatically Correcting Words in Text.” In: ACM Computing Surveys (CSUR) Journal, 24(4). doi:10.1145/146370.146380
- QUOTE: A distinction must be made between the tasks of error detection and error correction. Efficient techniques have been devised for detecting strings that do not appear in a given word list, dictionary, or lexicon [1]. But correcting a misspelled string is a much harder problem. Not only is the task of locating and ranking candidate words a challenge, but as Bentley (1985) points out: given the morphological productivity of the English language (e.g., almost any noun can be verbified) and the rate at which words enter and leave the lexicon (e.g., catwomanhood, balkanization), some even question the wisdom of attempts at automatic correction.
Many existing spelling correctors exploit task-specific constraints. For example, interactive command line spelling correctors exploit the small size of a command language lexicon to achieve quick response times. Alternatively, longer response times are tolerated for noninteractive-mode manuscript preparation applications. Both of the foregoing spelling correction applications tolerate lower first-guess accuracy by returning multiple guesses and allowing the user to make the final choice of intended word. In contrast, some future applications, such as text-to-speech synthesis, will require a system to perform fully automatic, real-time word recognition and error correction for vocabularies of many thousands of words and names. The contrast between the first two examples and this last one highlights the distinction between interactive spelling checkers and automatic correction. The latter task is much more demanding, and it is not clear how far existing spelling correction techniques can go toward fully automatic word correction.
- ↑ The terms “word list,” “dictionary,” and “lexicon” are used interchangeably in the literature. We prefer the use of the term lexicon because its connotation of “a list of words relevant to a particular subject, field, or class” seems best suited to spelling correction applications, but we adopt the terms “dictionary” and “word list” to describe research in which other authors have used them exclusively.