Automated Linguistic Translation (MT) Task

An Automated Linguistic Translation Task, also known as a Machine Translation (MT) Task, is a language translation task that is also an automated natural language processing task.

Context:
- It can be performed by a Machine Translation System (that implements a machine translation algorithm).
- It can range from being a Heuristic Machine Translation Task to being a Data-Driven Machine Translation Task.
- It can be the subject of a Machine Translation Academic Discipline.
- ...
Example(s):
- a Machine Translation Benchmark Task.
- …
Counter-Example(s):
See: Language Translation Task, Natural Language Processing Task.

References

2020

(Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Machine_translation Retrieved:2020-3-1.
- Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT) or interactive translation) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.
  On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus statistical and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies. ^[1]
  Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
  Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are proper names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).
  The progress and potential of machine translation have been debated much through its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality, first and most notably by Yehoshua Bar-Hillel. Some critics claim that there are in-principle obstacles to automating the translation process.
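  The limitation of simple word substitution described above can be illustrated with a minimal sketch. The lexicon and the function name `substitute_words` are hypothetical and exist only for this illustration; no real MT system is being reproduced here.

```python
# Toy illustration only: TOY_LEXICON is a hypothetical, hand-made
# English-to-French word list, not data from any real MT system.
TOY_LEXICON = {
    "it": "il",
    "is": "est",
    "raining": "pleut",
    "cats": "chats",
    "and": "et",
    "dogs": "chiens",
}


def substitute_words(sentence, lexicon):
    """Translate by replacing each source word independently of its neighbours."""
    tokens = sentence.lower().split()
    # Words missing from the lexicon are passed through unchanged.
    return " ".join(lexicon.get(token, token) for token in tokens)


print(substitute_words("It is raining cats and dogs", TOY_LEXICON))
# prints: il est pleut chats et chiens
```

  The output keeps every word but loses the idiom, because each token is translated in isolation. A fluent French rendering would treat the whole phrase as one unit (for instance "il pleut des cordes"), which is the phrase-level recognition the passage says is needed.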

2018

(Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Machine_translation#Evaluation Retrieved:2018-8-27.
- There are many factors that affect how machine translation systems are evaluated. These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process.
  Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English to French translation, EBMT performs better. The same concept applies for technical documents, which can be more easily translated by SMT because of their formal language.
  In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection. ^[2]
  There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems such as rule-based and statistical systems. ^[3] Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR. ^[4]
  Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human. ^[5]
  The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be meaningless. ^[6]
  In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases. The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.
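  To make the idea of automated evaluation more concrete, the following is a simplified, single-reference sketch in the spirit of BLEU: clipped n-gram precision combined with a brevity penalty. It is not the official BLEU implementation, it uses whitespace tokenization and only unigrams and bigrams by default, and the function names (`ngrams`, `clipped_precision`, `bleu_like`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with each
    n-gram's credit capped at its count in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


def bleu_like(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty.
    Simplification: one reference, no smoothing, whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    precisions = [clipped_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_mean)


print(bleu_like("the cat sat on the mat", "the cat sat on the mat"))
print(bleu_like("the cat", "the cat sat"))
```

  An identical candidate and reference score 1.0, while a candidate that is a correct but truncated prefix of the reference is scored lower by the brevity penalty. Scores of this kind measure surface overlap only, which is consistent with the passage's point that human judgment remains the most reliable way to compare systems.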

[1] Albat, Thomas Fritz. “Systems and Methods for Automatically Estimating a Translation Time.” US Patent 0185235, 19 July 2012.

[2] Muegge (2006), "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study," in Translating and the computer 28. Proceedings of the twenty-eighth International Conference on translating and the computer, 16–17 November 2006, London, London: Aslib.

[3] Anderson, D.D. (1995). Machine translation as a tool in second language learning. CALICO Journal. 13(1). 68–96.

[4] Han et al. (2012), "LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors," in Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450, Mumbai, India.

[5] J.M. Cohen observes (p.14): "Scientific translation is the aim of an age that would reduce all activities to techniques. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."

[6] See the annually performed NIST tests since 2001 and Bilingual Evaluation Understudy
