Sennrich-Haddow-Birch Rare Words Neural Machine Translation Task
A Sennrich-Haddow-Birch Rare Words Neural Machine Translation Task is a Neural Machine Translation Task that translates rare and unseen words using BPE-based Word Segmentations.
- AKA: Neural Machine Translation of Rare Words via BPE Word Segmentation.
- Context:
- Task Input(s): text item containing rare and unseen words.
- Task Output(s): rare and unseen word translations, multi-language open-vocabularies.
- Task Requirement(s):
- Benchmark Datasets:
- Benchmark Performance Metrics: BLEU, CHRF3, unigram F1.
- Baseline Models:
- WUnk - a word-level vocabulary model in which out-of-vocabulary words are represented as UNK;
- WDict - a word-level vocabulary model with a back-off dictionary;
- C2-50k - a character bigram vocabulary model with a shortlist of 50,000 unsegmented words;
- BPE-60k - a BPE subword-level vocabulary model;
- BPE-J90k - a BPE subword-level vocabulary model learned jointly on the source and target vocabularies.
- It evaluates the performance of the following Machine Translation Systems:
- It can be solved by Sennrich-Haddow-Birch Rare Words Neural Machine Translation System.
- Example(s):
- English-to-German translation Sennrich et al. (2016) results:
name | segmentation | shortlist | source vocabulary | target vocabulary | BLEU (single) | BLEU (ens-8) | CHRF3 (single) | CHRF3 (ens-8) | unigram F1 % (all) | unigram F1 % (rare) | unigram F1 % (OOV) |
---|---|---|---|---|---|---|---|---|---|---|---|
syntax-based (Sennrich and Haddow, 2015) | - | - | - | - | 24.4 | - | 55.3 | - | 59.1 | 46.0 | 37.7 |
WUnk | - | - | 300,000 | 500,000 | 20.6 | 22.8 | 47.2 | 48.9 | 56.7 | 20.4 | 0.0 |
WDict | - | - | 300,000 | 500,000 | 22.0 | 24.2 | 50.5 | 52.4 | 58.1 | 36.8 | 36.8 |
C2-50k | char-bigram | 50,000 | 60,000 | 60,000 | 22.8 | 25.3 | 51.9 | 53.5 | 58.4 | 40.5 | 30.9 |
BPE-60k | BPE | - | 60,000 | 60,000 | 21.5 | 24.5 | 52.0 | 53.9 | 58.4 | 40.9 | 29.3 |
BPE-J90k | BPE (joint) | - | 90,000 | 90,000 | 22.8 | 24.7 | 51.7 | 54.1 | 58.5 | 41.8 | 33.6 |
- English-to-Russian translation Sennrich et al. (2016) results:
name | segmentation | shortlist | source vocabulary | target vocabulary | BLEU (single) | BLEU (ens-8) | CHRF3 (single) | CHRF3 (ens-8) | unigram F1 % (all) | unigram F1 % (rare) | unigram F1 % (OOV) |
---|---|---|---|---|---|---|---|---|---|---|---|
phrase-based (Haddow et al., 2015) | - | - | - | - | 24.3 | - | 53.8 | - | 56.0 | 31.3 | 16.5 |
WUnk | - | - | 300,000 | 500,000 | 18.8 | 22.4 | 46.5 | 49.9 | 54.2 | 25.2 | 0.0 |
WDict | - | - | 300,000 | 500,000 | 19.1 | 22.8 | 47.5 | 51.0 | 54.8 | 26.5 | 6.6 |
C2-50k | char-bigram | 50,000 | 60,000 | 60,000 | 20.9 | 24.1 | 49.0 | 51.6 | 55.2 | 27.8 | 17.4 |
BPE-60k | BPE | - | 60,000 | 60,000 | 20.5 | 23.6 | 49.8 | 52.7 | 55.3 | 29.7 | 15.6 |
BPE-J90k | BPE (joint) | - | 90,000 | 100,000 | 20.4 | 24.1 | 49.7 | 53.0 | 55.8 | 29.7 | 18.3 |
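
The unigram F1 columns in the tables above can be illustrated with a small sketch. It assumes the metric is the harmonic mean of clipped unigram precision and recall, and it scores a single whitespace-tokenized sentence pair; the function name `unigram_f1` and the example sentences are hypothetical, and the restriction to rare or OOV words used for the last two columns is not reproduced here.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Harmonic mean of clipped unigram precision and recall (illustrative sketch)."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    # Counter intersection keeps the minimum count per token, i.e. clipped matches.
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence pair: 5 of 6 tokens match on each side.
score = unigram_f1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))  # 0.8333
```

A corpus-level score would accumulate the clipped matches and token totals over all sentence pairs before computing precision and recall, rather than averaging per-sentence values.
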
- Counter-Example(s):
- CJS Neural Narrative Text Generation Task,
- Lee-Krahmer-Wubben Data-To-Text Generation Task,
- LSPGS Wikipedia Long Sentences Summarization Task,
- See-Liu-Manning Text Summarization Task,
- GSGAN Benchmark Task,
- LeakGAN Benchmark Task,
- MaliGAN Benchmark Task,
- MaskGAN Benchmark Task,
- RankGAN Benchmark Task,
- SeqGAN Benchmark Task,
- TextGAN Benchmark Task,
- Texygen Benchmark Task.
- See: Machine Translation Task, Natural Language Processing Task, Natural Language Generation Task, Text Segmentation Task, Byte-Pair Encoding (BPE) Task.
References
2016
- (Sennrich et al., 2016) ⇒ Rico Sennrich, Barry Haddow, and Alexandra Birch. (2016). “Neural Machine Translation of Rare Words with Subword Units”. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).
- QUOTE: We investigate NMT models that operate on the level of subword units. Our main goal is to model open-vocabulary translation in the NMT network itself, without requiring a back-off model for rare words. In addition to making the translation process simpler, we also find that the subword models achieve better accuracy for the translation of rare words than large-vocabulary models and back-off dictionaries, and are able to productively generate new words that were not seen at training time. Our analysis shows that the neural networks are able to learn compounding and transliteration from subword representations.
This paper has two main contributions:
- We show that open-vocabulary neural machine translation is possible by encoding (rare) words via subword units. We find our architecture simpler and more effective than using large vocabularies and back-off dictionaries (Jean et al., 2015; Luong et al., 2015b).
- We adapt byte pair encoding (BPE) (Gage, 1994), a compression algorithm, to the task of word segmentation. BPE allows for the representation of an open vocabulary through a fixed-size vocabulary of variable-length character sequences, making it a very suitable word segmentation strategy for neural network models.
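
The adaptation of byte pair encoding to word segmentation described in the quote above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`merge_symbols`, `learn_bpe`, `segment`), the `</w>` end-of-word marker, the toy word frequencies, and the tie-breaking rule (first pair seen wins) are all choices made for this example.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    left, right = pair
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency dictionary."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties go to the pair counted first.
        best = max(pair_counts, key=pair_counts.get)
        vocab = {merge_symbols(s, best): f for s, f in vocab.items()}
        merges.append(best)
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


# Toy word frequencies, assumed for illustration only.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_freqs, num_merges=5)
print(merges)
# [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
print(segment("lowest", merges))
# ['low', 'est</w>']
```

The word "lowest" does not occur in the toy data, yet it is segmented into known subword units. This is how a fixed-size vocabulary of variable-length character sequences can represent an open vocabulary: the number of merges controls the vocabulary size, and any word falls back to smaller units, down to single characters if no merge applies.
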
2015a
- (Haddow et al., 2015) ⇒ Barry Haddow, Matthias Huck, Alexandra Birch, Nikolay Bogoychev, and Philipp Koehn. (2015). “The Edinburgh/JHU Phrase-based Machine Translation Systems for WMT 2015”. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, WMT@EMNLP 2015. DOI:10.18653/v1/W15-3013.
2015b
- (Sennrich & Haddow, 2015) ⇒ Rico Sennrich and Barry Haddow. (2015). “A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation”. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015). DOI:10.18653/v1/D15-1248.
1994
- (Gage, 1994) ⇒ Philip Gage. (1994). “A New Algorithm for Data Compression”. In: C Users Journal, 12(2):23–38, February. DOI:10.5555/177910.177914.