Sennrich-Haddow-Birch Rare Words Neural Machine Translation Task
A Sennrich-Haddow-Birch Rare Words Neural Machine Translation Task is a Neural Machine Translation Task that translates rare and unseen words using BPE-based Word Segmentation.
- AKA: Neural Machine Translation of Rare Words via BPE Word Segmentation.
- Context:
- Task Input(s): text item containing rare and unseen words.
- Task Output(s): translations of rare and unseen words; a multi-language open vocabulary of subword units.
- Task Requirement(s):
- Benchmark Datasets:
- Benchmark Performance Metrics:
- Baseline Models:
- WUnk - a word-level vocabulary model in which out-of-vocabulary words are represented as UNK;
- WDict - a word-level vocabulary model with a back-off dictionary;
- C2-50k - a character-bigram vocabulary model with a shortlist of the 50,000 most frequent unsegmented words;
- BPE-60k - a BPE subword-level vocabulary model;
- BPE-J90k - a BPE subword-level vocabulary model with joint (source+target) BPE segmentation.
- It evaluates the performance of the Machine Translation Systems listed above as Baseline Models.
- It can be solved by a Sennrich-Haddow-Birch Rare Words Neural Machine Translation System (a minimal sketch of the BPE segmentation step follows this Context section).
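The segmentation step assumed throughout the Context above can be illustrated with a minimal Python sketch. This is a hedged illustration, not the paper's released code: the `segment` helper and the toy merge list are made up for this example, but the greedy replay of a priority-ordered merge list matches how BPE segmentation is commonly applied.

```python
def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a word into subword units by replaying learned BPE merges in priority order."""
    symbols = list(word) + ["</w>"]  # end-of-word marker, as in Sennrich et al. (2016)
    for left, right in merges:       # earlier merges have higher priority
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]  # merge the adjacent pair in place
            else:
                i += 1
    return symbols

# Illustrative (made-up) merge list: the unseen word "lowest" decomposes
# into subword units that the model could have seen during training.
merges = [("e", "s"), ("es", "t"), ("est", "</w>"), ("l", "o"), ("lo", "w")]
print(segment("lowest", merges))  # ['low', 'est</w>']
```

Because single characters are always available as fallback units, any unseen word can be segmented; how useful the segments are depends entirely on the learned merge list.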
- Example(s):
- English-to-German translation results (Sennrich et al., 2016):

| name | segmentation | shortlist | source vocab. | target vocab. | BLEU (single) | BLEU (ens-8) | CHRF3 (single) | CHRF3 (ens-8) | unigram F1: all (%) | unigram F1: rare (%) | unigram F1: OOV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| syntax-based (Sennrich and Haddow, 2015) | - | - | - | - | 24.4 | - | 55.3 | - | 59.1 | 46.0 | 37.7 |
| WUnk | - | - | 300,000 | 500,000 | 20.6 | 22.8 | 47.2 | 48.9 | 56.7 | 20.4 | 0.0 |
| WDict | - | - | 300,000 | 500,000 | 22.0 | 24.2 | 50.5 | 52.4 | 58.1 | 36.8 | 36.8 |
| C2-50k | char-bigram | 50,000 | 60,000 | 60,000 | 22.8 | 25.3 | 51.9 | 53.5 | 58.4 | 40.5 | 30.9 |
| BPE-60k | BPE | - | 60,000 | 60,000 | 21.5 | 24.5 | 52.0 | 53.9 | 58.4 | 40.9 | 29.3 |
| BPE-J90k | BPE (joint) | - | 90,000 | 90,000 | 22.8 | 24.7 | 51.7 | 54.1 | 58.5 | 41.8 | 33.6 |
- English-to-Russian translation results (Sennrich et al., 2016):

| name | segmentation | shortlist | source vocab. | target vocab. | BLEU (single) | BLEU (ens-8) | CHRF3 (single) | CHRF3 (ens-8) | unigram F1: all (%) | unigram F1: rare (%) | unigram F1: OOV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| phrase-based (Haddow et al., 2015) | - | - | - | - | 24.3 | - | 53.8 | - | 56.0 | 31.3 | 16.5 |
| WUnk | - | - | 300,000 | 500,000 | 18.8 | 22.4 | 46.5 | 49.9 | 54.2 | 25.2 | 0.0 |
| WDict | - | - | 300,000 | 500,000 | 19.1 | 22.8 | 47.5 | 51.0 | 54.8 | 26.5 | 6.6 |
| C2-50k | char-bigram | 50,000 | 60,000 | 60,000 | 20.9 | 24.1 | 49.0 | 51.6 | 55.2 | 27.8 | 17.4 |
| BPE-60k | BPE | - | 60,000 | 60,000 | 20.5 | 23.6 | 49.8 | 52.7 | 55.3 | 29.7 | 15.6 |
| BPE-J90k | BPE (joint) | - | 90,000 | 100,000 | 20.4 | 24.1 | 49.7 | 53.0 | 55.8 | 29.7 | 18.3 |
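For reference, the evaluation metrics in the two tables above can be unpacked as follows. These are the standard definitions (CHRF3 is the character n-gram F-score of Popović (2015) with β = 3; unigram F1 is the usual harmonic-mean F-score over target-side unigrams), stated here for readability rather than quoted from the paper:

```latex
% CHRF_beta: character n-gram F-score, with recall weighted beta^2 times
% more heavily than precision; the tables use beta = 3 (CHRF3).
\mathrm{CHRF}_{\beta} = (1+\beta^{2})\,
  \frac{\mathrm{chrP}\cdot\mathrm{chrR}}{\beta^{2}\,\mathrm{chrP}+\mathrm{chrR}},
  \qquad \beta = 3

% Unigram F1: harmonic mean of unigram precision P and recall R; the
% "all", "rare", and "OOV" columns restrict which target words count.
F_{1} = \frac{2\,P\,R}{P+R}
```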
- Counter-Example(s):
- CJS Neural Narrative Text Generation Task,
- Lee-Krahmer-Wubben Data-To-Text Generation Task,
- LSPGS Wikipedia Long Sentences Summarization Task,
- See-Liu-Manning Text Summarization Task,
- GSGAN Benchmark Task,
- LeakGAN Benchmark Task.
- MaliGAN Benchmark Task,
- MaskGAN Benchmark Task,
- RankGAN Benchmark Task,
- SeqGAN Benchmark Task,
- TextGAN Benchmark Task,
- Texygen Benchmark Task.
- See: Machine Translation Task, Natural Language Processing Task, Natural Language Generation Task, Text Segmentation Task, Byte-Pair Encoding (BPE) Task.
References
2016
- (Sennrich et al., 2016) ⇒ Rico Sennrich, Barry Haddow, and Alexandra Birch. (2016). “Neural Machine Translation of Rare Words with Subword Units". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016).
- QUOTE: We investigate NMT models that operate on the level of subword units. Our main goal is to model open-vocabulary translation in the NMT network itself, without requiring a back-off model for rare words. In addition to making the translation process simpler, we also find that the subword models achieve better accuracy for the translation of rare words than large-vocabulary models and back-off dictionaries, and are able to productively generate new words that were not seen at training time. Our analysis shows that the neural networks are able to learn compounding and transliteration from subword representations.
This paper has two main contributions:
- We show that open-vocabulary neural machine translation is possible by encoding (rare) words via subword units. We find our architecture simpler and more effective than using large vocabularies and back-off dictionaries (Jean et al., 2015; Luong et al., 2015b).
- We adapt byte pair encoding (BPE) (Gage, 1994), a compression algorithm, to the task of word segmentation. BPE allows for the representation of an open vocabulary through a fixed-size vocabulary of variable-length character sequences, making it a very suitable word segmentation strategy for neural network models.
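The BPE learning step itself is small enough to show in full. The sketch below closely follows the toy Python code published in Sennrich et al. (2016, Figure 1): the training vocabulary is kept as space-separated symbol sequences with word frequencies, and the most frequent adjacent symbol pair is merged repeatedly. The comments are added here for explanation.

```python
import re
import collections

def get_stats(vocab):
    """Count the frequency of every adjacent symbol pair in the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, v_in):
    """Rewrite the vocabulary with every occurrence of `pair` merged into one symbol."""
    v_out = {}
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')  # match the pair only at symbol boundaries
    for word in v_in:
        v_out[pattern.sub(''.join(pair), word)] = v_in[word]
    return v_out

# Toy word-frequency dictionary from the paper; '</w>' marks word ends.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(10):                    # the number of merges controls the subword vocabulary size
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)   # most frequent adjacent symbol pair
    vocab = merge_vocab(best, vocab)
    print(best)                        # e.g. ('e', 's'), ('es', 't'), ('est', '</w>'), ...
```

Each printed pair becomes one entry of the learned merge list; replaying those merges in order, as in the earlier segmentation sketch, yields the subword segmentation of any word, seen or unseen.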
2015a
- (Haddow et al., 2015) ⇒ Barry Haddow, Matthias Huck, Alexandra Birch, Nikolay Bogoychev, and Philipp Koehn. (2015). “The Edinburgh/JHU Phrase-based Machine Translation Systems for WMT 2015". In: Proceedings of the Tenth Workshop on Statistical Machine Translation, WMT@EMNLP 2015. DOI:10.18653/v1/W15-3013.
2015b
- (Sennrich & Haddow, 2015) ⇒ Rico Sennrich, and Barry Haddow. (2015). “A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation". In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015). DOI:10.18653/v1/D15-1248.
1994
- (Gage, 1994) ⇒ Philip Gage. (1994). “A New Algorithm for Data Compression". In: C Users Journal, 12(2):23–38, February. DOI:10.5555/177910.177914.