Kyoto Free Translation Task (KFTT)
A Kyoto Free Translation Task (KFTT) is a Neural Machine Translation Benchmark Task that evaluates English-Japanese translation performance.
- Example(s):
- KFTT results:
Split | Articles | Sentences | Japanese Words | English Words |
Train | 14126 | 440k | 12.0M | 11.5M |
Train (clean) | 14126 | 330k | 6.09M | 5.91M |
Tune | 15 | 1235 | 34.4k | 30.8k |
Dev | 15 | 1166 | 26.8k | 24.3k |
Test | 15 | 1160 | 28.5k | 26.7k |
- Counter-Example(s):
- See: Neural Machine Translation Task, Subword Segmentation, Subword Unit, BLEU Score, Byte Pair Encoding (BPE), Subword Neural Machine Translation.
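The corpus statistics in the table above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python, assuming the rounded figures from the table ("440k", "12.0M", and so on) are expanded to plain integers; the dictionary `KFTT_STATS` and the helper `avg_words_per_sentence` are hypothetical names introduced here for illustration and are not part of the KFTT distribution.

```python
# Corpus statistics copied from the KFTT table above.
# The training rows are rounded in the source (e.g. "440k", "12.0M"),
# so every value derived from them is approximate.
KFTT_STATS = {
    "train":       {"sentences": 440_000, "ja_words": 12_000_000, "en_words": 11_500_000},
    "train_clean": {"sentences": 330_000, "ja_words": 6_090_000,  "en_words": 5_910_000},
    "tune":        {"sentences": 1235,    "ja_words": 34_400,     "en_words": 30_800},
    "dev":         {"sentences": 1166,    "ja_words": 26_800,     "en_words": 24_300},
    "test":        {"sentences": 1160,    "ja_words": 28_500,     "en_words": 26_700},
}


def avg_words_per_sentence(split):
    """Return (Japanese, English) average words per sentence for one split."""
    row = KFTT_STATS[split]
    return (row["ja_words"] / row["sentences"],
            row["en_words"] / row["sentences"])


if __name__ == "__main__":
    for split in KFTT_STATS:
        ja_avg, en_avg = avg_words_per_sentence(split)
        print(f"{split:12s} ja={ja_avg:5.1f} en={en_avg:5.1f}")

    # Share of training sentences kept in the "clean" training set.
    kept = KFTT_STATS["train_clean"]["sentences"] / KFTT_STATS["train"]["sentences"]
    print(f"clean/train sentence ratio: {kept:.2f}")
```

Under these rounded figures, the clean training set keeps about three quarters of the sentences but only about half of the words, which suggests that the sentences filtered out were longer than average.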
References
2018
- (Kudo & Richardson, 2018) ⇒ Taku Kudo, and John Richardson. (2018). "SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing". In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) System Demonstrations. DOI:10.18653/v1/d18-2012.
- QUOTE: We validated the performance of the different preprocessing on English-Japanese translation of Wikipedia articles, as specified by the Kyoto Free Translation Task (KFTT)[1]. The training, development and test data of KFTT consist of 440k, 1166 and 1160 sentences respectively.
2011
- (Neubig, 2011) ⇒ Graham Neubig. (2011). "The Kyoto Free Translation Task".
- QUOTE: The Kyoto Free Translation Task is a task for Japanese-English translation that focuses on Wikipedia articles related to Kyoto. The data used was originally prepared by the National Institute for Information and Communication Technology (NICT) and released as the Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles (we are simply using the data, NICT does not specifically endorse or sponsor this task).