One Billion Word Language Modelling Benchmark Task
A One Billion Word Language Modelling Benchmark Task is an NLP Benchmark Task that evaluates the performance of language modeling systems.
- AKA: 1B Word Language Modelling Benchmark.
- Context:
- Benchmark Website: https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark
- Example(s):
- Model-combination results on the 1B Word Benchmark test set:
| Model | Perplexity |
|---|---|
| Interpolated KN 5-gram, 1.1B n-grams | 67.6 |
| All models | 43.8 |
- Results for individual language models on the 1B Word Benchmark test set (a sketch of the perplexity computation appears after this list):
| Model | Num. Params [billions] | Training Time [hours] | Training CPUs | Perplexity |
|---|---|---|---|---|
| Interpolated KN 5-gram, 1.1B n-grams (KN) | 1.76 | 3 | 100 | 67.6 |
| Katz 5-gram, 1.1B n-grams | 1.74 | 2 | 100 | 79.9 |
| Stupid Backoff 5-gram (SBO) | 1.13 | 0.4 | 200 | 87.9 |
| Interpolated KN 5-gram, 15M n-grams | 0.03 | 3 | 100 | 243.2 |
| Katz 5-gram, 15M n-grams | 0.03 | 2 | 100 | 127.5 |
| Binary MaxEnt 5-gram (n-gram features) | 1.13 | 1 | 5000 | 115.4 |
| Binary MaxEnt 5-gram (n-gram + skip-1 features) | 1.8 | 1.25 | 5000 | 107.1 |
| Hierarchical Softmax MaxEnt 4-gram (HME) | 6 | 3 | 1 | 101.3 |
| Recurrent NN-256 + MaxEnt 9-gram | 20 | 60 | 24 | 58.3 |
| Recurrent NN-512 + MaxEnt 9-gram | 20 | 120 | 24 | 54.5 |
| Recurrent NN-1024 + MaxEnt 9-gram | 20 | 240 | 24 | 51.3 |
- Counter-Example(s):
- See: Natural Language Processing System, Language Model, Machine Translation System, Text Corpus, Language Modeling Algorithm.
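The perplexities reported in the tables above are computed from per-word log-probabilities on the held-out test data (the benchmark distributes such per-word log-probabilities for its baseline n-gram models). Below is a minimal Python sketch of that computation, not part of the benchmark's own tooling; the `perplexity` function name, the log-base parameter, and the three-token example are illustrative assumptions.

```python
import math

def perplexity(log_probs, base=math.e):
    """Corpus-level perplexity from per-word log-probabilities.

    `log_probs` holds one log-probability per test-set token (including
    end-of-sentence tokens, as is standard for this benchmark); `base` is
    a parameter because toolkits differ (e.g. natural log vs. log10).
    """
    avg_neg_log_prob = -sum(log_probs) / len(log_probs)  # cross-entropy in `base` units
    return base ** avg_neg_log_prob                      # PPL = base ** cross-entropy

# Hypothetical three-token example with natural-log probabilities.
print(perplexity([-3.2, -1.7, -4.1]))
```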
References
2014
- (Chelba et al., 2014) ⇒ Ciprian Chelba, Tomáš Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. (2014). “One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling.” In: Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014).
- QUOTE: With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves perplexity 67.6; a combination of techniques leads to 35% reduction in perplexity, or 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a code.google.com project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models.
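As a quick check of the figures in the quote above, the 35% perplexity reduction and the 10% cross-entropy reduction are consistent with each other under the standard identity that cross-entropy in bits is the base-2 logarithm of perplexity. A small Python sketch, using only the perplexity values quoted above (67.6 and 43.8); everything else is illustrative:

```python
import math

# Figures from the quoted abstract: baseline unpruned KN 5-gram vs. all models combined.
ppl_baseline, ppl_combined = 67.6, 43.8

ppl_reduction = 1 - ppl_combined / ppl_baseline             # relative perplexity reduction
h_baseline = math.log2(ppl_baseline)                        # cross-entropy in bits, H = log2(PPL)
h_combined = math.log2(ppl_combined)
entropy_reduction = 1 - h_combined / h_baseline             # relative cross-entropy reduction

print(f"perplexity reduction:    {ppl_reduction:.1%}")      # ~35%
print(f"cross-entropy reduction: {entropy_reduction:.1%}")  # ~10%
```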