One Billion Word Language Modelling Benchmark Corpus

A One Billion Word Language Modelling Benchmark Corpus is a Text Corpus that was developed during the One Billion Word Language Modelling Benchmark Task.

AKA: 1-Billion Word Corpus.
Example(s):
- tf.data.Datasets:lm1b,
- …
Counter-Example(s):
See: GPT-2 Benchmark Task, Text Corpus, Language Model, Language Modeling Algorithm.

References

2014

(Chelba et al., 2014) ⇒ Ciprian Chelba, Tomáš Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. (2014). “One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling.” In: Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014).
- QUOTE: We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques.
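- NOTE: A benchmark corpus of this kind is typically used by training a language model on the training portion and scoring it on held-out text, commonly by perplexity. The following is a minimal sketch of that evaluation loop, not the procedure from the paper: the add-one-smoothed unigram model, the toy sentences, and the function names `train_unigram` and `perplexity` are all illustrative assumptions.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Count token frequencies in the training text (hypothetical helper)."""
    return Counter(tokens)


def perplexity(counts, heldout_tokens, vocab_size):
    """Perplexity of an add-one-smoothed unigram model on held-out tokens."""
    total = sum(counts.values())
    log_prob = 0.0
    for tok in heldout_tokens:
        # Add-one smoothing gives unseen tokens a small nonzero probability.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / len(heldout_tokens))


# Toy stand-ins for the corpus's training and held-out splits (assumed data).
train = "the cat sat on the mat".split()
heldout = "the dog sat".split()
vocab = set(train) | set(heldout)

counts = train_unigram(train)
ppl = perplexity(counts, heldout, len(vocab))
print(f"held-out perplexity: {ppl:.2f}")
```

Lower perplexity on the same held-out text indicates a better model, which is what makes a fixed shared corpus useful for comparing techniques.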
