2011 EmpiricalEvaluationandCombinati
- (Mikolov et al., 2011) ⇒ Tomáš Mikolov, Anoop Deoras, Stefan Kombrink, Lukáš Burget, and Jan Černocký. (2011). “Empirical Evaluation and Combination of Advanced Language Modeling Techniques.” In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011).
Subject Headings: Neural Network Language Model, Structured Language Model, Class-based Language Model, Cache-based Language Model, Maximum Entropy Language Model, Random Forest Language Model, Good-Turing Trigram Language Model, Kneser-Ney Smoothed 5-Gram Language Model.
Notes
- Other Version(s):
Cited By
- Google Scholar: ~ 345 Citations.
Quotes
Abstract
We present results obtained with several advanced language modeling techniques, including class based model, cache model, maximum entropy model, structured language model, random forest language model and several types of neural network based language models. We show results obtained after combining all these models by using linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with combination of models, that is significantly better than performance of any individual model. Obtained perplexity reductions against Good-Turing trigram baseline are over 50% and against modified Kneser-Ney smoothed 5-gram over 40%.
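The combination described in the abstract is plain linear interpolation of the component models' conditional distributions. A minimal sketch of that step (toy component models and hand-picked weights chosen purely for illustration, not the authors' code) might look like:
```python
def interpolate(models, weights, word, history):
    """Linearly interpolated probability P(word | history).

    models  : list of callables, each returning P(word | history)
              for one component language model (hypothetical here)
    weights : non-negative interpolation weights summing to 1
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * m(word, history) for w, m in zip(weights, models))

# Two toy "models": a uniform distribution over a 10k vocabulary and a
# unigram-like model that prefers the word "the".
uniform = lambda w, h: 1.0 / 10_000
unigram = lambda w, h: 0.05 if w == "the" else 1.0 / 20_000
print(interpolate([uniform, unigram], [0.3, 0.7], "the", ("of",)))  # 0.03503
```
In practice the interpolation weights are not fixed by hand but tuned on held-out data.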
1. Introduction
In this paper, we will deal with statistical approaches to language modeling that are motivated by information theory. This will allow us to fairly compare different techniques. It is supposed that the model that is the best predictor of words given the context is the closest model to the true model of language. Thus, the measure that we will aim to minimize is the cross entropy of the test data given the language model. The cross entropy is equal to the [math]\displaystyle{ \log_2 }[/math] of the perplexity (PPL). The per-word perplexity is defined as
- [math]\displaystyle{ PPL = \sqrt[K]{ \prod_{i=1}^{K} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})} } }[/math]
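As a concrete illustration (with hypothetical probabilities, not values from the paper), perplexity can be computed from the per-word probabilities a model assigns to a test sequence; it equals [math]\displaystyle{ 2^H }[/math], where [math]\displaystyle{ H }[/math] is the cross entropy in bits per word:
```python
import math

def perplexity(word_probs):
    """Per-word perplexity of a test sequence, given the probability
    P(w_i | w_1, ..., w_{i-1}) the model assigned to each word."""
    # Sum log-probabilities instead of multiplying, to avoid underflow.
    log2_sum = sum(math.log2(p) for p in word_probs)
    cross_entropy = -log2_sum / len(word_probs)   # bits per word
    return 2.0 ** cross_entropy                   # PPL = 2^H

# Hypothetical per-word probabilities for a 4-word test sequence.
probs = [0.2, 0.05, 0.1, 0.01]
print(perplexity(probs))  # ~17.8, the geometric mean of 1/p
```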
It is important to note that perplexity does not depend just on the quality of the model, but also on the nature of the training and test data. For difficult tasks, when small amounts of training data are available and a large vocabulary is used (thus the model has to choose between many variants), the perplexity can reach values over 1000, while on easy tasks it is common to observe values below 100.
Another difficulty that arises when using perplexity as a measure of progress is when improvements are reported as relative (percentage) reductions. It can be seen that a constant relative reduction of entropy results in a variable relative reduction of perplexity. …
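To spell out that last point (a worked example, not part of the quoted text): if the cross entropy [math]\displaystyle{ H }[/math] is reduced by a constant fraction [math]\displaystyle{ r }[/math], the perplexity changes as
- [math]\displaystyle{ PPL_{new} = 2^{(1-r)H} = \left(2^{H}\right)^{1-r} = PPL_{old}^{\,1-r}, }[/math]
so the relative perplexity reduction [math]\displaystyle{ 1 - PPL_{old}^{-r} }[/math] depends on the baseline. For example, a 10% entropy reduction turns a perplexity of 1000 into roughly [math]\displaystyle{ 1000^{0.9} \approx 501 }[/math] (about a 50% reduction), but a perplexity of 100 into only about [math]\displaystyle{ 100^{0.9} \approx 63 }[/math] (about a 37% reduction).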
References
BibTeX
@inproceedings{2011_EmpiricalEvaluationandCombinati,
  author    = {Tomas Mikolov and Anoop Deoras and Stefan Kombrink and Lukas Burget and Jan Cernocky},
  title     = {Empirical Evaluation and Combination of Advanced Language Modeling Techniques},
  booktitle = {Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011)},
  pages     = {605--608},
  publisher = {ISCA},
  year      = {2011},
  url       = {http://www.isca-speech.org/archive/interspeech\_2011/i11\_0605.html},
}
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011 EmpiricalEvaluationandCombinati | Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Černocký, Tomáš Mikolov | | | Empirical Evaluation and Combination of Advanced Language Modeling Techniques | | | | | | 2011 |