2018 Breaking the Softmax Bottleneck: A High-rank RNN Language Model
- (Yang, Dai et al., 2018) ⇒ Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen. (2018). “Breaking the Softmax Bottleneck: A High-rank RNN Language Model.” In: Proceedings of the 6th International Conference on Learning Representations (ICLR-2018).
Subject Headings: Softmax Layer.
Notes
Cited By
2018
- https://openreview.net/forum?id=HkwZSG-CZ
- REVIEW: Viewing language modeling as a matrix factorization problem, the authors argue that the low rank of word embeddings used by such models limits their expressivity and show that replacing the softmax in such models with a mixture of softmaxes provides an effective way of overcoming this bottleneck. This is an interesting and well-executed paper that provides potentially important insight. It would be good to at least mention prior work related to the language modeling as matrix factorization perspective (e.g. Levy & Goldberg, 2014).
- REVIEW: This paper uncovers a fundamental issue with large vocabularies and goes beyond just analyzing the issue by proposing a helpful method of addressing this.
- REVIEW: Language models are important components of many NLP tasks. The current state-of-the-art language models are based on recurrent neural networks, which compute the probability of a word given all previous words using a softmax function over a linear function of the RNN's hidden state. This paper argues that the softmax is not expressive enough and proposes to use a more flexible mixture of softmaxes. The use of a mixture of softmaxes is motivated from a theoretical point of view by translating language modeling into matrix factorization.
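The matrix-factorization view raised in the reviews above can be sketched as follows; the notation below is an illustrative paraphrase of the paper's argument, not a verbatim excerpt:

```latex
% A Softmax language model with context vector h_c \in \mathbb{R}^d and
% word embedding w_x \in \mathbb{R}^d defines
P_\theta(x \mid c) \;=\; \frac{\exp\!\left(h_c^\top w_x\right)}
                              {\sum_{x'} \exp\!\left(h_c^\top w_{x'}\right)}

% Stacking the true conditional log-probabilities into a matrix
% A_{c,x} = \log P^{*}(x \mid c), the model can match P^{*} exactly only if
% A equals H W^\top up to a row-wise constant shift, so
\operatorname{rank}(A) \;\le\; d + 1

% If natural language induces an A of much higher rank than the embedding
% dimension d, no single Softmax can fit it: the "Softmax bottleneck".
% The proposed Mixture of Softmaxes (MoS) instead uses K components:
P_\theta(x \mid c) \;=\; \sum_{k=1}^{K} \pi_{c,k}\,
    \frac{\exp\!\left(h_{c,k}^\top w_x\right)}
         {\sum_{x'} \exp\!\left(h_{c,k}^\top w_{x'}\right)},
\qquad \sum_{k=1}^{K} \pi_{c,k} = 1
% Because the log of a mixture is not a bilinear form in (h, w), the
% resulting log-probability matrix is no longer rank-limited by d.
```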
Quotes
Abstract
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
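To make the proposed fix concrete, below is a minimal NumPy sketch of a Mixture of Softmaxes (MoS) output layer. The shapes, the tanh projection, and all variable names are illustrative assumptions for exposition, not the authors' reference implementation.

```python
# Minimal NumPy sketch of a Mixture of Softmaxes (MoS) output layer.
# Shapes, the tanh projection, and all variable names are illustrative
# assumptions, not the authors' reference implementation.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mos_probs(g, W_prior, W_proj, E):
    """
    g       : (d,)      final RNN hidden state for the current context
    W_prior : (K, d)    produces the K mixture weights ("prior" over components)
    W_proj  : (K, e, d) produces K component context vectors h_k = tanh(W_proj[k] @ g)
    E       : (V, e)    output word-embedding matrix shared by all components
    Returns a length-V next-word distribution.
    """
    pi = softmax(W_prior @ g)                        # (K,)   mixture weights
    h = np.tanh(np.einsum('ked,d->ke', W_proj, g))   # (K, e) component contexts
    components = softmax(h @ E.T, axis=-1)           # (K, V) one softmax per component
    return pi @ components                           # (V,)   mixture of the K softmaxes

# Toy usage: vocabulary of 10 words, hidden size 8, embedding size 6, K = 3 components.
rng = np.random.default_rng(0)
d, e, V, K = 8, 6, 10, 3
p = mos_probs(rng.standard_normal(d),
              rng.standard_normal((K, d)),
              rng.standard_normal((K, e, d)),
              rng.standard_normal((V, e)))
assert np.isclose(p.sum(), 1.0)   # the mixture is still a proper distribution
```

Because each component is itself a valid softmax and the weights sum to one, the mixture remains a proper distribution while its log-probability matrix is no longer constrained to low rank.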
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2018 BreakingtheSoftmaxBottleneckAHi | William W. Cohen, Ruslan Salakhutdinov, Zhilin Yang, Zihang Dai | | | Breaking the Softmax Bottleneck: A High-rank RNN Language Model | | | | | | 2018 |