2018 Breaking the Softmax Bottleneck: A High-rank RNN Language Model


Subject Headings: Softmax Layer.

Notes

Cited By

2018

  • https://openreview.net/forum?id=HkwZSG-CZ
    • REVIEW: Viewing language modeling as a matrix factorization problem, the authors argue that the low rank of word embeddings used by such models limits their expressivity and show that replacing the softmax in such models with a mixture of softmaxes provides an effective way of overcoming this bottleneck. This is an interesting and well-executed paper that provides potentially important insight. It would be good to at least mention prior work related to the language modeling as matrix factorization perspective (e.g. Levy & Goldberg, 2014).
    • REVIEW: This paper uncovers a fundamental issue with large vocabularies and goes beyond just analyzing the issue by proposing a helpful method of addressing this.
    • REVIEW: Language models are important components of many NLP tasks. The current state-of-the-art language models are based on recurrent neural networks, which compute the probability of a word given all previous words using a softmax over a linear function of the RNN's hidden state. This paper argues the softmax is not expressive enough and proposes to use a more flexible mixture of softmaxes. The use of a mixture of softmaxes is motivated from a theoretical point of view by translating language modeling into matrix factorization (see the sketch after this list).
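
The matrix-factorization argument mentioned in these reviews can be summarized compactly. The LaTeX sketch below is a condensed restatement, not a quotation from the paper; the notation ($h_c$ for the context vector, $w_x$ for the word embedding, $d$ for the embedding dimension, $M$ for the vocabulary size, $K$ for the number of mixture components) follows the paper's usage at a high level.

  % A softmax language model scores word x in context c with an inner product
  % of a context vector and a word embedding:
  \[
    P_\theta(x \mid c) \;=\; \frac{\exp\!\big(h_c^\top w_x\big)}{\sum_{x'} \exp\!\big(h_c^\top w_{x'}\big)}
  \]
  % Stacking all N contexts and M vocabulary words, the model must factor the
  % true log-probability matrix A (up to row-wise constant shifts) as
  \[
    H_\theta W_\theta^\top \approx A', \qquad \operatorname{rank}\!\big(H_\theta W_\theta^\top\big) \le d,
  \]
  % while rank(A) can be as large as M >> d for highly context-dependent
  % language: this gap is the softmax bottleneck.
  %
  % Mixture of Softmaxes (MoS): mixing K softmaxes with context-dependent
  % weights makes the log-probabilities nonlinear in h_c, so the rank-d bound
  % on the log-probability matrix no longer applies:
  \[
    P_\theta(x \mid c) \;=\; \sum_{k=1}^{K} \pi_{c,k}\,
      \frac{\exp\!\big(h_{c,k}^\top w_x\big)}{\sum_{x'} \exp\!\big(h_{c,k}^\top w_{x'}\big)},
    \qquad \sum_{k=1}^{K} \pi_{c,k} = 1 .
  \]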

Quotes

Abstract

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
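
The "simple and effective method" referred to in the abstract is the Mixture of Softmaxes (MoS) output layer. The following is a minimal PyTorch-style sketch of such a layer, written here for illustration only: the class and parameter names (MixtureOfSoftmaxes, num_mixtures, the hidden/embedding sizes) are placeholders chosen for this sketch and do not come from the authors' released code.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class MixtureOfSoftmaxes(nn.Module):
      """Illustrative mixture-of-softmaxes output layer.

      Given an RNN hidden state, produce K component-specific context vectors,
      apply a softmax over the vocabulary inside each component, and combine
      the component distributions with context-dependent mixture weights.
      Because the log of a mixture of softmaxes is not linear in the hidden
      state, the resulting log-probability matrix is not limited to rank d.
      """

      def __init__(self, hidden_dim, embed_dim, vocab_size, num_mixtures=5):
          super().__init__()
          self.num_mixtures = num_mixtures
          self.embed_dim = embed_dim
          # Mixture-weight (prior) logits: one weight per component, per context.
          self.prior = nn.Linear(hidden_dim, num_mixtures)
          # Project the hidden state into K component-specific context vectors.
          self.latent = nn.Linear(hidden_dim, num_mixtures * embed_dim)
          # Output word embeddings shared across all components.
          self.decoder = nn.Linear(embed_dim, vocab_size)

      def forward(self, hidden):
          # hidden: (batch, hidden_dim)
          batch = hidden.size(0)
          # Context-dependent mixture weights pi_{c,k}, summing to 1 over k.
          pi = F.softmax(self.prior(hidden), dim=-1)                 # (batch, K)
          # K component context vectors h_{c,k}.
          h = torch.tanh(self.latent(hidden))                        # (batch, K*embed_dim)
          h = h.view(batch, self.num_mixtures, self.embed_dim)       # (batch, K, embed_dim)
          # Per-component softmax over the vocabulary.
          comp = F.softmax(self.decoder(h), dim=-1)                  # (batch, K, vocab)
          # Mix the component distributions into one valid distribution.
          probs = torch.bmm(pi.unsqueeze(1), comp).squeeze(1)        # (batch, vocab)
          # Return log-probabilities for use with an NLL loss.
          return torch.log(probs + 1e-8)

  # Usage sketch: sizes are arbitrary placeholders, not the paper's settings.
  layer = MixtureOfSoftmaxes(hidden_dim=650, embed_dim=280, vocab_size=10000)
  log_probs = layer(torch.randn(32, 650))   # shape: (32, 10000)

In this sketch the mixing happens in probability space rather than logit space, which is the design choice that lifts the rank restriction of a single softmax.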

References


Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen. (2018). "Breaking the Softmax Bottleneck: A High-rank RNN Language Model." In: Proceedings of the Sixth International Conference on Learning Representations (ICLR 2018). https://openreview.net/forum?id=HkwZSG-CZ