2017 RegularizingandOptimizingLSTMLa



(Merity et al., 2017) ⇒ Stephen Merity, Nitish Shirish Keskar, and Richard Socher. (2017). “Regularizing and Optimizing LSTM Language Models.” In: arXiv preprint arXiv:1708.02182.

Subject Headings: Neural Word-level Language Modeling, LSTM Regularization, Weight-Dropped LSTM.

Notes

Cited By

http://scholar.google.com/scholar?q=%222017%22+Regularizing+and+Optimizing+LSTM+Language+Models

Quotes

Abstract

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
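Two of the mechanisms named in the abstract are easier to follow in code than in prose. The first is DropConnect on the hidden-to-hidden weights. The second is the non-monotonic trigger that decides when NT-ASGD starts averaging. The Python sketch below is a minimal illustration based on our reading of the paper, not the authors' implementation, and it makes several assumptions. It uses plain lists instead of tensors. It uses a toy tanh recurrence instead of a full LSTM. It rescales surviving weights by 1/(1 - p) (inverted-dropout scaling). It samples one weight mask per forward pass and reuses that mask at every timestep. Finally, the trigger starts averaging when a new validation score is worse than the best score logged more than `n` checks earlier. All function names (`drop_connect`, `recurrent_forward`, `should_start_averaging`, `find_trigger`, `average_iterates`) are hypothetical.

```python
import math
import random
from typing import List, Optional


def drop_connect(weights: List[List[float]], p: float, rng: random.Random) -> List[List[float]]:
    """Zero each weight independently with probability p (DropConnect).

    Surviving weights are scaled by 1 / (1 - p), an assumed inverted-dropout
    convention that keeps each weight's expected value unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [[w * scale if rng.random() >= p else 0.0 for w in row] for row in weights]


def recurrent_forward(u_hh: List[List[float]], inputs: List[List[float]],
                      p: float, rng: random.Random) -> List[float]:
    """Toy recurrence h_t = tanh(x_t + U' h_{t-1}), a stand-in for an LSTM.

    The dropped matrix U' is sampled once and reused at every timestep, so the
    same weights stay dropped for the whole sequence.
    """
    u_dropped = drop_connect(u_hh, p, rng)  # one mask per forward pass
    h = [0.0] * len(u_hh)
    for x in inputs:
        h = [math.tanh(x_i + sum(u * h_j for u, h_j in zip(row, h)))
             for x_i, row in zip(x, u_dropped)]
    return h


def should_start_averaging(val_logs: List[float], v: float, n: int) -> bool:
    """Non-monotonic trigger: v is worse than the best score logged before
    the most recent n checks (scores within the last n checks are ignored)."""
    return len(val_logs) > n and v > min(val_logs[:-n])


def find_trigger(val_sequence: List[float], n: int) -> Optional[int]:
    """Replay validation scores (lower is better) and return the index of the
    check at which averaging would start, or None if it never starts."""
    logs: List[float] = []
    for t, v in enumerate(val_sequence):
        if should_start_averaging(logs, v, n):
            return t
        logs.append(v)
    return None


def average_iterates(iterates: List[List[float]]) -> List[float]:
    """Arithmetic mean of the parameter vectors collected after the trigger."""
    count = len(iterates)
    return [sum(col) / count for col in zip(*iterates)]
```

For example, with `n = 2` the validation sequence 10.0, 9.0, 8.0, 7.5, 7.6, 7.7, 7.4, 7.8 starts averaging only at index 7. The rise to 7.7 at index 5 is tolerated because the earlier best value, 7.5, still lies within the last two checks. At index 7, the score 7.8 exceeds that older best value, so the trigger fires. The window lets validation scores fluctuate for a short time without a user-tuned switch point, which matches the motivation given in the abstract.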

References


Page	Author	title
2017 RegularizingandOptimizingLSTMLa	Richard Socher, Stephen Merity, Nitish Shirish Keskar	Regularizing and Optimizing LSTM Language Models


Facts


Stephen Merity, Nitish Shirish Keskar and Richard Socher

Regularizing and Optimizing LSTM Language Models

2017