2018 MemoryArchitecturesinRecurrentN
- (Yogatama et al., 2018) ⇒ Dani Yogatama, Yishu Miao, Gabor Melis, Wang Ling, Adhiguna Kuncoro, Chris Dyer, and Phil Blunsom. (2018). “Memory Architectures in Recurrent Neural Network Language Models.” In: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018).
Subject Headings:
Notes
Cited By
Quotes
Abstract
We compare and analyze sequential, random access, and stack memory architectures for recurrent neural network language models. Our experiments on the Penn Treebank and Wikitext-2 datasets show that stack-based memory architectures consistently achieve the best performance in terms of held-out perplexity. We also propose a generalization to existing continuous stack models (Joulin & Mikolov, 2015; Grefenstette et al., 2015) that allows a variable number of pop operations more naturally and further improves performance. We further evaluate these language models in terms of their ability to capture non-local syntactic dependencies on a subject-verb agreement dataset (Linzen et al., 2016) and establish new state-of-the-art results using memory-augmented language models. Our results demonstrate the value of stack-structured memory for explaining the distribution of words in natural language, in line with linguistic theories claiming a context-free backbone for natural language.
1. Introduction
...
* We compare how a recurrent neural network uses a stack memory, a sequential memory cell (i.e., an LSTM memory cell), and a random access memory (i.e., an attention mechanism) for language modeling. Experiments on the Penn Treebank and Wikitext-2 datasets (§3.2) show that both the stack model and the attention-based model outperform the LSTM model with a comparable (or even larger) number of parameters, and that the stack model eliminates the need to tune window size to achieve the best perplexity.
…
2. Model
Random access memory. One common approach to retrieving information from the distant past more reliably is to augment the model with a random access memory block via an attention-based method. In this model, we consider the previous $K$ states as the memory block, and construct a memory vector $\mathbf{m}_t$ as a weighted combination of these states:
[math]\displaystyle{ \mathbf{m}_t = \sum_{i=t-K}^{t-1} a_i\mathbf{h}_i }[/math], where [math]\displaystyle{ a_i \propto \exp\left(\mathbf{w}_{m,i}^\top\mathbf{h}_i + \mathbf{w}_{m,h}^\top \mathbf{h}_t\right) }[/math]
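A minimal NumPy sketch of this memory read, assuming the scoring terms reduce to dot products with two shared vectors (here named `w_mem` and `w_hid`; the names and shapes are illustrative stand-ins, not taken from the paper):

```python
import numpy as np

def attention_memory(H_prev, h_t, w_mem, w_hid):
    """Attention-based read over the previous K hidden states.

    H_prev       : (K, d) array holding h_{t-K}, ..., h_{t-1}
    h_t          : (d,)   current hidden state
    w_mem, w_hid : (d,)   scoring vectors (stand-ins for w_{m,i}, w_{m,h})
    Returns m_t, the attention-weighted memory vector.
    """
    # Unnormalized score for each memory slot i: w_mem . h_i + w_hid . h_t
    scores = H_prev @ w_mem + h_t @ w_hid
    # Normalize with a softmax so the weights a_i sum to one
    a = np.exp(scores - scores.max())
    a /= a.sum()
    # m_t = sum_i a_i * h_i
    return a @ H_prev

# Toy usage: K = 5 memory slots, hidden size d = 8
rng = np.random.default_rng(0)
K, d = 5, 8
H_prev = rng.normal(size=(K, d))
h_t, w_mem, w_hid = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
m_t = attention_memory(H_prev, h_t, w_mem, w_hid)  # shape (d,)
```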
Such a method can be improved further by partitioning $\mathbf{h}$ into key, value, and predict subvectors (Daniluk et al., 2017).
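A sketch of that key-value-predict split, using dot-product scoring between keys as a simplification; the slice sizes `d_k`, `d_v` and the function name are assumptions for illustration rather than the exact formulation of Daniluk et al. (2017):

```python
import numpy as np

def kvp_memory(H_prev, h_t, d_k, d_v):
    """Key-value-predict style memory read (illustrative simplification).

    Each hidden state is sliced into [key | value | predict] sub-vectors:
    keys score the match against the current key, values are combined into
    the memory vector, and the predict slice of h_t is reserved for the
    output layer.
    """
    keys, values = H_prev[:, :d_k], H_prev[:, d_k:d_k + d_v]
    key_t = h_t[:d_k]
    predict_t = h_t[d_k + d_v:]
    # Score each stored key against the current key (dot product, then softmax)
    scores = keys @ key_t
    a = np.exp(scores - scores.max())
    a /= a.sum()
    # Memory vector is the weighted sum of the value sub-vectors
    m_t = a @ values
    return m_t, predict_t

# Toy usage: hidden size 12 split into key (4), value (4), predict (4)
rng = np.random.default_rng(1)
H_prev, h_t = rng.normal(size=(5, 12)), rng.normal(size=12)
m_t, predict_t = kvp_memory(H_prev, h_t, d_k=4, d_v=4)
```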
3. Experiments
4. Conclusion
Acknowledgements
References
 | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---|---
2018 MemoryArchitecturesinRecurrentN | Chris Dyer; Phil Blunsom; Wang Ling; Dani Yogatama; Yishu Miao; Gabor Melis; Adhiguna Kuncoro | | | Memory Architectures in Recurrent Neural Network Language Models | | | | | | 2018