2018 MemoryArchitecturesinRecurrentN
- (Yogatama et al., 2018) ⇒ Dani Yogatama, Yishu Miao, Gabor Melis, Wang Ling, Adhiguna Kuncoro, Chris Dyer, and Phil Blunsom. (2018). “Memory Architectures in Recurrent Neural Network Language Models.” In: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018).
Subject Headings:
Notes
Cited By
Quotes
Abstract
We compare and analyze sequential, random access, and stack memory architectures for recurrent neural network language models. Our experiments on the Penn Treebank and Wikitext-2 datasets show that stack-based memory architectures consistently achieve the best performance in terms of held-out perplexity. We also propose a generalization to existing continuous stack models (Joulin & Mikolov, 2015; Grefenstette et al., 2015) that allows a variable number of pop operations more naturally and further improves performance. We further evaluate these language models in terms of their ability to capture non-local syntactic dependencies on a subject-verb agreement dataset (Linzen et al., 2016) and establish new state-of-the-art results using memory-augmented language models. Our results demonstrate the value of stack-structured memory for explaining the distribution of words in natural language, in line with linguistic theories claiming a context-free backbone for natural language.
1. Introduction
...
* We compare how a recurrent neural network uses a stack memory, a sequential memory cell (i.e., an LSTM memory cell), and a random access memory (i.e., an attention mechanism) for language modeling. Experiments on the Penn Treebank and Wikitext-2 datasets (§3.2) show that both the stack model and the attention-based model outperform the LSTM model with a comparable (or even larger) number of parameters, and that the stack model eliminates the need to tune window size to achieve the best perplexity.
…
2. Model
Random access memory. One common approach to retrieving information from the distant past more reliably is to augment the model with a random access memory block via an attention-based method. In this model, we consider the previous $K$ states as the memory block, and construct a memory vector $\mathbf{m}_t$ as a weighted combination of these states:
[math]\displaystyle{ \mathbf{m}_t = \sum_{i=t-K}^{t-1} a_i\mathbf{h}_i }[/math], where [math]\displaystyle{ a_i \propto \exp\left(\mathbf{w}_{m,i}^\top\mathbf{h}_i + \mathbf{w}_{m,h}^\top \mathbf{h}_t\right) }[/math]
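A minimal NumPy sketch of this memory read, assuming the scoring terms reduce to dot products with two shared vectors (here named `w_mem` and `w_hid`; the names and shapes are illustrative stand-ins, not taken from the paper):

```python
import numpy as np

def attention_memory(H_prev, h_t, w_mem, w_hid):
    """Attention-based read over the previous K hidden states.

    H_prev       : (K, d) array holding h_{t-K}, ..., h_{t-1}
    h_t          : (d,)   current hidden state
    w_mem, w_hid : (d,)   scoring vectors (stand-ins for w_{m,i}, w_{m,h})
    Returns m_t, the attention-weighted memory vector.
    """
    # Unnormalized score for each memory slot i: w_mem . h_i + w_hid . h_t
    scores = H_prev @ w_mem + h_t @ w_hid
    # Normalize with a softmax so the weights a_i sum to one
    a = np.exp(scores - scores.max())
    a /= a.sum()
    # m_t = sum_i a_i * h_i
    return a @ H_prev

# Toy usage: K = 5 memory slots, hidden size d = 8
rng = np.random.default_rng(0)
K, d = 5, 8
H_prev = rng.normal(size=(K, d))
h_t, w_mem, w_hid = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
m_t = attention_memory(H_prev, h_t, w_mem, w_hid)  # shape (d,)
```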
Such a method can be improved further by partitioning $\mathbf{h}$ into key, value, and predict subvectors (Daniluk et al., 2017).
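A sketch of that key-value-predict split, using dot-product scoring between keys as a simplification; the slice sizes `d_k`, `d_v` and the function name are assumptions for illustration rather than the exact formulation of Daniluk et al. (2017):

```python
import numpy as np

def kvp_memory(H_prev, h_t, d_k, d_v):
    """Key-value-predict style memory read (illustrative simplification).

    Each hidden state is sliced into [key | value | predict] sub-vectors:
    keys score the match against the current key, values are combined into
    the memory vector, and the predict slice of h_t is reserved for the
    output layer.
    """
    keys, values = H_prev[:, :d_k], H_prev[:, d_k:d_k + d_v]
    key_t = h_t[:d_k]
    predict_t = h_t[d_k + d_v:]
    # Score each stored key against the current key (dot product, then softmax)
    scores = keys @ key_t
    a = np.exp(scores - scores.max())
    a /= a.sum()
    # Memory vector is the weighted sum of the value sub-vectors
    m_t = a @ values
    return m_t, predict_t

# Toy usage: hidden size 12 split into key (4), value (4), predict (4)
rng = np.random.default_rng(1)
H_prev, h_t = rng.normal(size=(5, 12)), rng.normal(size=12)
m_t, predict_t = kvp_memory(H_prev, h_t, d_k=4, d_v=4)
```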
3. Experiments
4. Conclusion
Acknowledgements
References
 | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---|---
2018 MemoryArchitecturesinRecurrentN | Chris Dyer; Phil Blunsom; Wang Ling; Dani Yogatama; Yishu Miao; Gabor Melis; Adhiguna Kuncoro | | | Memory Architectures in Recurrent Neural Network Language Models | | | | | | 2018