Memory-Augmented Neural Network (MANN)
A Memory-Augmented Neural Network (MANN) is a memory-based neural network that includes one or more external memory modules, separate from the network's internal state.
- AKA: External Memory-based Neural Network.
- Context:
- It can be trained by a Memory Augmented Neural Network Training System.
- ...
- Example(s):
- a Neural Turing Machine (NTM),
- a Neural Machine Translation (NMT) Network,
- a Hierarchical Attention Network,
- a Gated Convolutional Neural Network with Segment-level Attention Mechanism (SAM-GCNN),
- a Convolutional Neural Network with Segment-level Attention Mechanism (SAM-CNN),
- a Bidirectional Recurrent Neural Network with Attention Mechanism,
- a Sparse Access Memory Neural Network (SAM-ANN),
- a Transformer Neural Network.
- …
- Counter-Example(s):
- a Long Short-Term Memory (LSTM) Network, whose memory is an internal state vector rather than an external memory module.
- See: Artificial Neural Network, Neural Natural Language Translation, Attention Mechanism, Deep Learning Neural Network, Speech Recognition, Document Classification.
References
2018a
- (Collier & Beel, 2018) ⇒ Mark Collier, and Joeran Beel. (2018). “Implementing Neural Turing Machines” (PDF). In: Proceedings of the 27th International Conference on Artificial Neural Networks (ICANN). ISBN:978-3-030-01424-7. DOI:10.1007/978-3-030-01424-7_10. arXiv:1807.08518
- QUOTE: MANNs' defining attribute is the existence of an external memory unit. This contrasts with gated recurrent neural networks such as Long Short-Term Memory Cells (LSTMs) [7] whose memory is an internal vector maintained over time. LSTMs have achieved state-of-the-art performance in many commercially important sequence learning tasks, such as handwriting recognition [2], machine translation [12] and speech recognition [3]. But, MANNs have been shown to outperform LSTMs on several artificial sequence learning tasks that require a large memory and/or complicated memory access patterns, for example memorization of long sequences and graph traversal [4, 5, 6, 11].
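The external-memory contrast described above can be illustrated with a minimal sketch: the memory is an explicit, separately addressable store rather than a hidden vector carried inside the recurrent cell. The class and the nearest-neighbour read below are illustrative assumptions, not code from the cited paper.

```python
import numpy as np

class ExternalMemory:
    """Toy external memory: rows are stored explicitly, outside any controller's
    hidden state, and retrieved by cosine-similarity lookup on a query key."""
    def __init__(self, key_dim, value_dim):
        self.keys = np.empty((0, key_dim))
        self.values = np.empty((0, value_dim))

    def write(self, key, value):
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def read(self, query):
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8)
        return self.values[np.argmax(sims)]

mem = ExternalMemory(key_dim=4, value_dim=2)
mem.write(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0]))
mem.write(np.array([0.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0]))
print(mem.read(np.array([0.9, 0.1, 0.0, 0.0])))   # -> [0. 1.]
```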
2018b
- (Shen et al., 2018) ⇒ Yu-Han Shen, Ke-Xin He, and Wei-Qiang Zhang. (2018). “SAM-GCNN: A Gated Convolutional Neural Network with Segment-Level Attention Mechanism for Home Activity Monitoring.” In: Proceedings of ISSPIT 2018. arXiv:1810.03986.
- QUOTE: So we propose a segment-level attention mechanism (SAM) to decide how much attention should be given based on the characteristics of segments. Here, a segment is comprised of several frames. In this paper, we mainly adopt three ways to improve the performance of our model:
- (1) We replace currently popular CNN with gated convolutional neural network to extract more temporal features of audios;
- (2) We propose a new segment-level attention mechanism to focus more on the audio segments with more energy;
- (3) We utilize model ensemble to enhance the classification capability of our model.
- We examine the following configurations:
- (1) CNN: Convolutional neural network as baseline system;
- (2) SAM-CNN: Convolutional neural network with our proposed segment-level attention mechanism;
- (3) GCNN: Gated convolutional neural network;
- (4) SAM-GCNN: Gated convolutional neural network with our proposed segment-level attention mechanism;
- (5) Ensemble: Gated convolutional neural network with our proposed segment-level attention mechanism and model ensemble.
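The segment-level attention idea quoted above can be sketched roughly as follows: frame-level features are grouped into segments, each segment receives a learned attention score, and the clip-level representation is the attention-weighted sum of segments. The names, dimensions, and scoring function here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def segment_attention_pool(frame_feats, frames_per_segment, w):
    """Group frames into segments, score each segment with a learned vector w,
    and pool the segments with softmax attention weights."""
    num_frames, dim = frame_feats.shape
    num_segments = num_frames // frames_per_segment
    segments = frame_feats[:num_segments * frames_per_segment]
    segments = segments.reshape(num_segments, frames_per_segment, dim).mean(axis=1)
    scores = segments @ w                 # one scalar score per segment
    alphas = softmax(scores)              # attention distribution over segments
    return alphas @ segments              # attention-weighted clip-level feature

# 100 frames of 64-dim features, segments of 10 frames, a random scoring vector.
frames = np.random.randn(100, 64)
w = np.random.randn(64)
clip_embedding = segment_attention_pool(frames, frames_per_segment=10, w=w)
print(clip_embedding.shape)               # (64,)
```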
2016a
- (Santoro et al., 2016) ⇒ Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. (2016). “Meta-Learning with Memory-Augmented Neural Networks.” In: Proceedings of the 33rd International Conference on Machine Learning (ICML'16).
- QUOTE: (...) memory-augmented neural network (MANN) (note: here on, the term MANN will refer to the class of external-memory equipped networks, and not other “internal” memory-based architectures, such as LSTMs). (...)
The Neural Turing Machine is a fully differentiable implementation of a MANN. It consists of a controller, such as a feed-forward network or LSTM, which interacts with an external memory module using a number of read and write heads (Graves et al., 2014). Memory encoding and retrieval in a NTM external memory module is rapid, with vector representations being placed into or taken out of memory potentially every time-step. This ability makes the NTM a perfect candidate for meta-learning and low-shot prediction, as it is capable of both long-term storage via slow updates of its weights, and short-term storage via its external memory module. Thus, if a NTM can learn a general strategy for the types of representations it should place into memory and how it should later use these representations for predictions, then it may be able to use its speed to make accurate predictions of data that it has only seen once.
Figure 1. Task structure. (a) Omniglot images (or x-values for regression), [math]\displaystyle{ x_t }[/math], are presented with time-offset labels (or function values), [math]\displaystyle{ y_{t−1} }[/math], to prevent the network from simply mapping the class labels to the output. From episode to episode, the classes to be presented in the episode, their associated labels, and the specific samples are all shuffled. (b) A successful strategy would involve the use of an external memory to store bound sample representation-class label information, which can then be retrieved at a later point for successful classification when a sample from an already-seen class is presented. Specifically, sample data [math]\displaystyle{ x_t }[/math] from a particular time step should be bound to the appropriate class label [math]\displaystyle{ y_t }[/math], which is presented in the subsequent time step. Later, when a sample from this same class is seen, it should retrieve this bound information from the external memory to make a prediction. Backpropagated error signals from this prediction step will then shape the weight updates from the earlier steps in order to promote this binding strategy.
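A rough sketch of the NTM-style content-based read and erase/add write mentioned in the quote, following the general mechanism of Graves et al. (2014); the controller, multiple heads, and learned parameters are omitted, and all values below are placeholders.

```python
import numpy as np

def address(memory, key, beta=5.0):
    """Content-based addressing: softmax over sharpened cosine similarity to each row."""
    k = key / (np.linalg.norm(key) + 1e-8)
    M = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    scores = beta * (M @ k)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def read(memory, w):
    return w @ memory                               # weighted sum of memory rows

def write(memory, w, erase, add):
    """Erase-then-add write, weighted by the addressing vector w."""
    memory = memory * (1 - np.outer(w, erase))
    return memory + np.outer(w, add)

memory = 0.1 * np.random.randn(16, 8)               # 16 slots of width 8
key = np.random.randn(8)
w = address(memory, key)                             # where to write
memory = write(memory, w, erase=np.full(8, 0.9), add=key)
r = read(memory, address(memory, key))               # re-address with the same key
print(np.round(r, 2))                                # read vector points towards the stored key
```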
2016b
- (Yang et al., 2016) ⇒ Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. (2016). “Hierarchical Attention Networks for Document Classification.” In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- QUOTE: The overall architecture of the Hierarchical Attention Network (HAN) is shown in Fig. 2. It consists of several parts: a word sequence encoder, a word-level attention layer, a sentence encoder and a sentence-level attention layer. (...)
Figure 2: Hierarchical Attention Network.
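The two-level structure described in the quote can be sketched as attention pooling applied twice: word-level encoder states are pooled into sentence vectors, and sentence vectors are pooled into a document vector. The encoders (bidirectional GRUs in the paper) and the MLP projection before the context-vector dot product are omitted here; the names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(states, context):
    """Score each encoder state against a context vector and pool with softmax weights."""
    alphas = softmax(states @ context)
    return alphas @ states

def han_document_vector(doc_word_states, word_context, sent_context):
    """Two-level attention: words -> sentence vectors, then sentences -> document vector.
    Inputs are assumed to already be encoder outputs (one array of word states per sentence)."""
    sentence_vecs = np.stack([attention_pool(s, word_context) for s in doc_word_states])
    return attention_pool(sentence_vecs, sent_context)

dim = 16
doc = [np.random.randn(np.random.randint(5, 12), dim) for _ in range(3)]  # 3 sentences
u_w, u_s = np.random.randn(dim), np.random.randn(dim)                     # context vectors
print(han_document_vector(doc, u_w, u_s).shape)                           # (16,)
```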
2016c
- (Tilk & Alumae, 2016) ⇒ Ottokar Tilk, and Tanel Alumae. (2016). “Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration.” In: Proceedings of Interspeech 2016. DOI:10.21437/Interspeech.2016
- QUOTE: Our model is a bidirectional recurrent neural network (BRNN) [24] which enables it to make use of unfixed length contexts before and after the current position in text.
In the recurrent layers we use gated recurrent units (GRU) [26] that are well suited for capturing long range dependencies on multiple time scales. These units have similar benefits as LSTM units [27] while being simpler.
We incorporated an attention mechanism [25] into our model to further increase its capacity of finding relevant parts of the context for punctuation decisions. For example the model might focus on words that indicate a question, but may be relatively far from the current word, to nudge the model towards ending the sentence with a question mark instead of a period.
To fuse together the model state at current input word and the output from the attention mechanism we use a late fusion approach [28] adapted from LSTM to GRU. This allows the attention model output to directly interact with the recurrent layer state while not interfering with its memory.
Figure 1: Description of the model predicting punctuation [math]\displaystyle{ y_t }[/math] at time step [math]\displaystyle{ t }[/math] for the slot before the current input word [math]\displaystyle{ x_t }[/math].
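The attention-plus-late-fusion step described above can be sketched as follows: attention summarizes the bidirectional context states relative to the current recurrent state, and a learned gate lets that summary modulate the state without being written back into the recurrence. The gating form below is a generic assumption in the spirit of the late-fusion idea, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(context_states, state, W_a):
    """Soft attention: weight the BRNN context states by similarity to the current state."""
    scores = context_states @ (W_a @ state)
    return softmax(scores) @ context_states

def late_fusion(state, attended, W_f, W_g):
    """Gated ('late') fusion: the attention output modulates the recurrent state
    through a learned gate rather than entering the recurrence itself."""
    gate = sigmoid(W_g @ np.concatenate([state, attended]))
    return state + gate * (W_f @ attended)

dim = 32
context = np.random.randn(20, dim)                  # BRNN outputs over a 20-word window
h_t = np.random.randn(dim)                          # recurrent state at the current word
W_a = np.random.randn(dim, dim)
W_f = np.random.randn(dim, dim)
W_g = np.random.randn(dim, 2 * dim)
fused = late_fusion(h_t, attend(context, h_t, W_a), W_f, W_g)
print(fused.shape)                                   # (32,); fed to the punctuation classifier
```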