Memory-Augmented Neural Network (MANN)
A Memory-Augmented Neural Network (MANN) is a memory-based neural network that includes one or more external memory modules, separate from the network's internal state.
- AKA: External Memory-based Neural Network.
- Context:
- It can be trained by a Memory Augmented Neural Network Training System.
- ...
- Example(s):
- a Neural Turing Machine (NTM),
- a Neural Machine Translation (NMT) Network,
- a Hierarchical Attention Network,
- a Gated Convolutional Neural Network with Segment-level Attention Mechanism (SAM-GCNN),
- a Convolutional Neural Network with Segment-level Attention Mechanism (SAM-CNN),
- a Bidirectional Recurrent Neural Network with Attention Mechanism,
- a Sparse Access Memory Neural Network (SAM-ANN),
- a Transformer Neural Network.
- …
- Counter-Example(s):
- a Long Short-Term Memory (LSTM) Network, whose memory is an internal state vector rather than an external memory module.
- See: Artificial Neural Network, Neural Natural Language Translation, Attention Mechanism, Deep Learning Neural Network, Speech Recognition, Document Classification.
References
2018a
- (Collier & Beel, 2018) ⇒ Mark Collier, and Joeran Beel. (2018). “Implementing Neural Turing Machines” (PDF). In: Proceedings of the 27th International Conference on Artificial Neural Networks (ICANN). ISBN:978-3-030-01424-7. DOI:10.1007/978-3-030-01424-7_10. arXiv:1807.08518
- QUOTE: MANNs' defining attribute is the existence of an external memory unit. This contrasts with gated recurrent neural networks such as Long Short-Term Memory Cells (LSTMs) [7] whose memory is an internal vector maintained over time. LSTMs have achieved state-of-the-art performance in many commercially important sequence learning tasks, such as handwriting recognition [2], machine translation [12] and speech recognition [3]. But, MANNs have been shown to outperform LSTMs on several artificial sequence learning tasks that require a large memory and/or complicated memory access patterns, for example memorization of long sequences and graph traversal [4, 5, 6, 11].
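The external-memory contrast described above can be illustrated with a minimal sketch: the memory is an explicit, separately addressable store rather than a hidden vector carried inside the recurrent cell. The class and the nearest-neighbour read below are illustrative assumptions, not code from the cited paper.

```python
import numpy as np

class ExternalMemory:
    """Toy external memory: rows are stored explicitly, outside any controller's
    hidden state, and retrieved by cosine-similarity lookup on a query key."""
    def __init__(self, key_dim, value_dim):
        self.keys = np.empty((0, key_dim))
        self.values = np.empty((0, value_dim))

    def write(self, key, value):
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def read(self, query):
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8)
        return self.values[np.argmax(sims)]

mem = ExternalMemory(key_dim=4, value_dim=2)
mem.write(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0]))
mem.write(np.array([0.0, 1.0, 0.0, 0.0]), np.array([1.0, 0.0]))
print(mem.read(np.array([0.9, 0.1, 0.0, 0.0])))   # -> [0. 1.]
```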
2018b
- (Shen et al., 2018) ⇒ Yu-Han Shen, Ke-Xin He, and Wei-Qiang Zhang. (2018). “SAM-GCNN: A Gated Convolutional Neural Network with Segment-Level Attention Mechanism for Home Activity Monitoring.” In: Proceedings of ISSPIT 2018. arXiv:1810.03986.
- QUOTE: So we propose a segment-level attention mechanism (SAM) to decide how much attention should be given based on the characteristics of segments. Here, a segment is comprised of several frames. In this paper, we mainly adopt three ways to improve the performance of our model:
- (1) We replace currently popular CNN with gated convolutional neural network to extract more temporal features of audios;
- (2) We propose a new segment-level attention mechanism to focus more on the audio segments with more energy;
- (3) We utilize model ensemble to enhance the classification capability of our model.
- We examine the following configurations:
- (1) CNN: Convolutional neural network as baseline system;
- (2) SAM-CNN: Convolutional neural network with our proposed segment-level attention mechanism;
- (3) GCNN: Gated convolutional neural network;
- (4) SAM-GCNN: Gated convolutional neural network with our proposed segment-level attention mechanism;
- (5) Ensemble: Gated convolutional neural network with our proposed segment-level attention mechanism and model ensemble.
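The segment-level attention idea quoted above can be sketched roughly as follows: frame-level features are grouped into segments, each segment receives a learned attention score, and the clip-level representation is the attention-weighted sum of segments. The names, dimensions, and scoring function here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def segment_attention_pool(frame_feats, frames_per_segment, w):
    """Group frames into segments, score each segment with a learned vector w,
    and pool the segments with softmax attention weights."""
    num_frames, dim = frame_feats.shape
    num_segments = num_frames // frames_per_segment
    segments = frame_feats[:num_segments * frames_per_segment]
    segments = segments.reshape(num_segments, frames_per_segment, dim).mean(axis=1)
    scores = segments @ w                 # one scalar score per segment
    alphas = softmax(scores)              # attention distribution over segments
    return alphas @ segments              # attention-weighted clip-level feature

# 100 frames of 64-dim features, segments of 10 frames, a random scoring vector.
frames = np.random.randn(100, 64)
w = np.random.randn(64)
clip_embedding = segment_attention_pool(frames, frames_per_segment=10, w=w)
print(clip_embedding.shape)               # (64,)
```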
2016a
- (Santoro et al., 2016) ⇒ Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. (2016). “Meta-Learning with Memory-Augmented Neural Networks.” In: Proceedings of the 33rd International Conference on Machine Learning (ICML'16).
- QUOTE: (...) memory-augmented neural network (MANN) (note: here on, the term MANN will refer to the class of external-memory equipped networks, and not other “internal” memory-based architectures, such as LSTMs). (...)
The Neural Turing Machine is a fully differentiable implementation of a MANN. It consists of a controller, such as a feed-forward network or LSTM, which interacts with an external memory module using a number of read and write heads (Graves et al., 2014). Memory encoding and retrieval in a NTM external memory module is rapid, with vector representations being placed into or taken out of memory potentially every time-step. This ability makes the NTM a perfect candidate for meta-learning and low-shot prediction, as it is capable of both long-term storage via slow updates of its weights, and short-term storage via its external memory module. Thus, if a NTM can learn a general strategy for the types of representations it should place into memory and how it should later use these representations for predictions, then it may be able to use its speed to make accurate predictions of data that it has only seen once.
Figure 1. Task structure. (a) Omniglot images (or x-values for regression), [math]\displaystyle{ x_t }[/math], are presented with time-offset labels (or function values), [math]\displaystyle{ y_{t−1} }[/math], to prevent the network from simply mapping the class labels to the output. From episode to episode, the classes to be presented in the episode, their associated labels, and the specific samples are all shuffled. (b) A successful strategy would involve the use of an external memory to store bound sample representation-class label information, which can then be retrieved at a later point for successful classification when a sample from an already-seen class is presented. Specifically, sample data [math]\displaystyle{ x_t }[/math] from a particular time step should be bound to the appropriate class label [math]\displaystyle{ y_t }[/math], which is presented in the subsequent time step. Later, when a sample from this same class is seen, it should retrieve this bound information from the external memory to make a prediction. Backpropagated error signals from this prediction step will then shape the weight updates from the earlier steps in order to promote this binding strategy.
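A rough sketch of the NTM-style content-based read and erase/add write mentioned in the quote, following the general mechanism of Graves et al. (2014); the controller, multiple heads, and learned parameters are omitted, and all values below are placeholders.

```python
import numpy as np

def address(memory, key, beta=5.0):
    """Content-based addressing: softmax over sharpened cosine similarity to each row."""
    k = key / (np.linalg.norm(key) + 1e-8)
    M = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    scores = beta * (M @ k)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def read(memory, w):
    return w @ memory                               # weighted sum of memory rows

def write(memory, w, erase, add):
    """Erase-then-add write, weighted by the addressing vector w."""
    memory = memory * (1 - np.outer(w, erase))
    return memory + np.outer(w, add)

memory = 0.1 * np.random.randn(16, 8)               # 16 slots of width 8
key = np.random.randn(8)
w = address(memory, key)                             # where to write
memory = write(memory, w, erase=np.full(8, 0.9), add=key)
r = read(memory, address(memory, key))               # re-address with the same key
print(np.round(r, 2))                                # read vector points towards the stored key
```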
2016b
- (Yang et al., 2016) ⇒ Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. (2016). “Hierarchical Attention Networks for Document Classification.” In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- QUOTE: The overall architecture of the Hierarchical Attention Network (HAN) is shown in Fig. 2. It consists of several parts: a word sequence encoder, a word-level attention layer, a sentence encoder and a sentence-level attention layer. (...)
Figure 2: Hierarchical Attention Network.
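The two-level structure described in the quote can be sketched as attention pooling applied twice: word-level encoder states are pooled into sentence vectors, and sentence vectors are pooled into a document vector. The encoders (bidirectional GRUs in the paper) and the MLP projection before the context-vector dot product are omitted here; the names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(states, context):
    """Score each encoder state against a context vector and pool with softmax weights."""
    alphas = softmax(states @ context)
    return alphas @ states

def han_document_vector(doc_word_states, word_context, sent_context):
    """Two-level attention: words -> sentence vectors, then sentences -> document vector.
    Inputs are assumed to already be encoder outputs (one array of word states per sentence)."""
    sentence_vecs = np.stack([attention_pool(s, word_context) for s in doc_word_states])
    return attention_pool(sentence_vecs, sent_context)

dim = 16
doc = [np.random.randn(np.random.randint(5, 12), dim) for _ in range(3)]  # 3 sentences
u_w, u_s = np.random.randn(dim), np.random.randn(dim)                     # context vectors
print(han_document_vector(doc, u_w, u_s).shape)                           # (16,)
```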
2016c
- (Tilk & Alumae, 2016) ⇒ Ottokar Tilk, and Tanel Alumae. (2016). “Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration.” In: Proceedings of Interspeech 2016. DOI:10.21437/Interspeech.2016
- QUOTE: Our model is a bidirectional recurrent neural network (BRNN) [24] which enables it to make use of unfixed length contexts before and after the current position in text.
In the recurrent layers we use gated recurrent units (GRU) [26] that are well suited for capturing long range dependencies on multiple time scales. These units have similar benefits as LSTM units [27] while being simpler.
We incorporated an attention mechanism [25] into our model to further increase its capacity of finding relevant parts of the context for punctuation decisions. For example the model might focus on words that indicate a question, but may be relatively far from the current word, to nudge the model towards ending the sentence with a question mark instead of a period.
To fuse together the model state at current input word and the output from the attention mechanism we use a late fusion approach [28] adapted from LSTM to GRU. This allows the attention model output to directly interact with the recurrent layer state while not interfering with its memory.
Figure 1: Description of the model predicting punctuation [math]\displaystyle{ y_t }[/math] at time step [math]\displaystyle{ t }[/math] for the slot before the current input word [math]\displaystyle{ x_t }[/math].
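The attention-plus-late-fusion step described above can be sketched as follows: attention summarizes the bidirectional context states relative to the current recurrent state, and a learned gate lets that summary modulate the state without being written back into the recurrence. The gating form below is a generic assumption in the spirit of the late-fusion idea, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(context_states, state, W_a):
    """Soft attention: weight the BRNN context states by similarity to the current state."""
    scores = context_states @ (W_a @ state)
    return softmax(scores) @ context_states

def late_fusion(state, attended, W_f, W_g):
    """Gated ('late') fusion: the attention output modulates the recurrent state
    through a learned gate rather than entering the recurrence itself."""
    gate = sigmoid(W_g @ np.concatenate([state, attended]))
    return state + gate * (W_f @ attended)

dim = 32
context = np.random.randn(20, dim)                  # BRNN outputs over a 20-word window
h_t = np.random.randn(dim)                          # recurrent state at the current word
W_a = np.random.randn(dim, dim)
W_f = np.random.randn(dim, dim)
W_g = np.random.randn(dim, 2 * dim)
fused = late_fusion(h_t, attend(context, h_t, W_a), W_f, W_g)
print(fused.shape)                                   # (32,); fed to the punctuation classifier
```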