Internal Memory-based Neural Network
An Internal Memory-based Neural Network is a memory-based neural network that only includes internal memory blocks (rather than an external memory module).
- Example(s):
- a Recurrent Neural Network with memory block-based nodes (such as an LSTM network or a GRU neural network); see the illustrative LSTM-cell sketch below the See list.
- …
- Counter-Example(s):
- An External Memory-based Neural Network, such as:
- a Neural Turing Machine (NTM),
- a Neural Machine Translation (NMT) Network,
- a Hierarchical Attention Network,
- a Gated Convolutional Neural Network with Segment-level Attention Mechanism (SAM-GCNN),
- a Convolutional Neural Network with Segment-level Attention Mechanism (SAM-CNN),
- a Recurrent Neural Network with Attention Mechanism.
- See: Artificial Neural Network, Neural Natural Language Translation, Attention Mechanism, Deep Learning Neural Network, Speech Recognition, Document Classification.
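The following is a minimal, illustrative sketch (not taken from the cited references) of an internal memory block: a single NumPy-based LSTM cell whose cell state c_t acts as the network's internal memory and is carried forward across time steps, with no external memory matrix involved. All names, sizes, and initializations are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell: the cell state c_t is the *internal* memory,
    updated and read through learned gates rather than an external memory."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, output, and candidate gates.
        self.W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x_t, h_prev, c_prev):
        z = self.W @ np.concatenate([x_t, h_prev]) + self.b
        H = self.hidden_size
        i = sigmoid(z[0 * H:1 * H])   # input gate
        f = sigmoid(z[1 * H:2 * H])   # forget gate
        o = sigmoid(z[2 * H:3 * H])   # output gate
        g = np.tanh(z[3 * H:4 * H])   # candidate values
        c_t = f * c_prev + i * g      # internal memory update
        h_t = o * np.tanh(c_t)        # exposed hidden state
        return h_t, c_t

# Usage: carry the internal memory (h, c) across a toy sequence.
cell = LSTMCell(input_size=8, hidden_size=16)
h, c = np.zeros(16), np.zeros(16)
for x_t in np.random.default_rng(1).standard_normal((5, 8)):
    h, c = cell.step(x_t, h, c)
print(h.shape, c.shape)  # (16,) (16,)
```

In this sketch the memory is internal because it lives entirely in the recurrent state passed between time steps; an External Memory-based Neural Network such as a Neural Turing Machine instead reads from and writes to a separate, addressable memory matrix.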
References
2016
- (Santoro et al., 2016) ⇒ Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. (2016). “Meta-Learning with Memory-Augmented Neural Networks.” In: Proceedings of the 33rd International Conference on Machine Learning (ICML'16).
- QUOTE: And so, in this paper we revisit the meta-learning problem and setup from the perspective of a highly capable memory-augmented neural network (MANN) (note: here on, the term MANN will refer to the class of external-memory equipped networks, and not other “internal” memory-based architectures, such as LSTMs).
...Our approach combines the best of two worlds: the ability to slowly learn an abstract method for obtaining useful representations of raw data, via gradient descent, and the ability to rapidly bind never-before-seen information after a single presentation, via an external memory module. The combination supports robust meta-learning, extending the range of problems to which deep learning can be effectively applied.
Figure 1. Task structure. (a) Omniglot images (or x-values for regression), [math]\displaystyle{ x_t }[/math], are presented with time-offset labels (or function values), [math]\displaystyle{ y_{t−1} }[/math], to prevent the network from simply mapping the class labels to the output. From episode to episode, the classes to be presented in the episode, their associated labels, and the specific samples are all shuffled. (b) A successful strategy would involve the use of an external memory to store bound sample representation-class label information, which can then be retrieved at a later point for successful classification when a sample from an already-seen class is presented. Specifically, sample data [math]\displaystyle{ x_t }[/math] from a particular time step should be bound to the appropriate class label [math]\displaystyle{ y_t }[/math], which is presented in the subsequent time step. Later, when a sample from this same class is seen, it should retrieve this bound information from the external memory to make a prediction. Backpropagated error signals from this prediction step will then shape the weight updates from the earlier steps in order to promote this binding strategy.
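As a hedged illustration of the episode construction described in the figure caption (time-offset labels with per-episode shuffling), the following toy sketch pairs each input [math]\displaystyle{ x_t }[/math] with the previous step's label [math]\displaystyle{ y_{t-1} }[/math]; the function name and toy data are assumptions for illustration, not code from the paper.

```python
import numpy as np

def time_offset_episode(samples, labels, rng):
    """Pair each input x_t with the previous step's label y_{t-1}, so the class
    must be recalled from memory rather than read off the current input.
    Sample order is shuffled per episode."""
    order = rng.permutation(len(samples))
    x = samples[order]
    y = labels[order]
    y_offset = np.concatenate([[-1], y[:-1]])   # -1 = null label at t = 0
    return list(zip(x, y_offset)), y            # inputs (x_t, y_{t-1}), targets y_t

rng = np.random.default_rng(0)
samples = np.arange(6).reshape(6, 1)            # toy "images"
labels = np.array([0, 0, 1, 1, 2, 2])           # toy class labels
inputs, targets = time_offset_episode(samples, labels, rng)
```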