Memory Augmented Neural Network Training System
A Memory Augmented Neural Network (MANN) Training System is a Neural Network Training System that is used to train a Memory Augmented Neural Network.
- AKA: Memory Augmented Neural Network System.
- Context:
- It can solve a Memory Augmented Neural Network Training Task by implementing a Memory Augmented Neural Network Training Algorithm.
- Example(s):
- Counter-Example(s):
- See: Artificial Neural Network, Neural Natural Language Translation, Attention Mechanism, Deep Learning Neural Network, Speech Recognition, Document Classification.
References
2018
- (Shen et al., 2018) ⇒ Yu-Han Shen, Ke-Xin He, and Wei-Qiang Zhang. (2018). “SAM-GCNN: A Gated Convolutional Neural Network with Segment-Level Attention Mechanism for Home Activity Monitoring.” In: Proceedings of ISSPIT 2018. arXiv:1810.03986.
2016a
- (Santoro et al., 2016) ⇒ Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. (2016). “Meta-Learning with Memory-Augmented Neural Networks.” In: Proceedings of the 33rd International Conference on Machine Learning (ICML'16).
- QUOTE: (...) memory-augmented neural network (MANN) (note: here on, the term MANN will refer to the class of external-memory equipped networks, and not other “internal” memory-based architectures, such as LSTMs). (...)
The Neural Turing Machine is a fully differentiable implementation of a MANN. It consists of a controller, such as a feed-forward network or LSTM, which interacts with an external memory module using a number of read and write heads (Graves et al., 2014). Memory encoding and retrieval in a NTM external memory module is rapid, with vector representations being placed into or taken out of memory potentially every time-step. This ability makes the NTM a perfect candidate for meta-learning and low-shot prediction, as it is capable of both long-term storage via slow updates of its weights, and short-term storage via its external memory module. Thus, if a NTM can learn a general strategy for the types of representations it should place into memory and how it should later use these representations for predictions, then it may be able to use its speed to make accurate predictions of data that it has only seen once.
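The read/write cycle described in this quote can be made concrete with a small sketch. The following is a minimal NumPy illustration of NTM-style content-based addressing (cosine similarity, sharpened softmax, and an erase-then-add write) against an external memory matrix; the controller network, parameter learning, and the least-recently-used access module used by Santoro et al. are deliberately omitted, and all array sizes are arbitrary toy values.
```python
import numpy as np

def cosine_similarity(memory, key):
    # memory: (num_slots, slot_dim); key: (slot_dim,)
    dot = memory @ key
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    return dot / norms

def content_addressing(memory, key, beta=1.0):
    # Sharpened softmax over similarities -> attention weights over memory slots.
    scores = beta * cosine_similarity(memory, key)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def read(memory, weights):
    # Read vector is a weighted sum of memory rows.
    return weights @ memory

def write(memory, weights, erase_vec, add_vec):
    # NTM-style erase-then-add update of the memory matrix.
    memory = memory * (1 - np.outer(weights, erase_vec))
    return memory + np.outer(weights, add_vec)

# Toy usage: 8 slots of 4-dim contents; the key would normally come from the controller.
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))
key = rng.normal(size=4)
w = content_addressing(M, key, beta=5.0)
r = read(M, w)                                   # read vector fed back to the controller
M = write(M, w, erase_vec=np.full(4, 0.5), add_vec=rng.normal(size=4))
```
In a full system these operations would be differentiable (e.g., implemented in an autodiff framework) so the controller can learn what to store and retrieve; the sketch only shows the addressing mechanics.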
2016b
- (Yang et al., 2016) ⇒ Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. (2016). “Hierarchical Attention Networks for Document Classification.” In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- QUOTE: The overall architecture of the Hierarchical Attention Network (HAN) is shown in Fig. 2. It consists of several parts: a word sequence encoder, a word-level attention layer, a sentence encoder and a sentence-level attention layer. (...)
Figure 2: Hierarchical Attention Network.
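As an illustration of the hierarchical structure described in this quote, the following is a minimal NumPy sketch of the two attention-pooling steps (word-level and sentence-level). The bidirectional GRU word and sentence encoders from the paper are replaced here by random annotation vectors, and all weight matrices and dimensions are arbitrary toy values, so this is only a sketch of the attention layers, not the full HAN.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(annotations, W, b, context):
    # annotations: (seq_len, dim) hidden states from an encoder (e.g., a bidirectional GRU).
    u = np.tanh(annotations @ W + b)      # (seq_len, attn_dim) hidden representation
    alpha = softmax(u @ context)          # importance weight per position
    return alpha @ annotations            # weighted sum -> single pooled vector

# Toy shapes: 3 sentences x 5 words, 6-dim word annotations, 4-dim attention space.
rng = np.random.default_rng(0)
word_annotations = rng.normal(size=(3, 5, 6))
Ww, bw, uw = rng.normal(size=(6, 4)), np.zeros(4), rng.normal(size=4)
Ws, bs, us = rng.normal(size=(6, 4)), np.zeros(4), rng.normal(size=4)

# Word-level attention pools each sentence into a sentence vector.
sentence_vectors = np.stack([attention_pool(s, Ww, bw, uw) for s in word_annotations])
# Sentence-level attention pools the sentence vectors into a document vector,
# which would then feed a document classification layer.
document_vector = attention_pool(sentence_vectors, Ws, bs, us)
```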
2016c
- (Tilk & Alumae, 2016) ⇒ Ottokar Tilk, and Tanel Alumae. (2016). “Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration.” In: Proceedings of Interspeech 2016. doi:10.21437/Interspeech.2016
- QUOTE: Our model is a bidirectional recurrent neural network (BRNN) [24] which enables it to make use of unfixed length contexts before and after the current position in text.
In the recurrent layers we use gated recurrent units (GRU) [26] that are well suited for capturing long range dependencies on multiple time scales. These units have similar benefits as LSTM units [27] while being simpler.
We incorporated an attention mechanism [25] into our model to further increase its capacity of finding relevant parts of the context for punctuation decisions. For example the model might focus on words that indicate a question, but may be relatively far from the current word, to nudge the model towards ending the sentence with a question mark instead of a period.
To fuse together the model state at current input word and the output from the attention mechanism we use a late fusion approach [28] adapted from LSTM to GRU. This allows the attention model output to directly interact with the recurrent layer state while not interfering with its memory.
Figure 1: Description of the model predicting punctuation [math]\displaystyle{ y_t }[/math] at time step [math]\displaystyle{ t }[/math] for the slot before the current input word [math]\displaystyle{ x_t }[/math].
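The attention and late-fusion steps described in this quote can be sketched as follows. This is a minimal NumPy illustration, assuming generic additive (Bahdanau-style) attention over the bidirectional states and a sigmoid-gated fusion of the attention context with the current recurrent state; the actual GRU recurrences, parameterization, and output layer from Tilk & Alumae are simplified or omitted, and all names and dimensions are hypothetical toy values.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(states, query, Wa, Ua, va):
    # Additive attention over the bidirectional context states.
    scores = np.tanh(states @ Wa + query @ Ua) @ va   # (seq_len,)
    alpha = softmax(scores)
    return alpha @ states                             # context vector

def late_fusion(state, context, Wf, Wc, bf):
    # Gate the attention context before adding it to the recurrent state, so the
    # attention output interacts with the state without overwriting its memory.
    gate = 1.0 / (1.0 + np.exp(-(state @ Wf + context @ Wc + bf)))  # sigmoid gate
    return state + gate * context

# Toy usage: a 12-word context window with 8-dim recurrent states.
rng = np.random.default_rng(0)
H = 8
states = rng.normal(size=(12, H))      # bidirectional states over the window
s_t = rng.normal(size=H)               # state at the current word
Wa, Ua, va = rng.normal(size=(H, H)), rng.normal(size=(H, H)), rng.normal(size=H)
Wf, Wc, bf = rng.normal(size=(H, H)), rng.normal(size=(H, H)), np.zeros(H)

c_t = attention_context(states, s_t, Wa, Ua, va)
fused = late_fusion(s_t, c_t, Wf, Wc, bf)
# A softmax over punctuation classes (e.g., period, comma, question mark, none) would follow.
```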