Tree-LSTM Unit Activation Function
A Tree-LSTM Unit Activation Function is a Long Short-Term Memory Unit-based Activation Function applied to Tree-Structured Neural Networks.
- AKA: Tree-LSTM Activation Function, Tree LSTM.
- Context:
- It can (typically) be used in the activation of Tree-LSTM Neurons.
- Example(s):
- chainer.functions.tree_lstm,
- …
- Counter-Example(s):
- a S-LSTM Unit Activation Function,
- a Softmax-based Activation Function,
- a Rectified-based Activation Function,
- a Heaviside Step Activation Function,
- a Ramp Function-based Activation Function,
- a Logistic Sigmoid-based Activation Function,
- a Hyperbolic Tangent-based Activation Function,
- a Gaussian-based Activation Function,
- a Softsign Activation Function,
- a Softshrink Activation Function,
- an Adaptive Piecewise Linear Activation Function,
- a Maxout Activation Function.
- See: Artificial Neural Network, Recurrent Neural Network (RNN), Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
References
2018
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.tree_lstm.html Retrieved:2018-2-25
- QUOTE:
chainer.functions.tree_lstm(*inputs)
TreeLSTM unit as an activation function. This function implements TreeLSTM units both for N-ary TreeLSTM and Child-Sum TreeLSTM. Let the children cell states be [math]\displaystyle{ c_1, c_2, \cdots, c_N }[/math], and the incoming signal be [math]\displaystyle{ x }[/math].
First, the incoming signal [math]\displaystyle{ x }[/math] is split into (3 + N) arrays [math]\displaystyle{ a, i, o, f_1, f_2, \cdots, f_N }[/math] of the same shape along the second axis. This means that [math]\displaystyle{ x }[/math]'s second axis must be (3 + N) times the length of each [math]\displaystyle{ c_n }[/math].
The split input signals correspond to:
- [math]\displaystyle{ a }[/math] : sources of cell input
- [math]\displaystyle{ i }[/math] : sources of input gate
- [math]\displaystyle{ o }[/math] : sources of output gate
- [math]\displaystyle{ f_n }[/math] : sources of forget gate for n-th ary
- Second, it computes outputs as:
[math]\displaystyle{ c = \tanh(a)\,\mathrm{sigmoid}(i) + c_1\,\mathrm{sigmoid}(f_1) + c_2\,\mathrm{sigmoid}(f_2) + \cdots + c_N\,\mathrm{sigmoid}(f_N) }[/math],
[math]\displaystyle{ h = \tanh(c)\,\mathrm{sigmoid}(o) }[/math].
These are returned as a tuple of (N + 1) variables (...)
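A minimal usage sketch (not part of the quoted documentation) of chainer.functions.tree_lstm for an N-ary Tree-LSTM with N = 2 children; the batch size and unit size are illustrative assumptions:

    import numpy as np
    import chainer.functions as F

    batch, units, n_children = 4, 10, 2  # illustrative sizes (assumptions, not from the source)

    # Children cell states c_1 and c_2, each of shape (batch, units).
    c1 = np.random.randn(batch, units).astype(np.float32)
    c2 = np.random.randn(batch, units).astype(np.float32)

    # Incoming signal x; its second axis must have (3 + N) * units elements
    # so that it can be split into a, i, o, f_1, ..., f_N.
    x = np.random.randn(batch, (3 + n_children) * units).astype(np.float32)

    # The unit returns the updated cell state c and the outgoing signal h.
    c, h = F.tree_lstm(c1, c2, x)
    print(c.shape, h.shape)  # both (batch, units)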
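The quoted update equations can also be transcribed directly in NumPy. This is a sketch of the formulas above under the assumption that x is split in the order a, i, o, f_1, ..., f_N; it is not Chainer's internal implementation, whose gate layout inside x may differ:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def tree_lstm_update(cs, x):
        # cs: list of N children cell states, each of shape (batch, units)
        # x:  incoming signal of shape (batch, (3 + N) * units)
        n = len(cs)
        a, i, o, *fs = np.split(x, 3 + n, axis=1)
        # c = tanh(a) sigmoid(i) + c_1 sigmoid(f_1) + ... + c_N sigmoid(f_N)
        c = np.tanh(a) * sigmoid(i)
        for c_n, f_n in zip(cs, fs):
            c = c + c_n * sigmoid(f_n)
        # h = tanh(c) sigmoid(o)
        h = np.tanh(c) * sigmoid(o)
        return c, h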
2016
- (Bowman et al., 2016) ⇒ Bowman, S. R., Gauthier, J., Rastogi, A., Gupta, R., Manning, C. D., & Potts, C. (2016). A fast unified model for parsing and sentence understanding. arXiv preprint arXiv:1603.06021.
- ABSTRACT: Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation for a speedup of up to 25 times over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models.
2015
- (Tai et al., 2015) ⇒ Tai, K. S., Socher, R., & Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.
- ABSTRACT: Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).