S-LSTM Unit Activation Function
An S-LSTM Unit Activation Function is a Long Short-Term Memory (LSTM) Unit-based Activation Function that is applied to Binary Trees.
- AKA: S-LSTM Activation Function.
- Context:
- It can (typically) be used in the activation of S-LSTM Neurons (a minimal sketch of the node update is given after the See list below).
- Example(s):
- chainer.functions.slstm (see the 2018 Chainer reference below).
- Counter-Example(s):
- a Tree-LSTM Unit Activation Function,
- a Softmax-based Activation Function,
- a Rectified-based Activation Function,
- a Heaviside Step Activation Function,
- a Ramp Function-based Activation Function,
- a Logistic Sigmoid-based Activation Function,
- a Hyperbolic Tangent-based Activation Function,
- a Gaussian-based Activation Function,
- a Softsign Activation Function,
- a Softshrink Activation Function,
- an Adaptive Piecewise Linear Activation Function,
- a Maxout Activation Function.
- See: Artificial Neural Network, Recurrent Neural Network (RNN), Artificial Neuron, Neural Network Topology, Neural Network Layer, Neural Network Learning Rate.
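The following is a minimal, self-contained NumPy sketch of the S-LSTM node update for a binary tree, following the update equations quoted from the Chainer documentation in the References section below. The function name slstm_node, the argument shapes, and the (a, i, f, o) gate-split layout are illustrative assumptions rather than any library's API.

import numpy as np

def sigmoid(z):
    # Elementwise logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-z))

def slstm_node(c_prev1, c_prev2, x1, x2):
    # Combine the two child cell states c_prev1, c_prev2 (shape: batch x d)
    # with the pre-activation inputs x1, x2 (shape: batch x 4d) into the
    # parent cell state c and the outgoing signal h.
    # The split order (a, i, f, o) is an illustrative assumption.
    a1, i1, f1, o1 = np.split(x1, 4, axis=1)
    a2, i2, f2, o2 = np.split(x2, 4, axis=1)
    c = (np.tanh(a1 + a2) * sigmoid(i1 + i2)
         + c_prev1 * sigmoid(f1)
         + c_prev2 * sigmoid(f2))
    h = np.tanh(c) * sigmoid(o1 + o2)
    return c, h

# Toy usage: batch of 2 nodes, cell dimension 3.
rng = np.random.default_rng(0)
c1 = rng.standard_normal((2, 3))
c2 = rng.standard_normal((2, 3))
x1 = rng.standard_normal((2, 12))
x2 = rng.standard_normal((2, 12))
c, h = slstm_node(c1, c2, x1, x2)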
References
2018
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.functions.slstm.html Retrieved:2018-2-25
- QUOTE:
chainer.functions.slstm(c_prev1, c_prev2, x1, x2)
S-LSTM units as an activation function.
This function implements an S-LSTM unit. It is an extension of the LSTM unit applied to tree structures. The function is applied to binary trees, where each node has two child nodes. It takes four arguments: the previous cell states c_prev1 and c_prev2, and the input arrays x1 and x2.
First, both input arrays x1 and x2 are split into eight arrays [math]\displaystyle{ a_1,i_1,f_1,o_1 }[/math] and [math]\displaystyle{ a_2,i_2,f_2,o_2 }[/math]. They have the same shape along the second axis, which means that the second axis of x1 and x2 must be 4 times the length of the second axis of c_prev1 and c_prev2.
The split input arrays correspond to:
- [math]\displaystyle{ a_i }[/math] : sources of cell input
- [math]\displaystyle{ i_i }[/math] : sources of input gate
- [math]\displaystyle{ f_i }[/math] : sources of forget gate
- [math]\displaystyle{ o_i }[/math] : sources of output gate
- It computes the updated cell state c and the outgoing signal h as:
[math]\displaystyle{ c=\tanh(a_1+a_2)\sigma(i_1+i_2)+c_{prev1}\sigma(f_1)+c_{prev2}\sigma(f_2), }[/math]
[math]\displaystyle{ h=\tanh(c)\sigma(o_1+o_2), }[/math]
where [math]\displaystyle{ \sigma }[/math] is the elementwise sigmoid function. The function returns c and h as a tuple (...)
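As a hedged usage sketch of the function quoted above, the call below follows the documented signature chainer.functions.slstm(c_prev1, c_prev2, x1, x2) and the quoted shape constraint that the second axis of x1 and x2 is 4 times that of c_prev1 and c_prev2; the concrete sizes and random inputs are illustrative only.

import numpy as np
import chainer.functions as F

batch, d = 2, 3
# Previous cell states of the two child nodes: shape (batch, d).
c_prev1 = np.zeros((batch, d), dtype=np.float32)
c_prev2 = np.zeros((batch, d), dtype=np.float32)
# Input arrays: second axis is 4 * d (cell-input, input-gate, forget-gate, output-gate sources).
x1 = np.random.randn(batch, 4 * d).astype(np.float32)
x2 = np.random.randn(batch, 4 * d).astype(np.float32)

# Returns the updated cell state c and the outgoing signal h as a tuple.
c, h = F.slstm(c_prev1, c_prev2, x1, x2)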
2015
- (Zhu et al., 2015) ⇒ Zhu, X., Sobihani, P., & Guo, H. (2015, June). Long Short-Term Memory over Recursive Structures. In: Proceedings of the International Conference on Machine Learning (pp. 1604-1612). arXiv:1503.04881.
- ABSTRACT: The chain-structured long short-term memory (LSTM) has showed to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process. We call the model S-LSTM, which provides a principled way of considering long-distance interaction over hierarchies, e.g., language or image parse structures. We leverage the models for semantic composition to understand the meaning of text, a fundamental problem in natural language understanding, and show that it outperforms a state-of-the-art recursive model by replacing its composition layers with the S-LSTM memory blocks. We also show that utilizing the given structures is helpful in achieving a performance better than that without considering the structures.