Stacked Autoencoding Neural Network
A Stacked Autoencoding Neural Network is a multi-layer feedforward neural network consisting of layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer.
- AKA: Stacked Autoencoder.
- Context:
- It can (often) use stacking to build deep architectures that learn hierarchical feature representations.
- It can (often) pre-train each layer as an autoencoder and then fine-tune the entire network with backpropagation (see the sketch after this list).
- It can be employed in unsupervised learning tasks for feature extraction and dimensionality reduction.
- It can leverage denoising autoencoders in each layer to enhance the robustness of learned features.
- It can range from a simple two-layer network to deep architectures with dozens of layers.
- It can improve the performance of supervised learning tasks by using the learned features as input to a classifier.
- ...
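The greedy layer-wise pre-training and stacking described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not a reference implementation: the helper train_autoencoder, the sigmoid/squared-error training loop, and the layer sizes are all hypothetical choices made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one autoencoder layer with plain gradient descent (illustrative only).
    Returns the encoder weights/bias and the hidden representation of X."""
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W1 = rng.normal(0, 0.1, (n_visible, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_visible))   # decoder weights
    b2 = np.zeros(n_visible)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)            # encode
        X_hat = sigmoid(h @ W2 + b2)        # decode (reconstruct the input)
        # Gradients of the squared reconstruction error, backpropagated by hand.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out / len(X);  b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X);  b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, sigmoid(X @ W1 + b1)

# Greedy layer-wise pre-training: each layer is trained as an autoencoder
# on the activations produced by the previously trained layer.
X = np.random.rand(256, 64)                # toy unlabeled data
encoders, activations = [], X
for n_hidden in (32, 16):                  # two stacked layers (hypothetical sizes)
    W, b, activations = train_autoencoder(activations, n_hidden)
    encoders.append((W, b))
# `activations` is now the deepest hidden representation; in practice the
# whole stack would next be fine-tuned end-to-end with backpropagation,
# often with a supervised classifier on top of the deepest features.
```

Training one shallow autoencoder at a time keeps each optimization problem small; the fine-tuning pass then adjusts all layers jointly.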
- Example(s):
- A Stacked Denoising Autoencoding Network.
- A Deep Autoencoder whose layers are pre-trained as Restricted Boltzmann Machines (as in a Deep Belief Network) and then fine-tuned as a stacked autoencoder.
- …
- Counter-Example(s):
- See: Encoder-Decoder Neural Network, Deep Neural Network, Neural Network Training System, Natural Language Processing System, Stacked Neural Network, Recurrent Neural Network, Convolutional Neural Network.
References
2014
- (Stack Exchange, 2014) ⇒ "What is the difference between convolutional neural networks, restricted Boltzmann machines, and auto-encoders?" (Answer)
- QUOTE: Autoencoder is a simple 3-layer neural network where output units are directly connected back to input units. E.g. in a network like this: ...
2011
- (UFLDL, 2011) ⇒ http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders#Overview
- QUOTE: ... A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer is wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder section, let [math]\displaystyle{ W^{(k, 1)}, W^{(k, 2)}, b^{(k, 1)}, b^{(k, 2)} }[/math] denote the parameters [math]\displaystyle{ W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)} }[/math] for kth autoencoder. Then the encoding step for the stacked autoencoder is given by running the encoding step of each layer in forward order: [math]\displaystyle{
\begin{align}
a^{(l)} = f(z^{(l)}) \\
z^{(l + 1)} = W^{(l, 1)}a^{(l)} + b^{(l, 1)}
\end{align}
}[/math] The decoding step is given by running the decoding stack of each autoencoder in reverse order: [math]\displaystyle{
\begin{align}
a^{(n + l)} = f(z^{(n + l)}) \\
z^{(n + l + 1)} = W^{(n - l, 2)}a^{(n + l)} + b^{(n - l, 2)}
\end{align}
}[/math] The information of interest is contained within [math]\displaystyle{ a^{(n)} }[/math], which is the activation of the deepest layer of hidden units. This vector gives us a representation of the input in terms of higher-order features.
The features from the stacked autoencoder can be used for classification problems by feeding [math]\displaystyle{ a^{(n)} }[/math] to a softmax classifier.
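The quoted encoding and decoding recursions can be traced directly in code. The sketch below assumes a sigmoid activation for f and an illustrative parameter layout params[k] = (W^{(k,1)}, b^{(k,1)}, W^{(k,2)}, b^{(k,2)}); both are assumptions made for the example, not part of the UFLDL text.

```python
import numpy as np

def f(z):
    # Activation from the quoted equations; a sigmoid is assumed here.
    return 1.0 / (1.0 + np.exp(-z))

def stacked_encode_decode(x, params):
    """Run the quoted encoding step in forward order and the decoding step
    in reverse order.  params[k] = (W_k1, b_k1, W_k2, b_k2) holds the
    encoder and decoder parameters of the (k+1)-th autoencoder."""
    a = x
    # Encoding: a^(l+1) = f(W^(l,1) a^(l) + b^(l,1)), applied in forward order.
    for W1, b1, _, _ in params:
        a = f(W1 @ a + b1)
    deepest = a            # a^(n): the higher-order feature representation
    # Decoding: each autoencoder's decoder, applied in reverse order.
    for _, _, W2, b2 in reversed(params):
        a = f(W2 @ a + b2)
    return deepest, a      # features and the reconstruction of x

# Toy parameters for a 2-layer stack (dimensions 8 -> 5 -> 3 and back).
rng = np.random.default_rng(0)
dims = [8, 5, 3]
params = [(rng.normal(size=(dims[k + 1], dims[k])), np.zeros(dims[k + 1]),
           rng.normal(size=(dims[k], dims[k + 1])), np.zeros(dims[k]))
          for k in range(len(dims) - 1)]
features, reconstruction = stacked_encode_decode(rng.normal(size=8), params)
print(features.shape, reconstruction.shape)   # (3,) (8,)
```

The returned features play the role of [math]\displaystyle{ a^{(n)} }[/math] above; in a supervised setting they would be fed to a classifier such as softmax, as the quote notes.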
2009
- (Larochelle, 2009) ⇒ Hugo Larochelle (2009). http://www.dmi.usherb.ca/~larocheh/projects_deep_learning.html
- QUOTE: In Extracting and Composing Robust Features with Denoising Autoencoders, Pascal Vincent, Yoshua Bengio, Pierre-Antoine Manzagol and myself designed the denoising autoencoder, which outperforms both the regular autoencoder and the RBM as a pre-training module.
2008
- (Vincent et al., 2008) ⇒ Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. (2008). “Extracting and Composing Robust Features with Denoising Autoencoders.” In: Proceedings of the 25th International Conference on Machine learning (ICML 2008).