Stacked Autoencoding Neural Network
A Stacked Autoencoding Neural Network is a multi-layer feedforward neural network consisting of layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer.
- AKA: Stacked Autoencoder.
- Context:
- It can (often) use stacking to build deep architectures that learn hierarchical feature representations.
- It can (often) pre-train each layer as an autoencoder and then fine-tune the entire network with backpropagation (see the sketch after this list).
- It can be employed in unsupervised learning tasks for feature extraction and dimensionality reduction.
- It can leverage denoising autoencoders in each layer to enhance the robustness of learned features.
- It can range from a simple two-layer network to deep architectures with dozens of layers.
- It can improve the performance of supervised learning tasks by using the learned features as input to a classifier.
- ...
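The greedy layer-wise pre-training and stacking described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not a reference implementation: the helper train_autoencoder, the sigmoid/squared-error training loop, and the layer sizes are all hypothetical choices made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one autoencoder layer with plain gradient descent (illustrative only).
    Returns the encoder weights/bias and the hidden representation of X."""
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W1 = rng.normal(0, 0.1, (n_visible, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_visible))   # decoder weights
    b2 = np.zeros(n_visible)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)            # encode
        X_hat = sigmoid(h @ W2 + b2)        # decode (reconstruct the input)
        # Gradients of the squared reconstruction error, backpropagated by hand.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out / len(X);  b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X);  b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, sigmoid(X @ W1 + b1)

# Greedy layer-wise pre-training: each layer is trained as an autoencoder
# on the activations produced by the previously trained layer.
X = np.random.rand(256, 64)                # toy unlabeled data
encoders, activations = [], X
for n_hidden in (32, 16):                  # two stacked layers (hypothetical sizes)
    W, b, activations = train_autoencoder(activations, n_hidden)
    encoders.append((W, b))
# `activations` is now the deepest hidden representation; in practice the
# whole stack would next be fine-tuned end-to-end with backpropagation,
# often with a supervised classifier on top of the deepest features.
```

Training one shallow autoencoder at a time keeps each optimization problem small; the fine-tuning pass then adjusts all layers jointly.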
- Example(s):
- A Stacked Denoising Autoencoding Network.
- A Deep Autoencoder whose layers are pre-trained as Restricted Boltzmann Machines (as in a Deep Belief Network) and then fine-tuned as a stacked autoencoder.
- …
- Counter-Example(s):
- See: Encoder-Decoder Neural Network, Deep Neural Network, Neural Network Training System, Natural Language Processing System, Stacked Neural Network, Recurrent Neural Network, Convolutional Neural Network.
References
2014
- (Stack Exchange, 2014) ⇒ "What is the difference between convolutional neural networks, restricted Boltzmann machines, and auto-encoders?" (Answer)
- QUOTE: Autoencoder is a simple 3-layer neural network where output units are directly connected back to input units. E.g. in a network like this: ...
2011
- (UFLDL, 2011) ⇒ http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders#Overview
- QUOTE: ... A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer is wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder section, let [math]\displaystyle{ W^{(k, 1)}, W^{(k, 2)}, b^{(k, 1)}, b^{(k, 2)} }[/math] denote the parameters [math]\displaystyle{ W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)} }[/math] for kth autoencoder. Then the encoding step for the stacked autoencoder is given by running the encoding step of each layer in forward order: [math]\displaystyle{
\begin{align}
a^{(l)} = f(z^{(l)}) \\
z^{(l + 1)} = W^{(l, 1)}a^{(l)} + b^{(l, 1)}
\end{align}
}[/math] The decoding step is given by running the decoding stack of each autoencoder in reverse order: [math]\displaystyle{
\begin{align}
a^{(n + l)} = f(z^{(n + l)}) \\
z^{(n + l + 1)} = W^{(n - l, 2)}a^{(n + l)} + b^{(n - l, 2)}
\end{align}
}[/math] The information of interest is contained within [math]\displaystyle{ a^{(n)} }[/math], which is the activation of the deepest layer of hidden units. This vector gives us a representation of the input in terms of higher-order features.
The features from the stacked autoencoder can be used for classification problems by feeding [math]\displaystyle{ a^{(n)} }[/math] to a softmax classifier.
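The quoted encoding and decoding recursions can be traced directly in code. The sketch below assumes a sigmoid activation for f and an illustrative parameter layout params[k] = (W^{(k,1)}, b^{(k,1)}, W^{(k,2)}, b^{(k,2)}); both are assumptions made for the example, not part of the UFLDL text.

```python
import numpy as np

def f(z):
    # Activation from the quoted equations; a sigmoid is assumed here.
    return 1.0 / (1.0 + np.exp(-z))

def stacked_encode_decode(x, params):
    """Run the quoted encoding step in forward order and the decoding step
    in reverse order.  params[k] = (W_k1, b_k1, W_k2, b_k2) holds the
    encoder and decoder parameters of the (k+1)-th autoencoder."""
    a = x
    # Encoding: a^(l+1) = f(W^(l,1) a^(l) + b^(l,1)), applied in forward order.
    for W1, b1, _, _ in params:
        a = f(W1 @ a + b1)
    deepest = a            # a^(n): the higher-order feature representation
    # Decoding: each autoencoder's decoder, applied in reverse order.
    for _, _, W2, b2 in reversed(params):
        a = f(W2 @ a + b2)
    return deepest, a      # features and the reconstruction of x

# Toy parameters for a 2-layer stack (dimensions 8 -> 5 -> 3 and back).
rng = np.random.default_rng(0)
dims = [8, 5, 3]
params = [(rng.normal(size=(dims[k + 1], dims[k])), np.zeros(dims[k + 1]),
           rng.normal(size=(dims[k], dims[k + 1])), np.zeros(dims[k]))
          for k in range(len(dims) - 1)]
features, reconstruction = stacked_encode_decode(rng.normal(size=8), params)
print(features.shape, reconstruction.shape)   # (3,) (8,)
```

The returned features play the role of [math]\displaystyle{ a^{(n)} }[/math] above; in a supervised setting they would be fed to a classifier such as softmax, as the quote notes.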
2009
- (Larochelle, 2009) ⇒ Hugo Larochelle (2009). http://www.dmi.usherb.ca/~larocheh/projects_deep_learning.html
- QUOTE: In Extracting and Composing Robust Features with Denoising Autoencoders, Pascal Vincent, Yoshua Bengio, Pierre-Antoine Manzagol and myself designed the denoising autoencoder, which outperforms both the regular autoencoder and the RBM as a pre-training module.
2008
- (Vincent et al., 2008) ⇒ Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. (2008). “Extracting and Composing Robust Features with Denoising Autoencoders.” In: Proceedings of the 25th International Conference on Machine learning (ICML 2008).