Stacked Denoising Autoencoding Algorithm
A Stacked Denoising Autoencoding Algorithm is a stacked learning algorithm that is an autoencoding algorithm, composed of multiple layers of denoising autoencoders in which the outputs of each layer are wired to the inputs of the successive layer (a minimal sketch of this wiring appears after the list below).
- AKA: Stacked Auto-Encoder.
- Example(s):
- Counter-Example(s):
- See: Restricted Boltzmann Machine, Multilayer Neural Network.
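The wiring described in the definition above can be sketched in a few lines of Python. This is an illustrative sketch only (the logistic activation, the layer sizes, and the name stacked_encode are assumptions, not taken from any cited reference); it simply shows each layer's encoder output being fed as input to the successive layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stacked_encode(x, layers):
    """Run the encoding step of each autoencoder in forward order.
    `layers` is a list of (W, b) encoder parameters, one pair per layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)   # this layer's output feeds the next layer
    return a                     # deepest code: the stacked features for x

# Toy usage with random (untrained) parameters: 10 -> 6 -> 3 units.
rng = np.random.default_rng(0)
sizes = [10, 6, 3]
layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(stacked_encode(rng.standard_normal(10), layers).shape)  # (3,)
```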
References
2011
- http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
- QUOTE: The greedy layerwise approach for pretraining a deep network works by training each layer in turn. In this page, you will find out how autoencoders can be "stacked" in a greedy layerwise fashion for pretraining (initializing) the weights of a deep network.
A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder section, let [math]\displaystyle{ W^{(k, 1)}, W^{(k, 2)}, b^{(k, 1)}, b^{(k, 2)} }[/math] denote the parameters [math]\displaystyle{ W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)} }[/math] for the kth autoencoder. Then the encoding step for the stacked autoencoder is given by running the encoding step of each layer in forward order:
[math]\displaystyle{ a^{(l)} = f(z^{(l)}) }[/math]
[math]\displaystyle{ z^{(l+1)} = W^{(l, 1)} a^{(l)} + b^{(l, 1)} }[/math]
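As a rough, runnable sketch of the greedy layerwise procedure quoted above: each autoencoder is trained in turn, and the codes it produces become the training data for the next one. Everything here is an assumption chosen for illustration (a sigmoid encoder with a linear decoder, squared reconstruction error, plain per-example SGD, and the names train_autoencoder / greedy_pretrain); it is not the exact UFLDL recipe, and the single-layer trainer is an ordinary rather than denoising autoencoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.05, epochs=20):
    """Fit one autoencoder (sigmoid encoder, linear decoder, squared error)
    with per-example gradient descent; return the encoder parameters."""
    n_visible = X.shape[1]
    W1 = rng.standard_normal((n_hidden, n_visible)) * 0.1   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((n_visible, n_hidden)) * 0.1   # decoder weights
    b2 = np.zeros(n_visible)
    for _ in range(epochs):
        for x in X:
            h = sigmoid(W1 @ x + b1)            # encode
            r = W2 @ h + b2                     # decode (reconstruction)
            d_r = 2.0 * (r - x)                 # grad of squared error w.r.t. r
            d_h = (W2.T @ d_r) * h * (1.0 - h)  # backprop through the sigmoid
            W2 -= lr * np.outer(d_r, h)
            b2 -= lr * d_r
            W1 -= lr * np.outer(d_h, x)
            b1 -= lr * d_h
    return W1, b1

def greedy_pretrain(X, hidden_sizes):
    """Greedy layerwise pretraining: train each layer in turn, then feed its
    codes to the next layer as that layer's training data."""
    layers, codes = [], X
    for n_hidden in hidden_sizes:
        W, b = train_autoencoder(codes, n_hidden)
        layers.append((W, b))
        codes = sigmoid(codes @ W.T + b)        # a^(k+1) = f(W^(k,1) a^(k) + b^(k,1))
    return layers

layers = greedy_pretrain(rng.random((100, 20)), hidden_sizes=[12, 6])
print([W.shape for W, _ in layers])             # [(12, 20), (6, 12)]
```

As the quoted page describes, the resulting encoder parameters can then be used to initialize (pretrain) the corresponding layers of a deep network before supervised fine-tuning.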
- (Glorot et al., 2011a) ⇒ Xavier Glorot, Antoine Bordes, and Yoshua Bengio. (2011). “Domain Adaptation for Large-scale Sentiment Classification: A Deep Learning Approach." In: Proceedings of the 28th International Conference on Machine Learning (ICML-11).
- QUOTE: The basic framework for our models is the Stacked Denoising Auto-encoder (Vincent et al., 2008). An auto-encoder is comprised of an encoder function [math]\displaystyle{ h(\cdot) }[/math] and a decoder function [math]\displaystyle{ g(\cdot) }[/math], typically with the dimension of [math]\displaystyle{ h(\cdot) }[/math] smaller than that of its argument. The reconstruction of input x is given by r(x) = g(h(x)), and auto-encoders are typically trained to minimize a form of reconstruction error loss(x, r(x)). Examples of reconstruction error include the squared error, or like here, when the elements of x or r(x) can be considered as probabilities of a discrete event, the Kullback-Leibler divergence between elements of x and elements of r(x). When the encoder and decoder are linear and the reconstruction error is quadratic, one recovers in h(x) the space of the principal components (PCA) of x. Once an auto-encoder has been trained, one can stack another auto-encoder on top of it, by training a second one which sees the encoded output of the first one as its training data. Stacked auto-encoders were one of the first methods for building deep architectures (Bengio et al., 2006), along with Restricted Boltzmann Machines (RBMs) (Hinton et al., 2006). Once a stack of auto-encoders or RBMs has been trained, their parameters describe multiple levels of representation for x and can be used to initialize a supervised deep neural network (Bengio, 2009) or directly feed a classifier, as we do in this paper.
An interesting alternative to the ordinary autoencoder is the Denoising Auto-encoder (Vincent et al., 2008) or DAE, in which the input vector x is stochastically corrupted into a vector [math]\displaystyle{ \tilde{x} }[/math], and the model is trained to denoise, i.e., to minimize a denoising reconstruction error [math]\displaystyle{ \text{loss}(x, r(\tilde{x})) }[/math]. Hence the DAE cannot simply copy its input [math]\displaystyle{ \tilde{x} }[/math] in its code layer [math]\displaystyle{ h(\tilde{x}) }[/math], even if the dimension of [math]\displaystyle{ h(\tilde{x}) }[/math] is greater than that of [math]\displaystyle{ \tilde{x} }[/math]. The denoising error can be linked in several ways to the likelihood of a generative model of the distribution of the uncorrupted examples x (Vincent, 2011).
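Below is a minimal sketch of the denoising objective described above, under assumptions that are not from the paper (masking noise that zeroes a random fraction of the inputs, squared reconstruction error, and a decoder tied to the encoder weights through W.T); Vincent et al. also consider a cross-entropy/KL reconstruction loss when the inputs are probabilities. The essential point is that the loss compares the reconstruction of the corrupted [math]\displaystyle{ \tilde{x} }[/math] against the clean x.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, p=0.3):
    """Masking noise: stochastically zero a fraction p of the inputs,
    producing the corrupted vector x_tilde."""
    return x * (rng.random(x.shape) >= p)

def dae_loss(x, W, b_h, b_r):
    """Denoising reconstruction error loss(x, r(x_tilde)): encode the
    corrupted input, decode it, and compare against the CLEAN x."""
    x_tilde = corrupt(x)               # stochastic corruption of x
    h = sigmoid(W @ x_tilde + b_h)     # code layer h(x_tilde)
    r = sigmoid(W.T @ h + b_r)         # reconstruction r(x_tilde) = g(h(x_tilde))
    return np.sum((x - r) ** 2)        # squared error against the uncorrupted x

# Toy usage: one 20-d example and 8 hidden units with random parameters.
x = rng.random(20)
W = rng.standard_normal((8, 20)) * 0.1
print(dae_loss(x, W, b_h=np.zeros(8), b_r=np.zeros(20)))
```

In practice W, b_h, and b_r would be fitted by (stochastic) gradient descent on this loss, and, as in the stacking procedure above, a second DAE would then be trained on the codes h(x) produced by the first.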
2008
- (Vincent et al., 2008) ⇒ Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. (2008). “Extracting and Composing Robust Features with Denoising Autoencoders.” In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008).