Unsupervised Pre-Training Algorithm

References

(Dahl et al., 2012) ⇒ George E. Dahl, Dong Yu, Li Deng, and Alex Acero. (2012). “Context-Dependent Pre-trained Deep Neural Networks for Large-Vocabulary Speech Recognition.” In: IEEE Transactions on Audio, Speech, and Language Processing, 20(1). doi:10.1109/TASL.2011.2134090
- QUOTE: Recently, a major advance has been made in training densely connected, directed belief nets with many hidden layers. The resulting deep belief nets learn a hierarchy of nonlinear feature detectors that can capture complex statistical patterns in data. The deep belief net training algorithm suggested in [24] first initializes the weights of each layer individually in a purely unsupervised^[1] way and then fine-tunes the entire network using labeled data. This semi-supervised approach using deep models has proved effective in a number of applications, including coding and classification for speech, audio, text, and image data ([25]–[29]).

↑ In the context of ASR, we use the term “unsupervised” to mean acoustic data with no transcriptions of any kind.
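
Example

The layer-by-layer initialization described in the quote can be illustrated with a minimal sketch. It is not the procedure of (Dahl et al., 2012) or of the work cited there as [24]; it assumes binary inputs, restricted Boltzmann machines trained with one step of contrastive divergence (CD-1), and per-example updates. All function names, layer sizes, and hyperparameters (`train_rbm`, `pretrain_stack`, `lr`, `epochs`) are hypothetical choices made for illustration. No labels are used anywhere, which is what makes the stage unsupervised.

```python
import math
import random


def sigmoid(x):
    # Guard against overflow in math.exp for very negative inputs.
    if x < -60.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b_hid):
    # P(h_j = 1 | v) for every hidden unit j.
    return [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_hid))]


def visible_probs(h, W, b_vis):
    # P(v_i = 1 | h) for every visible unit i.
    return [sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_vis))]


def sample(probs, rng):
    # Draw a binary state for each unit from its activation probability.
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def train_rbm(data, n_hidden, epochs=50, lr=0.1, seed=0):
    # Train one RBM with CD-1 (assumed update rule, illustrative settings).
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    b_vis = [0.0] * n_visible
    b_hid = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            h0 = hidden_probs(v0, W, b_hid)                # positive phase
            v1 = visible_probs(sample(h0, rng), W, b_vis)  # reconstruction
            h1 = hidden_probs(v1, W, b_hid)                # negative phase
            for i in range(n_visible):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                b_vis[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                b_hid[j] += lr * (h0[j] - h1[j])
    return W, b_vis, b_hid


def pretrain_stack(data, layer_sizes, **kwargs):
    # Greedy layer-wise pre-training: each RBM is trained on the hidden
    # activations of the layer below it, using unlabeled data only.
    layers = []
    inputs = data
    for n_hidden in layer_sizes:
        W, _b_vis, b_hid = train_rbm(inputs, n_hidden, **kwargs)
        layers.append((W, b_hid))
        inputs = [hidden_probs(v, W, b_hid) for v in inputs]
    return layers


if __name__ == "__main__":
    toy = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 4
    stack = pretrain_stack(toy, [3, 2], epochs=20)
    print([(len(W), len(W[0])) for W, _ in stack])
```

Each `(W, b_hid)` pair returned by `pretrain_stack` would serve as the initial weights and biases of one hidden layer in a feed-forward network. The supervised fine-tuning stage that the quote mentions (training the whole network on labeled data) is not shown here.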