Latent Sequence Decompositions (LSD) System
A Latent Sequence Decompositions (LSD) System is an End-To-End Automatic Speech Recognition System that learns to decompose its output sequences into sub-word units, treating the decomposition as a latent variable that is a function of both the input sequence and the output sequence (see the illustrative sketch below).
- Context:
- It was first proposed by Chan et al. (2017).
- It is an attention-based ASR System.
- …
- Example(s):
- the LSD-ARD system proposed in Chan et al. (2017),
- …
- Counter-Example(s):
- See: Sequence-to-Sequence Model, Approximate Decoding Algorithm, Wall Street Journal Speech Recognition Task, Latent Dirichlet Allocation, Singular Value Decomposition, Tensor Rank Decomposition.
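The following is a minimal, illustrative sketch (not the authors' implementation) of what a sub-word decomposition is: given a hypothetical token vocabulary, a single target word such as "cat" can be segmented into several different sequences of variable-length units, and the LSD framework treats the choice among such segmentations as latent.

```python
# Illustrative sketch only: enumerate the variable-length sub-word
# decompositions that LSD marginalizes over. The vocabulary below is
# hypothetical, not the one used by Chan et al. (2017).

def decompositions(word, vocab):
    """Return every segmentation of `word` into units drawn from `vocab`."""
    if word == "":
        return [[]]
    results = []
    for unit in vocab:
        if word.startswith(unit):
            for rest in decompositions(word[len(unit):], vocab):
                results.append([unit] + rest)
    return results

vocab = {"c", "a", "t", "ca", "at", "cat"}
for z in decompositions("cat", vocab):
    print(z)
# e.g. ['c', 'a', 't'], ['c', 'at'], ['ca', 't'], ['cat']
```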
References
2022
- (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Speech_recognition#End-to-end_automatic_speech_recognition Retrieved:2022-4-3.
- QUOTE: (...) An alternative approach to CTC-based models are attention-based models. Attention-based ASR models were introduced simultaneously by Chan et al. of Carnegie Mellon University and Google Brain and Bahdanau et al. of the University of Montreal in 2016. The model named "Listen, Attend and Spell" (LAS), literally "listens" to the acoustic signal, pays "attention" to different parts of the signal and "spells" out the transcript one character at a time. Unlike CTC-based models, attention-based models do not have conditional-independence assumptions and can learn all the components of a speech recognizer including the pronunciation, acoustic and language model directly. This means, during deployment, there is no need to carry around a language model making it very practical for applications with limited memory. By the end of 2016, the attention-based models have seen considerable success including outperforming the CTC models (with or without an external language model). Various extensions have been proposed since the original LAS model. Latent Sequence Decompositions (LSD) was proposed by Carnegie Mellon University, MIT and Google Brain to directly emit sub-word units which are more natural than English characters; University of Oxford and Google DeepMind extended LAS to "Watch, Listen, Attend and Spell" (WLAS) to handle lip reading surpassing human-level performance.
2017
- (Chan et al., 2017) ⇒ William Chan, Yu Zhang, Quoc V. Le, and Navdeep Jaitly. (2017). “Latent Sequence Decompositions.” In: Conference Track Proceedings of the 5th International Conference on Learning Representations (ICLR 2017).
- QUOTE: We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences with variable lengthed output units as a function of both the input sequence and the output sequence.
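As a sketch of the objective described in this quote: writing $\mathcal{Z}_y$ for the set of valid decompositions of a target sequence $y$ into sub-word tokens, the LSD framework marginalizes over the latent decomposition rather than committing to a single fixed one (the notation here is paraphrased, not copied from the paper):

```latex
% Sketch of the LSD marginal likelihood; notation paraphrased.
% z = (z_1, ..., z_m) is one decomposition of the target y into sub-word units,
% and Z_y is the set of all valid decompositions of y.
\[
  p(y \mid x) \;=\; \sum_{z \in \mathcal{Z}_y} p(z \mid x),
  \qquad
  p(z \mid x) \;=\; \prod_{i=1}^{|z|} p\bigl(z_i \mid z_{<i},\, x\bigr).
\]
```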