WaveNet Neural Network
A WaveNet Neural Network is a deep generative neural network that models raw audio waveforms autoregressively, one sample at a time.
- …
- Counter-Example(s):
- See: Dilated ConvNet, Google Cloud Text-to-Speech Service.
References
2016b
- https://deepmind.com/blog/wavenet-generative-model-raw-audio/
- QUOTE: This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%. ...
... Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
However, our PixelRNN and PixelCNN models, published earlier this year, showed that it was possible to generate complex natural images not only one pixel at a time, but one colour-channel at a time, requiring thousands of predictions per image. This inspired us to adapt our two-dimensional PixelNets to a one-dimensional WaveNet. ...
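The autoregressive idea in the quote above (each predictive distribution is conditioned on all previous observations) can be sketched as a sampling loop. This is only an illustrative sketch, not DeepMind's implementation: `toy_next_sample_distribution` is a hypothetical stand-in for a trained network, and the 8 amplitude levels and the seed are assumptions chosen to keep the example small.

```python
import math
import random


def toy_next_sample_distribution(history, num_levels=8):
    """Hypothetical stand-in for a trained model (an assumption, not WaveNet).

    Returns a probability for each quantized amplitude level, computed from
    the whole history of previous samples (here: simply their mean).
    """
    if history:
        center = sum(history) / len(history)
    else:
        center = num_levels / 2
    weights = [math.exp(-abs(level - center)) for level in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, num_levels=8, seed=0):
    """Draw samples one at a time, feeding each one back as context."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_distribution(samples, num_levels)
        next_sample = rng.choices(range(num_levels), weights=probs, k=1)[0]
        samples.append(next_sample)
    return samples


def log_likelihood(samples, num_levels=8):
    """Sum of log p(x_t | x_1, ..., x_{t-1}) over the sequence."""
    total = 0.0
    for t, x in enumerate(samples):
        probs = toy_next_sample_distribution(samples[:t], num_levels)
        total += math.log(probs[x])
    return total
```

At the 16,000 samples per second mentioned in the quote, one second of audio would need 16,000 sequential passes through a loop like `generate`, which is part of why the task is described as challenging.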
2016
- (Oord et al., 2016) ⇒ Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alexander Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. (2016). “WaveNet: A Generative Model for Raw Audio.” arXiv:1609.03499.
- QUOTE: ... Figure 3: Visualization of a stack of dilated causal convolutional layers.
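A stack of dilated causal convolutions like the one the figure caption refers to can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the filter has two taps with weights fixed at 1.0 so the result is easy to check by hand, and the nonlinearities, gating, and residual connections of the actual network are left out. The dilations 1, 2, 4, 8 follow the paper's figure.

```python
def causal_dilated_conv(signal, weights, dilation):
    """Compute output[t] = sum_k weights[k] * signal[t - k * dilation].

    Positions before the start of the signal count as zero, so output[t]
    never depends on any input later than t (causality).
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        output.append(acc)
    return output


def run_stack(signal, dilations, weights=(1.0, 1.0)):
    """Apply one causal convolution per dilation, feeding each into the next."""
    for dilation in dilations:
        signal = causal_dilated_conv(signal, weights, dilation)
    return signal


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [1, 2, 4, 8]
print(receptive_field(2, dilations))  # 16

# An impulse at t=0 spreads over exactly the next 16 outputs and no further.
impulse = [1.0] + [0.0] * 19
print(run_stack(impulse, dilations))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of layers grows only linearly.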