Spec-ResNet
A Spec-ResNet is a Deep Residual Neural Network that takes the log-magnitude short-time Fourier transform (STFT) of an audio signal as input features.
- Context:
- It was initially developed by Alzantot et al. (2019).
- Example(s):
- Counter-Example(s):
- See: Residual Neural Network, Convolutional Neural Network, Machine Learning, Deep Learning, Machine Vision.
References
2019
- (Alzantot et al., 2019) ⇒ Moustafa Alzantot, Ziqi Wang, and Mani B. Srivastava. (2019). “Deep Residual Neural Networks for Audio Spoofing Detection.” In: Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019).
- QUOTE: Figure 1 shows the architecture of the Spec-ResNet model which takes the log-magnitude STFT as input features. First, the input is treated as a single channel image and passed through a 2D convolution layer with 32 filters, where filter size = 3 × 3, stride length = 1 and padding = 1. The output volume of the first convolution layer has 32 channels and is passed through a sequence of 6 residual blocks. The output from the last residual block is fed into a dropout layer (with dropout rate = 50%; Srivastava et al., 2014) followed by a hidden fully connected (FC) layer with leaky-ReLU (He et al., 2015) activation function ($\alpha = 0.01$). Outputs from the hidden FC layer are fed into another FC layer with two units that produce classification logits. The logits are finally converted into a probability distribution using a final softmax layer.
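The numeric details in the quoted description can be checked with a small sketch. The Python below is illustrative only and is not the authors' implementation: it uses the standard convolution output-size formula to show that a 3 × 3 filter with stride 1 and padding 1 preserves the spectrogram's height and width, and it spells out the leaky-ReLU ($\alpha = 0.01$) and softmax steps. The function names, the example spectrogram size (257 × 400), and the example logits are hypothetical; the residual blocks and the hidden FC layer width are not specified in the quote and are therefore left out.

```python
import math


def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Output length along one axis of a 2D convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, slope alpha otherwise."""
    return x if x > 0 else alpha * x


def softmax(logits):
    """Convert logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical log-magnitude STFT size: 257 frequency bins x 400 frames,
# treated as a single-channel image.
freq_bins, frames = 257, 400

# First layer: 32 filters, 3 x 3, stride 1, padding 1 -> 32 channels, same H x W.
first_conv_shape = (32, conv_output_size(freq_bins), conv_output_size(frames))

# Hypothetical pre-activations of the hidden FC layer, passed through leaky ReLU.
hidden = [leaky_relu(v) for v in (-2.0, 0.5)]

# Hypothetical logits from the final two-unit FC layer, converted by softmax.
probs = softmax([1.2, -0.3])
```

With these settings the first convolution changes only the channel count (1 to 32), so the residual blocks receive a volume with the same time-frequency resolution as the input spectrogram.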