Spec-ResNet

A Spec-ResNet is a Deep Residual Neural Network that takes the log-magnitude short-time Fourier transform (STFT) of an audio signal as its input features.
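
To make the input features concrete, the following is a minimal sketch of a log-magnitude STFT written with the Python standard library only. The function name `log_magnitude_stft`, the Hann window, and the values of `frame_len`, `hop`, and `eps` are illustrative assumptions, not the settings used by Alzantot et al. (2019); a practical pipeline would use an FFT routine rather than the naive DFT shown here.

```python
import cmath
import math


def log_magnitude_stft(signal, frame_len=16, hop=8, eps=1e-10):
    """Return one row per frame, each holding log-magnitudes per frequency bin.

    frame_len, hop and eps are illustrative values, not the paper's settings.
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at frequency bin k.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            # eps keeps the logarithm finite for silent bins.
            row.append(math.log(abs(coeff) + eps))
        frames.append(row)
    return frames


# A pure tone centred on bin 2 of a 16-sample frame.
tone = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
spec = log_magnitude_stft(tone)
print(len(spec), len(spec[0]))  # 7 frames, 9 frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 2
```

The resulting two-dimensional array (frames by frequency bins) is what a model of this kind would treat as a single-channel image.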

Context:
- It was initially developed by Alzantot et al. (2019).
Example(s):
- Alzantot et al. (2019) Spec-ResNet architecture.
Counter-Example(s):
See: Residual Neural Network, Convolutional Neural Network, Machine Learning, Deep Learning, Machine Vision.

References

2019

(Alzantot et al., 2019) ⇒ Moustafa Alzantot, Ziqi Wang, and Mani B. Srivastava. (2019). “Deep Residual Neural Networks for Audio Spoofing Detection.” In: Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019).
- QUOTE: Figure 1 shows the architecture of the Spec-ResNet model which takes the log-magnitude STFT as input features. First, the input is treated as a single channel image and passed through a 2D convolution layer with 32 filters, where filter size = 3 × 3, stride length = 1 and padding = 1. The output volume of the first convolution layer has 32 channels and is passed through a sequence of 6 residual blocks. The output from the last residual block is fed into a dropout layer (with dropout rate = 50%; Srivastava et al., 2014) followed by a hidden fully connected (FC) layer with leaky-ReLU (He et al., 2015) activation function ($\alpha = 0.01$). Outputs from the hidden FC layer are fed into another FC layer with two units that produce classification logits. The logits are finally converted into a probability distribution using a final softmax layer.

**Figure 1:** Model architecture for the Spec-ResNet model. Detailed structure of residual blocks is shown in Figure 2.

**Figure 2:** Detailed architecture of the convolution block with residual connection.
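
The layer arithmetic in the quoted description can be checked with a short sketch. It shows that a 3 × 3 convolution with stride 1 and padding 1 preserves the spatial size of the input, and it illustrates the leaky-ReLU ($\alpha = 0.01$) and softmax steps of the classification head. The helper names, the spectrogram size, and the numeric inputs are hypothetical values chosen for illustration; the six residual blocks and the dropout layer are omitted because their internals are not given in the text.

```python
import math


def conv2d_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a 2D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU with the negative slope quoted above (alpha = 0.01)."""
    return x if x > 0 else alpha * x


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical spectrogram size (frequency bins x time frames); not from the paper.
freq_bins, time_frames = 1025, 400

# First layer: 1 input channel -> 32 channels, spatial size unchanged.
out_shape = (32, conv2d_output_size(freq_bins), conv2d_output_size(time_frames))
print(out_shape)  # (32, 1025, 400)

# Classification head on made-up hidden pre-activations and logits.
hidden = [leaky_relu(v) for v in [-2.0, 0.5, 3.0]]
print(hidden)  # [-0.02, 0.5, 3.0]
probs = softmax([1.0, -1.0])  # two logits -> two class probabilities
print(probs)
```

Because the first convolution keeps the height and width fixed, only the channel count changes (from 1 to 32) before the residual blocks, and the two softmax outputs always sum to 1.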
