Skip-Connection
A Skip-Connection is a neural network connection that ...
- See: Residual Block.
References
2020
- https://theaisummer.com/skip-connections/
- QUOTE: ... At present, skip connection is a standard module in many convolutional architectures. By using a skip connection, we provide an alternative path for the gradient (with backpropagation). It is experimentally validated that these additional paths are often beneficial for the model convergence. Skip connections in deep architectures, as the name suggests, skip some layer in the neural network and feeds the output of one layer as the input to the next layers (instead of only the next one). As previously explained, using the chain rule, we must keep multiplying terms with the error gradient as we go backwards. However, in the long chain of multiplication, if we multiply many things together that are less than one, then the resulting gradient will be very small. Thus, the gradient becomes very small as we approach the earlier layers in a deep architecture. In some cases, the gradient becomes zero, meaning that we do not update the early layers at all.
In general, there are two fundamental ways that one could use skip connections through different non-sequential layers:
- a) addition as in residual architectures,
- b) concatenation as in densely connected architectures.
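The two combination styles listed above can be illustrated with a minimal sketch in plain Python. The `layer` function, the `Vector` alias, and the helper names `additive_skip` and `concat_skip` are hypothetical stand-ins chosen for illustration; they are not taken from the quoted article or from any library.

```python
from typing import Callable, List

Vector = List[float]


def layer(x: Vector) -> Vector:
    """Hypothetical stand-in for a learned layer: scale by 0.5, then ReLU."""
    return [max(0.0, 0.5 * v) for v in x]


def additive_skip(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Residual-style skip: output = f(x) + x, so shapes must match."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("additive skip needs f(x) and x to have the same length")
    return [a + b for a, b in zip(fx, x)]


def concat_skip(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Dense-style skip: output = [x, f(x)], joined along the feature axis."""
    return x + f(x)  # list concatenation, not element-wise addition


x = [1.0, -2.0, 3.0]
added = additive_skip(x, layer)        # same length as x
concatenated = concat_skip(x, layer)   # length of x plus length of f(x)
print(len(added), len(concatenated))
```

Addition keeps the feature dimension fixed, whereas concatenation grows it with each skip, which is why densely connected designs typically need later layers that accept the wider input.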
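The chain-rule argument in the quote can be made concrete with a toy calculation. This is a simplified sketch, not the article's own code: it assumes scalar layers (real networks multiply Jacobian matrices), a hypothetical depth of 20, and an assumed local derivative of 0.1 per layer. The function name `chain_gradient` is likewise illustrative.

```python
def chain_gradient(local_derivs, use_skip):
    """Multiply per-layer derivatives along the backward pass.

    Without a skip, layer i contributes its local derivative d_i.
    With an additive skip (y = x + f(x)), it contributes 1 + d_i,
    because the identity path adds a derivative of 1.
    """
    grad = 1.0
    for d in local_derivs:
        grad *= (1.0 + d) if use_skip else d
    return grad


# Assumed toy setting: 20 scalar layers, each with local derivative 0.1.
local_derivs = [0.1] * 20
plain_gradient = chain_gradient(local_derivs, use_skip=False)
residual_gradient = chain_gradient(local_derivs, use_skip=True)
print(f"plain: {plain_gradient:.3e}, with skips: {residual_gradient:.3f}")
```

Under these assumptions the plain product shrinks toward zero, while the product with additive skips stays above one, because every factor includes the identity term.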
2017
- (Tong et al., 2017) ⇒ Tong Tong, Gen Li, Xiejie Liu, and Qinquan Gao. (2017). “Image Super-resolution Using Dense Skip Connections.” In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4799-4807.
- QUOTE: … A skip connection was used in [12] to link the input data and the final reconstruction layer in SR. State-of-the-art SR results were achieved in [12]. However, only a single skip connection was adopted in [12], which may not fully explore the advantages of skip connections …