GoogLeNet
A GoogLeNet is a Deep Convolutional Neural Network built by stacking Inception Modules; it was developed by Szegedy et al. (2014) for the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
- AKA: Inception-v1.
- Context:
- It is a 22-layer deep convolutional neural network (or 27 layers if pooling layers are also counted).
- It usually consists of the following Neural Network Layers:
- …
- Example(s):
- Counter-Example(s):
- an AlexNet,
- a DenseNet,
- an Inception-v3,
- a LeNet-5,
- a MatConvNet,
- a ResNet,
- a SqueezeNet,
- a VGG CNN,
- a ZF Net.
- See: Convolutional Neural Network, Machine Learning, Deep Learning, Machine Vision, Network In Network (NIN), Inception Convolutional Neural Network.
References
2018a
- (CS231N, 2018) ⇒ https://cs231n.github.io/convolutional-networks/#case Retrieved 2018-09-30
- QUOTE: There are several architectures in the field of Convolutional Networks that have a name. The most common are:
- (...)
- GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large amount of parameters that do not seem to matter much. There are also several followup versions to the GoogLeNet, most recently Inception-v4.
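The parallel-branch structure described above can be illustrated with a toy NumPy sketch (not the authors' code). It implements the four standard Inception branches (1×1; 1×1 then 3×3; 1×1 then 5×5; 3×3 max-pool then 1×1) with random weights and concatenates their outputs along the channel axis; the branch filter counts below follow the paper's inception (3a) block, and the naive convolution loop is for clarity, not speed.

```python
import numpy as np

def conv2d(x, w):
    """Stride-1, 'same'-padded convolution. x: (C_in, H, W); w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for i in range(H):
        for j in range(W):
            # Contract each (C_in, k, k) patch against every output filter.
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def maxpool3x3(x):
    """3x3 max pooling, stride 1, 'same' padding."""
    c, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + 3, j:j + 3].max(axis=(1, 2))
    return out

def inception(x, f1, f3r, f3, f5r, f5, fp):
    """Four parallel branches, concatenated along the channel axis.

    f3r/f5r are the 1x1 'reduce' filter counts that shrink channel depth
    before the expensive 3x3/5x5 convolutions -- the key parameter saving.
    Weights here are random: this only demonstrates the wiring.
    """
    rng = np.random.default_rng(0)
    W = lambda co, ci, k: rng.standard_normal((co, ci, k, k)) * 0.01
    relu = lambda t: np.maximum(t, 0)
    ci = x.shape[0]
    b1 = relu(conv2d(x, W(f1, ci, 1)))                                   # 1x1
    b2 = relu(conv2d(relu(conv2d(x, W(f3r, ci, 1))), W(f3, f3r, 3)))     # 1x1 -> 3x3
    b3 = relu(conv2d(relu(conv2d(x, W(f5r, ci, 1))), W(f5, f5r, 5)))     # 1x1 -> 5x5
    b4 = relu(conv2d(maxpool3x3(x), W(fp, ci, 1)))                       # pool -> 1x1
    return np.concatenate([b1, b2, b3, b4], axis=0)

# inception (3a): 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
x = np.random.default_rng(1).standard_normal((192, 8, 8))
y = inception(x, 64, 96, 128, 16, 32, 32)
```

Because every branch uses stride 1 and 'same' padding, all branch outputs share the input's spatial size, which is what makes the channel-wise concatenation legal.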
2018b
- (Chainer, 2018) ⇒ http://docs.chainer.org/en/stable/reference/generated/chainer.links.GoogLeNet.html Retrieved 2018-09-30
- QUOTE: GoogLeNet, which is also called Inception-v1, is an architecture of convolutional neural network proposed in 2014. This model is relatively lightweight and requires small memory footprint during training compared with modern architectures such as ResNet. Therefore, if you fine-tune your network based on a model pre-trained by Imagenet and need to train it with large batch size, GoogLeNet may be useful. On the other hand, if you just want an off-the-shelf classifier, we recommend you to use ResNet50 or other models since they are more accurate than GoogLeNet.
The original model is provided here: https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet
2017
- (Li, Johnson & Yeung, 2017) ⇒ Fei-Fei Li, Justin Johnson, and Serena Yeung (2017). Lecture 9: CNN Architectures
- QUOTE: Case Study: GoogLeNet
2015
- (Szegedy et al., 2015) ⇒ Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich (2015). "Going deeper with convolutions". In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9). arXiv preprint arXiv:1409.4842
- QUOTE: By the “GoogLeNet” name we refer to the particular incarnation of the Inception architecture used in our submission for the ILSVRC 2014 competition. We also used one deeper and wider Inception network with slightly superior quality, but adding it to the ensemble seemed to improve the results only marginally. We omit the details of that network, as empirical evidence suggests that the influence of the exact architectural parameters is relatively minor (...)
The network is 22 layers deep when counting only layers with parameters (or 27 layers if we also count pooling). The overall number of layers (independent building blocks) used for the construction of the network is about 100. The exact number depends on how layers are counted by the machine learning infrastructure. The use of average pooling before the classifier is based on Lin, Chen & Yan, (2013), although our implementation has an additional linear layer. The linear layer enables us to easily adapt our networks to other label sets, however it is used mostly for convenience and we do not expect it to have a major effect. We found that a move from fully connected layers to average pooling improved the top-1 accuracy by about 0.6%, however the use of dropout remained essential even after removing the fully connected layers.
Figure 3: GoogLeNet network with all the bells and whistles.
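The quote's point about replacing fully connected layers with average pooling can be made concrete with a small sketch (an illustration, not the paper's implementation), assuming GoogLeNet's final 7×7×1024 feature map and 1000 ImageNet classes. Global average pooling collapses each channel to a scalar, so the only learned parameters in the head are those of the small linear layer that follows.

```python
import numpy as np

def gap_classifier(features, W, b):
    """Global average pooling followed by a linear layer.

    features: (C, H, W) final convolutional feature map
    W: (C, n_classes) weights of the linear layer; b: (n_classes,) bias
    """
    pooled = features.mean(axis=(1, 2))  # (C,) -- one scalar per channel
    return pooled @ W + b                # class logits

# Parameter count of this head vs a fully connected layer on the
# flattened feature map (sizes assumed from the GoogLeNet paper):
C, H, Wd, n_classes = 1024, 7, 7, 1000
fc_params = C * H * Wd * n_classes   # FC on the flattened 7x7x1024 map
gap_params = C * n_classes           # linear layer after average pooling
```

The 49x reduction (one factor per spatial position that the pooling averages away) is the "large amount of parameters that do not seem to matter much" eliminated by this design.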
2013
- (Lin, Chen & Yan, 2013) ⇒ Min Lin, Qiang Chen, and Shuicheng Yan (2013). "Network in network". arXiv preprint arXiv:1312.4400.
- QUOTE: The resulting structure which we call an mlpconv layer is compared with CNN in Figure 1. Both the linear convolutional layer and the mlpconv layer map the local receptive field to an output feature vector (...)
Figure 1: Comparison of linear convolution layer and mlpconv layer. The linear convolution layer includes a linear filter while the mlpconv layer includes a micro network (we choose the multilayer perceptron in this paper). Both layers map the local receptive field to a confidence value of the latent concept.
Figure 2: The overall structure of Network In Network. In this paper the NINs include the stacking of three mlpconv layers and one global average pooling layer.
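The mlpconv idea described above can be sketched in a few lines of NumPy (an illustration, not the authors' code): applying the same small multilayer perceptron at every spatial position is exactly a stack of 1×1 convolutions, here written as per-pixel matrix multiplications with ReLU nonlinearities.

```python
import numpy as np

def mlpconv(x, weights):
    """Apply the same micro-MLP at every spatial position of x.

    x: (C, H, W) input feature map
    weights: list of (C_out, C_in) matrices, one per MLP layer;
             each is equivalent to a 1x1 convolution.
    """
    out = x
    for W in weights:
        # 'oc,chw->ohw': mix channels at each (h, w) position independently.
        out = np.einsum('oc,chw->ohw', W, out)
        out = np.maximum(out, 0)  # ReLU after every micro-layer
    return out
```

This is the connection the GoogLeNet quote draws on: the 1×1 "reduce" convolutions inside an Inception module are mlpconv-style micro networks used for cheap cross-channel mixing and dimensionality reduction.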