AlexNet
An AlexNet is a Deep Convolutional Neural Network that was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
- Context:
- It is variously described as an 8-layer network (counting only the weighted layers) or a 15-layer network (also counting the pooling, normalization, and dropout layers).
- It usually consists of the following Neural Network Layers (see the sketch after this list):
- an input layer;
- 5 convolutional layers with a ReLU activation function;
- 2 normalization layers; (optional)
- 3 pooling layers;
- 2 dropout layers; (optional)
- 3 fully-connected layers;
- a fully-connected output layer with a softmax activation function.
- …
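The layer composition listed above can be made concrete with a minimal sketch, assuming PyTorch. This is an illustrative single-GPU variant: the module name AlexNetSketch and the exact channel sizes are assumptions, the optional normalization layers are omitted, and it is not the original two-GPU implementation.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """AlexNet-style network: 5 convolutional layers, 3 max-pooling layers,
    3 fully-connected layers, ReLU activations, dropout in the classifier,
    and a 1000-way output whose softmax is applied by the loss function."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # logits; softmax is applied in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # (N, 256, 6, 6) for a 224x224 RGB input
        x = torch.flatten(x, 1)
        return self.classifier(x)

# Example usage: a batch of four 224x224 RGB images -> 1000 class scores.
logits = AlexNetSketch()(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 1000])
```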
- Example(s):
- Counter-Example(s):
- a DenseNet,
- a GoogLeNet,
- an InceptionV3,
- a LeNet-5,
- a MatConvNet,
- a ResNet,
- a SqueezeNet,
- a VGG CNN,
- a ZF Net.
- See: Convolutional Neural Network, Machine Learning, Deep Learning, Machine Vision.
References
2018a
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/AlexNet Retrieved:2018-9-30.
- QUOTE: AlexNet is the name of a convolutional neural network, invented by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton (Gershgorn, 2017). AlexNet has had a large impact on the field of machine learning, specifically in the application of deep learning to machine vision. As of 2018 it has been cited over 25,000 times.
AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge in 2012 (Krizhevsky, Sutskever & Hinton, 2012). The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner up. The original paper's primary result was that the depth of the model was essential for its high performance, which was computationally expensive, but made feasible due to the utilization of GPUs during training (Krizhevsky, Sutskever & Hinton, 2012).
2018b
- (Bhattacharyya et al., 2018) ⇒ Amartya Bhattacharyya, Christine H. Lind, and Rahul Shirpurkar (2018). "Threat Detection in TSA Scans using AlexNet".
- QUOTE: AlexNet is a convolutional neural network that consists of 15 layers total. These include five convolutional layers, each with a rectified linear (ReLu) activation function, two normalization layers, three pooling layers, two dropout layers and three fully connected layers. The first two fully connected layers use a ReLu activation function, while the last fully connected layer uses a softmax activation function. The detailed architecture of AlexNet is displayed in Figure 5.
Fig. 5: Detailed architecture of AlexNet as implemented in Python (top left), AlexNet implemented in MATLAB (top right) and the modified MATLAB layers (bottom). The dropout layers (not shown in the top left) are inserted after the first and second fully connected layers. Filter size is shown as N×N, the number of filters as Nfm and the stride as N×Nsub for a number N.
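One way to make the 15-layer count and the dropout placement described above concrete is the enumeration below. This is a sketch; the layer labels such as fc6 are illustrative and not taken from the source.

```python
# Illustrative enumeration of the 15 layers counted above:
# 5 conv + 2 norm + 3 pool + 2 dropout + 3 fully connected = 15.
# ReLU follows each conv layer and the first two fully-connected layers;
# a softmax follows fc8 (activations are not counted as separate layers here).
layers = [
    "conv1", "norm1", "pool1",
    "conv2", "norm2", "pool2",
    "conv3", "conv4", "conv5", "pool3",            # feature extractor
    "fc6", "dropout6", "fc7", "dropout7", "fc8",   # classifier head
]
assert len(layers) == 15
```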
2018c
- (CS231N, 2018) ⇒ https://cs231n.github.io/convolutional-networks/#case Retrieved 2018-09-30
- QUOTE: There are several architectures in the field of Convolutional Networks that have a name. The most common are:
- LeNet. The first successful applications of Convolutional Networks were developed by Yann LeCun in 1990’s. Of these, the best known is the LeNet architecture that was used to read zip codes, digits, etc.
- AlexNet. The first work that popularized Convolutional Networks in Computer Vision was the AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. The AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the second runner-up (top 5 error of 16% compared to runner-up with 26% error). The Network had a very similar architecture to LeNet, but was deeper, bigger, and featured Convolutional Layers stacked on top of each other (previously it was common to only have a single CONV layer always immediately followed by a POOL layer).
2017a
- (Gershgorn, 2017) ⇒ Dave Gershgorn (2017). "The data that transformed AI research—and possibly the world"
- QUOTE: Li then approached a well-known image recognition competition in Europe called PASCAL VOC, which agreed to collaborate and co-brand their competition with ImageNet. The PASCAL challenge was a well-respected competition and dataset, but representative of the previous method of thinking. The competition only had 20 classes, compared to ImageNet’s 1,000.
As the competition continued in 2011 and into 2012, it soon became a benchmark for how well image classification algorithms fared against the most complex visual dataset assembled at the time …
2017b
- (Li, Johnson & Yeung, 2017) ⇒ Fei-Fei Li, Justin Johnson, and Serena Yeung (2017). Lecture 9: CNN Architectures
2014
- (Krizhevsky, 2014) ⇒ Alex Krizhevsky (2014). "One weird trick for parallelizing convolutional neural networks". arXiv preprint arXiv:1404.5997.
2012
- (Krizhevsky et al., 2012) ⇒ Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. (2012). "ImageNet Classification with Deep Convolutional Neural Networks". In Advances in Neural Information Processing Systems (pp. 1097-1105).
- QUOTE: As depicted in Figure 2, the net contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. Our network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution.
The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (see Figure 2). The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully-connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.
Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.
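The equivalence stated in the quote above, between maximizing the multinomial logistic regression objective and maximizing the average log-probability of the correct label under the softmax distribution, can be checked numerically with a small sketch (assuming PyTorch; the random logits simply stand in for network outputs):

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 4 images over 1000 ImageNet classes.
logits = torch.randn(4, 1000)
labels = torch.randint(0, 1000, (4,))

# Average log-probability of the correct label under the softmax distribution.
log_probs = F.log_softmax(logits, dim=1)
avg_log_prob = log_probs[torch.arange(4), labels].mean()

# Maximizing it is the same as minimizing the standard cross-entropy loss.
loss = F.cross_entropy(logits, labels)
assert torch.allclose(loss, -avg_log_prob)
```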