Convolutional Neural Network (CNN) Training System

A Convolutional Neural Network (CNN) Training System is a feed-forward NNet training system that implements a CNN algorithm (to solve a CNN training task based on a CNN architecture).

Context:
- …
Example(s):
- a VGG CNN,
- an AlexNet,
- a GoogLeNet,
- a LeNet-5,
- a MatConvNet,
- a ResNet,
- a SqueezeNet,
- a DenseNet,
- an InceptionV3.
Counter-Example(s):
- a HMAX,
- a LeNet,
- a NeoCognitron,
- a Non-Convolutional Feed-Forward Network,
- a Convolutional Restricted Boltzmann Machine (CRBM) Training System,
- a Recurrent Neural Network (RNN) Training System.
- …
See: Convolution Function, Neural Network Layer, Neural Network Unit, Neural Network Convolutional Layer, Neural Network Pooling Layer, Neural Network Activation Function, Neural Network Weight, Artificial Neural Network, Supervised Machine Learning System, Machine Learning Classification System, Unsupervised Machine Learning System, Reinforcement Learning System, Deep Learning System.

References

2018

(Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Convolutional_neural_network Retrieved:2018-2-25.
- In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery.
  CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing.^[1] They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics (Zhang, Wei (1988). "Shift-invariant pattern recognition neural network and its optical architecture". Proceedings of annual conference of the Japan Society of Applied Physics; Zhang, Wei (1990). "Parallel distributed processing model with local space-invariant interconnections and its optical architecture". Applied Optics. 29 (32): 4790–7. Bibcode:1990ApOpt..29.4790Z. doi:10.1364/AO.29.004790. PMID 20577468).
  Convolutional networks were inspired by biological processes^[2] in which the connectivity pattern between neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
  CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage.
  They have applications in image and video recognition, recommender systems and natural language processing.
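
The shared-weights architecture mentioned in the quote can be illustrated with a minimal sketch in plain Python. The input image, the kernel values, and the helper names `conv2d_valid` and `shift_right` are assumptions chosen for illustration and do not come from the cited article. The sketch shows that one small filter is reused at every position, and that shifting the input shifts the filter response by the same amount.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def shift_right(image, steps):
    """Shift every row right by `steps` pixels, filling with zeros on the left."""
    return [[0] * steps + row[:len(row) - steps] for row in image]


# Hypothetical 5x7 input containing a small bright blob.
image = [[0] * 7 for _ in range(5)]
image[1][1] = image[2][1] = image[2][2] = 1

# One 3x3 filter: nine shared weights, however large the image is.
kernel = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]

response = conv2d_valid(image, kernel)
shifted_response = conv2d_valid(shift_right(image, 1), kernel)

# Each output unit only sees a 3x3 receptive field, and the response to the
# shifted image is the original response moved one column to the right.
moved_with_input = all(
    shifted_response[i][j + 1] == response[i][j]
    for i in range(len(response))
    for j in range(len(response[0]) - 1)
)
print(moved_with_input)
```

Strictly speaking, this is the convolution response moving with the input. Invariance to small shifts is usually obtained by following the convolution with a pooling step.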

2017a

(Gibson & Patterson, 2017) ⇒ Adam Gibson, Josh Patterson (2017). "Chapter 4. Major Architectures of Deep Networks". In: "Deep Learning" ISBN: 9781491924570.
- QUOTE: CNNs transform the input data from the input layer through all connected layers into a set of class scores given by the output layer. There are many variations of the CNN architecture, but they are based on the pattern of layers, as demonstrated in Figure 4-9.
  Figure 4-9 depicts three major groups:

The input layer accepts three-dimensional input generally in the form spatially of the size (width × height) of the image and has a depth representing the color channels (generally three for RGB color channels).

The feature-extraction layers have a general repeating pattern of the sequence:

- Convolution layer
- ReLU layer (We express the Rectified Linear Unit (ReLU) activation function as a layer in the diagram here to match up to other literature.)
- Pooling layer

These layers find a number of features in the images and progressively construct higher-order features. This corresponds directly to the ongoing theme in deep learning by which features are automatically learned as opposed to traditionally hand engineered.

Finally we have the classification layers in which we have one or more fully connected layers to take the higher-order features and produce class probabilities or scores. These layers are fully connected to all of the neurons in the previous layer, as their name implies. The output of these layers produces typically a two-dimensional output of the dimensions [b × N], where b is the number of examples in the mini-batch and N is the number of classes we’re interested in scoring.

Figure 4-9. High-level general CNN architecture
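
The layer pattern described above can be followed as shape bookkeeping. The sketch below is not taken from the book: the function names, the 32 × 32 RGB input, the filter counts, and the kernel and pooling sizes are all hypothetical. It assumes unpadded convolutions with stride 1 and non-overlapping pooling, and it collapses the fully connected layers into a single step that maps the flattened features to N class scores.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(batch, height, width, channels, blocks, num_classes):
    """Return (layer name, shape) pairs for a conv -> ReLU -> pool stack.

    `blocks` is a list of (num_filters, kernel_size, pool_size) tuples.
    """
    trace = [("input", (batch, height, width, channels))]
    for num_filters, kernel_size, pool_size in blocks:
        # Convolution shrinks the spatial size and sets the depth.
        height = conv_output_size(height, kernel_size)
        width = conv_output_size(width, kernel_size)
        channels = num_filters
        trace.append(("convolution", (batch, height, width, channels)))
        # ReLU is element-wise, so the shape is unchanged.
        trace.append(("relu", (batch, height, width, channels)))
        # Pooling with stride equal to the window size.
        height = conv_output_size(height, pool_size, stride=pool_size)
        width = conv_output_size(width, pool_size, stride=pool_size)
        trace.append(("pooling", (batch, height, width, channels)))
    trace.append(("flatten", (batch, height * width * channels)))
    trace.append(("class scores", (batch, num_classes)))
    return trace


trace = trace_shapes(batch=32, height=32, width=32, channels=3,
                     blocks=[(16, 5, 2), (32, 5, 2)], num_classes=10)
for name, shape in trace:
    print(f"{name:>12}: {shape}")
```

The final entry has the [b × N] form described in the quote, here with b = 32 examples and N = 10 classes.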

2013a

(VistaLab, 2013) ⇒ (2013) An Introduction to Convolutional Neural Networks. In: VISTA LabTeaching WIKI
- QUOTE: The solution to FFNNs' problems with image processing took inspiration from neurobiology. Yann LeCun and Yoshua Bengio tried to capture the organization of neurons in the visual cortex of the cat, which at that time was known to consist of maps of local receptive fields that decreased in granularity as the cortex moved anteriorly. There are several different theories about how to precisely define such a model, but all of the various implementations can be loosely described as involving the following process:
  - Convolve several small filters on the input image
  - Subsample this space of filter activations
  - Repeat steps 1 and 2 until you're left with sufficiently high-level features.
  - Use a standard FFNN to solve a particular task, using the resulting features as input.
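
The four steps above can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the LeNet implementation: the weights are random and untrained, subsampling is done by block averaging, a ReLU is applied after each convolution, and all function names and sizes are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def random_kernel(size):
    return [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]


def convolve(image, kernel):
    """Step 1: slide a small filter over a 2-D map (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]


def subsample(fmap, size=2):
    """Step 2: average each non-overlapping size x size block."""
    return [
        [sum(fmap[i + di][j + dj]
             for di in range(size) for dj in range(size)) / size ** 2
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def feature_stage(maps, kernels):
    """Apply steps 1 and 2 once, producing one output map per kernel."""
    outputs = []
    for kernel in kernels:
        responses = [convolve(m, kernel) for m in maps]
        # Sum over input maps, then ReLU (the nonlinearity is an assumption).
        summed = [[max(0.0, sum(r[i][j] for r in responses))
                   for j in range(len(responses[0][0]))]
                  for i in range(len(responses[0]))]
        outputs.append(subsample(summed))
    return outputs


def classify(features, weights):
    """Step 4: one fully connected layer followed by a softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    exps = [math.exp(s - max(scores)) for s in scores]
    return [e / sum(exps) for e in exps]


image = [[rng.random() for _ in range(12)] for _ in range(12)]
maps = [image]
# Step 3: repeat the convolve/subsample stage (two stages here).
for kernel_size in (3, 2):
    maps = feature_stage(maps, [random_kernel(kernel_size) for _ in range(2)])

features = [value for m in maps for row in m for value in row]
weights = [[rng.uniform(-1, 1) for _ in features] for _ in range(3)]
probabilities = classify(features, weights)
print(len(features), probabilities)
```

With a 12 × 12 input, the two stages reduce each map to 5 × 5 and then 2 × 2, so the classifier receives 8 features from the 2 final maps. A training system would additionally adjust the kernels and the classifier weights, which this sketch leaves out.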

The LeCun Formulation

There are several different ways one might formalize the high level process described above, but the most common is LeCun's implementation, the LeNet (...)

The complete implementation of the LeNet.

2013b

(DeepLearning Tutorial, 2013) ⇒ Theano Development Team (2008–2013). Convolutional Neural Networks (LeNet) In: DeepLearning 0.1 documentation
- QUOTE: Convolutional Neural Networks (CNN) are variants of MLPs which are inspired from biology. From Hubel and Wiesel’s early work on the cat’s visual cortex [Hubel68], we know there exists a complex arrangement of cells within the visual cortex. These cells are sensitive to small sub-regions of the input space, called a receptive field, and are tiled in such a way as to cover the entire visual field. These filters are local in input space and are thus better suited to exploit the strong spatially local correlation present in natural images.
  Additionally, two basic cell types have been identified: simple cells (S) and complex cells (C). Simple cells (S) respond maximally to specific edge-like stimulus patterns within their receptive field. Complex cells (C) have larger receptive fields and are locally invariant to the exact position of the stimulus.
  The visual cortex being the most powerful “vision” system in existence, it seems natural to emulate its behavior. Many such neurally inspired models can be found in the literature. To name a few: the NeoCognitron [Fukushima], HMAX [Serre07] and LeNet-5 [LeCun98], which will be the focus of this tutorial.
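
The simple-cell and complex-cell description above is often mapped onto convolution followed by pooling. The sketch below illustrates that analogy in one dimension. The mapping, the edge kernel, the stimuli, and the function names are assumptions made for illustration, not part of the tutorial.

```python
def simple_cells(signal, kernel):
    """Edge-like detectors: correlate one small kernel at every position."""
    k = len(kernel)
    return [sum(signal[i + d] * kernel[d] for d in range(k))
            for i in range(len(signal) - k + 1)]


def complex_cells(responses, width):
    """Max over non-overlapping windows of simple-cell responses."""
    return [max(responses[i:i + width])
            for i in range(0, len(responses) - width + 1, width)]


edge_kernel = [-1, 1]  # responds to a dark-to-bright step

stimulus = [0, 0, 1, 1, 1, 1, 1, 1, 1]
shifted_stimulus = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # same edge, one position later

simple_a = simple_cells(stimulus, edge_kernel)
simple_b = simple_cells(shifted_stimulus, edge_kernel)
complex_a = complex_cells(simple_a, width=4)
complex_b = complex_cells(simple_b, width=4)

print(simple_a, simple_b)    # the peak response moves with the edge
print(complex_a, complex_b)  # the pooled response is the same for both
```

The simple-cell outputs differ between the two stimuli, while the pooled outputs agree because the edge stays inside the same pooling window. A larger shift that crosses a window boundary would change the pooled output, so the invariance is only local.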

↑ LeCun, Yann. "LeNet-5, convolutional neural networks". Retrieved 16 November 2013.
↑ Matusugu, Masakazu; Katsuhiko Mori; Yusuke Mitari; Yuji Kaneda (2003). "Subject independent facial expression recognition with robust face detection using a convolutional neural network" (PDF). Neural Networks. 16 (5): 555–559. doi:10.1016/S0893-6080(03)00115-1. Retrieved 17 November 2013.
