Stacked Convolutional Neural Network
A Stacked Convolutional Neural Network (Stacked CNN) is a Stacked Neural Network composed of two or more convolutional neural networks, in which each network operates on the feature maps produced by the network below it (see the sketch after the See list below).
- Example(s): a Context-Aware Stacked Convolutional Neural Network (CAS-CNN), such as in Bejnordi et al. (2017).
- Counter-Example(s): a single (unstacked) Convolutional Neural Network.
- See: Encoder-Decoder Neural Network, Deep Neural Network, Neural Network Training System, Natural Language Processing System, Convolutional Neural Network, Stacked Ensemble-based Learning Task, Attention Mechanism.
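
The following is a minimal sketch of the stacking idea (written in PyTorch; the layer sizes, channel counts, and two-class head are illustrative assumptions, not taken from any cited work): a second CNN consumes the feature maps of a first CNN instead of raw pixels.

```python
import torch
import torch.nn as nn

base_cnn = nn.Sequential(                 # first CNN: raw image -> feature maps
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)

stacked_cnn = nn.Sequential(              # second CNN stacked on the first
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 2),                    # e.g. a binary classifier head
)

x = torch.randn(1, 3, 224, 224)           # dummy input image
logits = stacked_cnn(base_cnn(x))         # the composition is the "stack"
print(logits.shape)                       # torch.Size([1, 2])
```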
References
2017
- (Bejnordi et al., 2017) ⇒ Babak Ehteshami Bejnordi, Guido Zuidhof, Maschenka Balkenhol, Meyke Hermsen, Peter Bult, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, and Jeroen van der Laak (2017). "Context-Aware Stacked Convolutional Neural Networks for Classification of Breast Carcinomas in Whole-slide Histopathology Images". In: Journal of Medical Imaging, 4(4), 044504.
- QUOTE: In order to increase the context available for dense prediction, we stack a second CNN on top of the last convolutional layer of the previously trained WRN-4-2 network. The architecture of the stacked network, as shown in figure 2.2, is a hybrid between the wide ResNet architecture and the VGG architecture[1]. CAS-CNN is fully convolutional and enables fast dense prediction due to re-using of overlapping convolutions during inference. All the parameters of the WRN-4-2 network were fixed during training. Despite being trained with fixed input patches of size 224 × 224, because of being a fully convolutional network, WRN-4-2 can take a larger patch size during training of the stacked network, and consequently produce feature maps with larger spatial dimensions. Moreover, because of fixing the parameters of WRN-4-2, the intermediate feature maps of this network do not need to be stored during backpropagation of the gradient. This allowed us to train stacked networks with much larger effective patch sizes. Consequently, we trained 3 networks with patch sizes of 512 × 512, 768 × 768, and 1024 × 1024. Producing the dense prediction for a given WSI involved sliding the stacked network over the WSI with a stride of 224.
- ↑ K. Simonyan and A. Zisserman (2014) "Very Deep Convolutional Networks for Large-Scale Image Recognition". Preprint arXiv:1409.1556
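
The training recipe quoted above (freeze the base network, stack a fully convolutional second CNN on its last feature maps, and train only the stacked part on larger patches) can be sketched as follows. This is a hedged illustration in PyTorch, not the authors' code: the stand-in `base` is far simpler than their WRN-4-2, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

base = nn.Sequential(                     # stand-in for the pretrained WRN-4-2
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)
for p in base.parameters():               # parameters fixed during training
    p.requires_grad_(False)

stacked = nn.Sequential(                  # fully convolutional stacked CNN
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 2, 1),                 # 1x1 conv -> per-location class scores
)

optimizer = torch.optim.SGD(stacked.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

patch = torch.randn(1, 3, 512, 512)       # larger patch than the base's 224 × 224
target = torch.randint(0, 2, (1, 256, 256))  # dense labels on the output grid

with torch.no_grad():                     # frozen base: no stored activations
    feats = base(patch)
scores = stacked(feats)                   # (1, 2, 256, 256) dense prediction
loss = loss_fn(scores, target)
loss.backward()                           # gradients flow only into `stacked`
optimizer.step()
```

Because the frozen base runs under `torch.no_grad()`, its intermediate feature maps are not stored for backpropagation, which is what allows the effective patch size to grow to 512 × 512 and beyond; dense prediction over a whole-slide image then amounts to sliding the composite network across the slide with a stride of 224, as described in the quote.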