DenseNet
A DenseNet is a Deep Convolutional Neural Network that is composed of Dense Blocks, in which each layer receives the feature-maps of all preceding layers as input.
- AKA: Dense Convolutional Network.
- Context:
- It was initially developed by Huang et al. (2017).
- …
- Example(s):
- a DenseNet model implementation in the PyTorch Framework (see the usage sketch below), e.g.:
- Densenet-121 model: torchvision.models.densenet121(pretrained=False, **kwargs)
- Densenet-161 model: torchvision.models.densenet161(pretrained=False, **kwargs)
- Densenet-169 model: torchvision.models.densenet169(pretrained=False, **kwargs)
- Densenet-201 model: torchvision.models.densenet201(pretrained=False, **kwargs)
- a CondenseNet.
- …
- Counter-Example(s):
- an AlexNet,
- a GoogLeNet,
- an InceptionV3,
- a LeNet-5,
- a MatConvNet,
- a ResNet,
- a SqueezeNet,
- a VGG CNN,
- a ZF Net.
- See: Convolutional Neural Network, Deep Learning, Deep Neural Network, Machine Learning, Machine Vision, Shared Memory Allocation.
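The following is a minimal usage sketch of the torchvision constructors listed under Example(s); it assumes torchvision is installed and uses the older pretrained=False signature quoted above (more recent torchvision releases use a weights argument instead).
import torch
import torchvision

# Build a randomly initialized Densenet-121 (same call as in the Example(s) list).
model = torchvision.models.densenet121(pretrained=False)
model.eval()

# Classify one 224x224 RGB image; the output holds one score per ImageNet-1k class.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000])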
References
2017a
- (Huang et al., 2017) ⇒ Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger (2017). "Densely Connected Convolutional Networks". In: Proceedings of CVPR 2017, pp. 2261-2269.
- QUOTE: To further improve the information flow between layers we propose a different connectivity pattern: we introduce direct connections from any layer to all subsequent layers. Figure 1 illustrates the layout of the resulting DenseNet schematically. Consequently, the [math]\displaystyle{ \ell }[/math]-th layer receives the feature-maps of all preceding layers, [math]\displaystyle{ x_0, \cdots, x_{\ell-1} }[/math], as input: [math]\displaystyle{ x_\ell = H_\ell([x_0, x_1, \cdots , x_{\ell-1}]), \quad (2) }[/math]
where [math]\displaystyle{ [x_0, x_1, \cdots , x_{\ell-1}] }[/math] refers to the concatenation of the feature-maps produced in layers [math]\displaystyle{ 0, \cdots, \ell-1 }[/math]. Because of its dense connectivity we refer to this network architecture as Dense Convolutional Network (DenseNet).
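A minimal PyTorch sketch of the connectivity in Eq. (2) is given below; it is an illustrative reimplementation, not the authors' reference code, and the names DenseBlockSketch, growth_rate, and num_layers are hypothetical. Each layer [math]\displaystyle{ H_\ell }[/math] (here BN-ReLU-3x3 Conv) consumes the channel-wise concatenation of all earlier feature-maps.
import torch
import torch.nn as nn

class DenseBlockSketch(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # H_l: BatchNorm -> ReLU -> 3x3 convolution producing growth_rate new feature-maps.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]  # [x_0]
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}])
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # all feature-maps are passed on to the next block
With in_channels=64, growth_rate=32, and num_layers=6, the block maps an (N, 64, H, W) tensor to an (N, 64 + 6*32, H, W) = (N, 256, H, W) tensor.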
2017b
- (Pleiss et al., 2017) ⇒ Geoff Pleiss, Danlu Chen, Gao Huang, Tongcheng Li, Laurens van der Maaten, and Kilian Q. Weinberger (2017). "Memory-efficient implementation of densenets". arXiv preprint arXiv:1707.06990.
- QUOTE: In this report, we introduce a strategy to substantially reduce the training-time memory cost of DenseNet implementations, with a minor reduction in speed. Our primary observation is that the intermediate feature maps responsible for most of the memory consumption are relatively cheap to compute. This allows us to introduce Shared Memory Allocations, which are used by all layers to store intermediate results. Subsequent layers overwrite the intermediate results of previous layers, but their values can be re-populated during the backward pass at minimal cost. Doing so reduces feature map memory consumption from quadratic to linear, while only adding 15 − 20% additional training time. This memory savings makes it possible to train extremely large DenseNets on a reasonable GPU budget. In particular, we are able to extend the largest DenseNet from 161 layers (k = 48 features per layer, 20M parameters) to 264 layers (k = 48, 73M parameters). On ImageNet, this model achieves a single-crop top-1 error of 20.26%, which (to the best of our knowledge) is state-of-the-art.
Figure 3: DenseNet layer forward pass: original implementation (left) and efficient implementation (right). Solid boxes correspond to tensors allocated in memory, whereas translucent boxes are pointers. Solid arrows represent computation, and dotted arrows represent memory pointers. The efficient implementation stores the output of the concatenation, batch normalization, and ReLU layers in temporary storage buffers, whereas the original implementation allocates new memory.
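The shared-memory strategy described above can be approximated in stock PyTorch with torch.utils.checkpoint, which likewise discards cheap intermediate results (the concatenation, batch normalization, and ReLU outputs) in the forward pass and re-populates them during the backward pass. The sketch below illustrates that recompute-instead-of-store idea; it is not the authors' shared-memory-allocation code, assumes a recent PyTorch, and for brevity ignores the double update of BatchNorm running statistics during recomputation.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# One DenseNet-style composite function: BN -> ReLU -> 3x3 conv on the concatenated inputs.
bn_relu_conv = nn.Sequential(
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=3, padding=1, bias=False),
)

def layer_forward(*features):
    # The concatenated tensor and the BN/ReLU outputs are not kept in memory.
    return bn_relu_conv(torch.cat(features, dim=1))

x0 = torch.randn(2, 32, 56, 56, requires_grad=True)
x1 = torch.randn(2, 32, 56, 56, requires_grad=True)
out = checkpoint(layer_forward, x0, x1, use_reentrant=False)
out.sum().backward()  # intermediates are recomputed here, trading extra compute for memory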