DAG Recurrent Neural Network (DAG-RNN)
A DAG Recurrent Neural Network (DAG-RNN) is a Recurrent Neural Network that is unfolded over a Directed Acyclic Graph (DAG) rather than over a linear sequence, so that each node's hidden state depends on its own input and on the hidden states of its parent nodes. It generalizes Bidirectional RNNs, which assume input sequences whose starts and ends are known in advance, to multiple dimensions and to arbitrary acyclic dependency structures (see the illustrative sketch below the links).
- Example(s):
  - Baldi & Pollastri (2003)'s DAG-RNNs for protein structure prediction.
  - Shuai et al. (2016, 2018)'s DAG-RNNs for scene labeling and scene segmentation.
- Counter-Example(s):
  - a chain-structured (Bidirectional) RNN, which unfolds over a linear input sequence only.
- See: Recurrent Neural Network, Recursive Neural Network, Support Vector Machine, Directed Acyclic Graph.
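The following minimal sketch (illustrative Python/NumPy code, not taken from any of the referenced papers) shows the core mechanism: hidden states are updated in a topological ordering of the DAG, so every parent is processed before its children, and each node combines its own input with the aggregated states of its parents. The parameter names (U, W, b) and the toy diamond-shaped graph are assumptions for illustration only.

```python
import numpy as np

def topological_order(num_nodes, edges):
    """Return a topological ordering of a DAG given as (parent, child) edge pairs."""
    indeg = [0] * num_nodes
    children = {v: [] for v in range(num_nodes)}
    for p, c in edges:
        children[p].append(c)
        indeg[c] += 1
    frontier = [v for v in range(num_nodes) if indeg[v] == 0]
    order = []
    while frontier:
        v = frontier.pop()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                frontier.append(c)
    assert len(order) == num_nodes, "graph must be acyclic"
    return order

def dag_rnn_forward(x, edges, U, W, b):
    """Toy DAG-RNN forward pass.
    x: (num_nodes, d_in) node inputs; edges: list of (parent, child) pairs;
    U: (d_hid, d_in), W: (d_hid, d_hid), b: (d_hid,) are shared across all nodes."""
    num_nodes = x.shape[0]
    d_hid = b.shape[0]
    parents = {v: [] for v in range(num_nodes)}
    for p, c in edges:
        parents[c].append(p)
    h = np.zeros((num_nodes, d_hid))
    for v in topological_order(num_nodes, edges):
        # aggregate parent hidden states (zero vector for source nodes)
        context = sum((h[p] for p in parents[v]), np.zeros(d_hid))
        h[v] = np.tanh(U @ x[v] + W @ context + b)
    return h

# usage: a small diamond-shaped DAG 0 -> {1, 2} -> 3
rng = np.random.default_rng(0)
d_in, d_hid = 4, 8
x = rng.standard_normal((4, d_in))
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
U = rng.standard_normal((d_hid, d_in))
W = rng.standard_normal((d_hid, d_hid))
b = np.zeros(d_hid)
h = dag_rnn_forward(x, edges, U, W, b)
print(h.shape)  # (4, 8)
```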
References
2018
- (Shuai et al., 2018) ⇒ Bing Shuai, Zhen Zuo, Bing Wang, and Gang Wang (2018). "Scene Segmentation with DAG-Recurrent Neural Networks". In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1480-1493.
- ABSTRACT: In this paper, we address the challenging task of scene segmentation. In order to capture the rich contextual dependencies over image regions, we propose Directed Acyclic Graph-Recurrent Neural Networks (DAG-RNN) to perform context aggregation over locally connected feature maps. More specifically, DAG-RNN is placed on top of pre-trained CNN (feature extractor) to embed context into local features so that their representative capability can be enhanced. In comparison with plain CNN (as in Fully Convolutional Networks-FCN), DAG-RNN is empirically found to be significantly more effective at aggregating context. Therefore, DAG-RNN demonstrates noticeably performance superiority over FCNs on scene segmentation. Besides, DAG-RNN entails dramatically less parameters as well as demands fewer computation operations, which makes DAG-RNN more favorable to be potentially applied on resource-constrained embedded devices. Meanwhile, the class occurrence frequencies are extremely imbalanced in scene segmentation, so we propose a novel class-weighted loss to train the segmentation network. The loss distributes reasonably higher attention weights to infrequent classes during network training, which is essential to boost their parsing performance. We evaluate our segmentation network on three challenging public scene segmentation benchmarks: Sift Flow, Pascal Context and COCO Stuff. On top of them, we achieve very impressive segmentation performance.
2017
- (Schmidhuber, 2017) ⇒ Jürgen Schmidhuber (2017). "Deep Learning". In: Sammut, C., Webb, G.I. (eds) "Encyclopedia of Machine Learning and Data Mining". Springer, Boston, MA.
- QUOTE: Recursive NNs (Goller and Küchler 1996) generalize RNNs, by operating on hierarchical structures, recursively combining child representations into parent representations. Bidirectional RNNs (BRNNs) (Schuster and Paliwal 1997) are designed for input sequences whose starts and ends are known in advance, such as spoken sentences to be labeled by their phonemes. DAG-RNNs (Baldi and Pollastri 2003) generalize BRNNs to multiple dimensions. Recursive NNs, BRNNs, and DAG-RNNs unfold their full potential when combined with LSTM (Graves et al. 2009).
2016
- (Shuai et al., 2016) ⇒ Bing Shuai, Zhen Zuo, Gang Wang, and Bing Wang (2016). "DAG-Recurrent Neural Networks for Scene Labeling". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 3620-3629.
- QUOTE: The skeleton architecture of the full labeling network is illustrated in Figure 3. The network is end-to-end trainable, and it takes input as raw RGB images with any size. It outputs the label prediction maps with the same size of inputs.
The convolution layer is used to produce compact yet highly discriminative features for local regions. Next, the proposed DAG-RNN is used to model the semantic contextual dependencies of local representations. Finally, the deconvolution layer [1] is introduced to upsample the feature maps by learning a set of deconvolution filters, and it enables the full labeling network to produce the desired size of label prediction maps.
Figure 3: The architecture of the full labeling network, which consists of three functional layers: (1), convolution layer: it produces discriminative feature maps; (2), DAG-RNN: it models the contextual dependency among elements in the feature maps; (3), deconvolution layer: it upsamples the feature maps to output the desired sizes of label prediction maps.
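To illustrate how a DAG-RNN can aggregate context over a locally connected feature map, the sketch below (hypothetical code, not the authors' implementation) performs one directional sweep from the top-left toward the bottom-right of an H×W grid, using only the north and west neighbours as parents for brevity; in the paper, the feature map is decomposed into several such directional DAGs whose outputs are combined before the deconvolution layer upsamples them. All parameter names and sizes here are assumptions.

```python
import numpy as np

def dag_rnn_sweep_se(feat, U, W, b):
    """One directional DAG-RNN sweep (top-left toward bottom-right) over a feature map.
    feat: (H, W, d_in) local CNN features; U: (d_hid, d_in), W: (d_hid, d_hid), b: (d_hid,).
    Each cell's hidden state depends on its own feature and on the states of its
    north and west neighbours (a 4-connected simplification for brevity)."""
    H, Wd, _ = feat.shape
    d_hid = b.shape[0]
    h = np.zeros((H, Wd, d_hid))
    for i in range(H):            # row-major order guarantees parents are already computed
        for j in range(Wd):
            context = np.zeros(d_hid)
            if i > 0:
                context += h[i - 1, j]
            if j > 0:
                context += h[i, j - 1]
            h[i, j] = np.tanh(U @ feat[i, j] + W @ context + b)
    return h

# usage: pretend 'feat' came from a pre-trained CNN; combine opposite sweep
# directions (here by summation) before upsampling to the label-map resolution.
rng = np.random.default_rng(1)
feat = rng.standard_normal((16, 16, 32))
U = rng.standard_normal((64, 32))
W = rng.standard_normal((64, 64))
b = np.zeros(64)
h_se = dag_rnn_sweep_se(feat, U, W, b)
h_nw = dag_rnn_sweep_se(feat[::-1, ::-1], U, W, b)[::-1, ::-1]  # reversed sweep reuses the same routine
context_map = h_se + h_nw  # (16, 16, 64), fed to a deconvolution/upsampling layer in the full network
```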
2003
- (Baldi & Pollastri, 2003) ⇒ Pierre Baldi, and Gianluca Pollastri (2003). "The Principled Design of Large-Scale Recursive Neural Network Architectures -- DAG-RNNs and the Protein Structure Prediction Problem" (PDF). In: Journal of Machine Learning Research, 4(Sep), 575-602.
- QUOTE: The DAG-RNN approach comprises three basic steps: (1) representation of a given domain using suitable directed acyclic graphs (DAGs) to connect visible and hidden node variables; (2) parameterization of the relationship between each variable and its parent variables by feedforward neural networks or, for that matter, any other class of parameterized functions; and (3) application of weight-sharing within appropriate subsets of DAG connections to capture stationarity and control model complexity. The absence of cycles ensures that the neural networks can be unfolded in “space” so that back-propagation can be used for training.
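A small illustration of step (3), weight sharing, under assumed toy dimensions: each DAG edge is assigned a class (here the "east" and "south" edges of a 2x2 grid), and all edges of the same class reuse one weight matrix, so the number of parameters stays independent of graph size. The function and variable names are hypothetical, not from the paper.

```python
import numpy as np

def dag_rnn_forward_shared(x, typed_edges, U, W_by_type, b, order):
    """Toy DAG-RNN forward pass with weight sharing per edge class.
    typed_edges: list of (parent, child, edge_type); W_by_type: {edge_type: (d_hid, d_hid)};
    order: a topological ordering of the nodes."""
    d_hid = b.shape[0]
    parents = {v: [] for v in range(x.shape[0])}
    for p, c, t in typed_edges:
        parents[c].append((p, t))
    h = np.zeros((x.shape[0], d_hid))
    for v in order:
        context = np.zeros(d_hid)
        for p, t in parents[v]:
            context += W_by_type[t] @ h[p]   # the same matrix is reused for every edge of type t
        h[v] = np.tanh(U @ x[v] + context + b)
    return h

# usage: a 2x2 grid unrolled as a DAG; nodes 0 1 / 2 3
rng = np.random.default_rng(2)
x = rng.standard_normal((4, 3))
edges = [(0, 1, "east"), (2, 3, "east"), (0, 2, "south"), (1, 3, "south")]
U = rng.standard_normal((5, 3))
W_by_type = {"east": rng.standard_normal((5, 5)), "south": rng.standard_normal((5, 5))}
h = dag_rnn_forward_shared(x, edges, U, W_by_type, np.zeros(5), order=[0, 1, 2, 3])
```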
1996
- (Goller & Küchler, 1996) ⇒ Christoph Goller, and Andreas Küchler (1996). "Learning Task-Dependent Distributed Representations by Backpropagation Through Structure". In: Proceedings of the IEEE International Conference on Neural Networks (ICNN 1996), Vol. 1, pp. 347-352.
- QUOTE: All kinds of recursive symbolic data structures we aim at can be mapped onto labeled directed acyclic graphs (DAGs)[1]. In order to compute the representation of a graph, the representations of all subgraphs have to be computed first. During the training phase of the LRAAM for each node one phase of forward propagation of activations and one phase of backward propagation (each through the three layers of the network) of errors per epoch is needed. Choosing a DAG-representation for structures -- that allows to represent different occurrences of a (sub-)structure in the training set only as one node -- may lead to a considerable (even exponential) reduction of complexity for the standard LRAAM. This argument also holds for our architecture (see Section 2.4.3). Instead of choosing a tree-like representation we therefore prefer a DAG-like representation for our terms as shown in Figure 2.
1. We do not deal with cyclic structures here.
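To make the sharing argument concrete, the hypothetical sketch below (not the LRAAM or the backpropagation-through-structure algorithm itself) encodes a term bottom-up while memoizing on subterms, so a subterm that occurs several times (a single node in the DAG representation) is encoded only once, whereas a tree representation would recompute it for each occurrence. The embedding and composition functions are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 8
LEAF = {}                              # leaf label -> toy embedding vector
W = rng.standard_normal((D, 2 * D))    # composes two child representations into a parent one

def leaf_vec(label):
    if label not in LEAF:
        LEAF[label] = rng.standard_normal(D)
    return LEAF[label]

def represent(term, cache):
    """Bottom-up representation of a term given as nested tuples, e.g.
    ('f', ('g', 'a', 'b'), ('g', 'a', 'b')); each non-leaf is (op, left, right).
    'cache' memoizes on the term itself, so a shared subterm (one DAG node) is encoded once."""
    if isinstance(term, str):
        return leaf_vec(term)
    if term in cache:
        return cache[term]             # DAG sharing: reuse the stored representation
    _, left, right = term              # binary operators only, for brevity
    rep = np.tanh(W @ np.concatenate([represent(left, cache), represent(right, cache)]))
    cache[term] = rep
    return rep

# usage: the subterm ('g', 'a', 'a') occurs twice but is encoded only once
t = ("f", ("g", "a", "a"), ("g", "a", "a"))
print(represent(t, cache={}).shape)    # (8,)
```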