Stacked Bidirectional and Unidirectional LSTM (SBU-LSTM) Neural Network
A Stacked Bidirectional and Unidirectional LSTM (SBU-LSTM) Neural Network is a Deep Neural Network that combines unidirectional LSTM and bidirectional LSTM (BLSTM) layers.
- AKA: SBU-LSTM.
- Context:
- It can be trained by a SBU-LSTM Training System (that implements a SBU-LSTM Training Algorithm).
- It can be used for spatio-temporal prediction tasks such as network-wide traffic speed prediction (a minimal architecture sketch appears after the See: list below).
- Example(s):
- Deep (unidirectional or bidirectional) LSTM-CNN with 2 stacked layers (Kim, 2018),
- Deep BLSTM-RNN,
- Deep BLSTM-CNN,
- Counter-Example(s):
- See: Bidirectional LSTM, Deep Neural Network Training Task, Bidirectional RNN.
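The following is a minimal sketch of the SBU-LSTM layer ordering described above, assuming a Keras-style implementation; the input shape and layer widths are illustrative assumptions rather than values taken from the references.

```python
import tensorflow as tf

# Minimal SBU-LSTM sketch: a bidirectional LSTM feature-learning layer
# followed by a unidirectional LSTM layer and a regression output.
# Shapes are illustrative: 10 historical time steps over 64 road segments.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 64)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.LSTM(64),        # last (unidirectional) LSTM layer
    tf.keras.layers.Dense(64),       # one-step-ahead speed for each segment
])
model.compile(optimizer="adam", loss="mse")
```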
References
2018a
- (Cui, Ke & Wang, 2018) ⇒ Zhiyong Cui, Ruimin Ke, and Yinhai Wang (2018). "Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction" (PDF). arXiv preprint [arXiv:1801.02143].
- QUOTE: In this study, we propose a novel deep architecture named stacked bidirectional and unidirectional LSTM network (SBULSTM) to predict the network-wide traffic speed values. Fig. 5 illustrates the graphical architecture of the proposed model. If the input contains missing values, a masking layer should be adopted by the SBU-LSTM. Each SBU-LSTM contains a BDLSTM layer as the first feature-learning layer and a LSTM layer as the last layer. For sake of making full use of the input data and learning complex and comprehensive features, the SBU-LSTMs can include one or more optional middle LSTM/BDLSTM layers. Fig. 5 shows that the SBU-LSTM takes the spatial time series data as the input and predict future speed values for one time-step. The SBU-LSTM is also capable of predicting values for multiple future time steps based on historical data.
Fig. 5: SBU-LSTMs architecture necessarily consists of a BDLSTM layer and a LSTM layer. Masking layer for handling missing values and multiple LSTM or BDLSTM layers as middle layers are optional.
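A hedged sketch of the quoted architecture, assuming a Keras-style implementation: an optional masking layer for missing values, a BDLSTM first layer, optional middle layers, and a unidirectional LSTM as the last layer. The layer widths, mask value, and segment count are illustrative assumptions, not values from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_sbu_lstm(n_steps, n_segments, n_middle_layers=1,
                   units=64, use_masking=True, mask_value=0.0):
    """Sketch of the SBU-LSTM described by Cui, Ke & Wang (2018):
    optional masking layer, BDLSTM first layer, optional middle
    LSTM/BDLSTM layers, and a unidirectional LSTM as the last layer."""
    inputs = layers.Input(shape=(n_steps, n_segments))
    x = inputs
    if use_masking:
        # Skip time steps whose features all equal mask_value (missing data).
        x = layers.Masking(mask_value=mask_value)(x)
    # First feature-learning layer: bidirectional LSTM.
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    # Optional middle layers (these could also be bidirectional).
    for _ in range(n_middle_layers):
        x = layers.LSTM(units, return_sequences=True)(x)
    # Last layer: unidirectional LSTM.
    x = layers.LSTM(units)(x)
    # One-step-ahead speed prediction for every road segment.
    outputs = layers.Dense(n_segments)(x)
    return tf.keras.Model(inputs, outputs)

model = build_sbu_lstm(n_steps=10, n_segments=64)
model.compile(optimizer="adam", loss="mse")
```

Predicting multiple future time steps, as the quote mentions, could be handled by widening the output layer or by an encoder-decoder variant; the sketch above covers only the one-step case.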
2018b
- (Kim, 2018) ⇒ Kyungna Kim (2018). "Arrhythmia Classification in Multi-Channel ECG Signals Using Deep Neural Networks".
- QUOTE: For our LSTM networks, an individual sample is treated as a 600-timestep sequence, where each timestep is a 6-feature data point. In addition to a single-layer unidirectional baseline model, we also train 2- and 5-layer stacks of LSTM units, where the output of each LSTM unit not in the final layer is treated as input to a unit in the next. To increase the contextual information available to the model, we also train a bidirectional variant of the 2-layer LSTM model. These use the same update equations, but add a second LSTM flowing backwards in time; cell outputs from both networks are then concatenated at each time step. As this allows propagation of information from future timesteps, the model can often learn more contextual information and has been shown to improve performance in many tasks (Cui, Ke & Wang, 2018). Full high-level architecture of the 2-layer bidirectional LSTM network is shown in figure 3.2.
(...) To utilize both the pattern recognition afforded by deep CNNs and the temporal learning ability of LSTMs, we also train an additional architecture that combines them into a single model. We begin with a stacked LSTM to extract temporal structures from the data, and instead of feeding the unrolled hidden state into another LSTM layer, we feed it as input into a (deep) CNN to extract localized features. In the combined model, we begin by feeding the data into a 2-layer LSTM. The output of the final LSTM layer is treated as a one-dimensional image of size (100 × 600), and fed into a CNN to extract localized features. We also train a similar architecture with a bidirectional 2-layer LSTM, where the image is of size (200×600). Full high-level architecture of our combined network is shown in figure 3.4.
Figure 3.2: Bidirectional LSTM network with 2 stacked layers
Figure 3.3: Residual network, showing only the CNN architecture
Figure 3.4: Combined LSTM-CNN model (LSTM portion may be uni- or bidirectional)
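Below is a hedged sketch of the combined bidirectional LSTM-CNN model from the excerpt above, assuming a Keras-style implementation. The 600-timestep, 6-feature input and the 100-unit LSTM layers (yielding 200 concatenated features per step in the bidirectional case) follow the quote; the convolutional head and the 4-class softmax output are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Combined bidirectional LSTM-CNN sketch (after Kim, 2018): a 2-layer
# bidirectional LSTM over 600 timesteps x 6 features whose unrolled
# outputs (600 x 200) are fed to a 1-D convolutional head.
model = tf.keras.Sequential([
    layers.Input(shape=(600, 6)),
    layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
    layers.Conv1D(64, kernel_size=5, activation="relu"),   # assumed conv head
    layers.MaxPooling1D(pool_size=2),
    layers.GlobalAveragePooling1D(),
    layers.Dense(4, activation="softmax"),   # assumed number of arrhythmia classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```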