Simple Unidirectional Recurrent Neural Network (SRN)
A Simple Unidirectional Recurrent Neural Network (SRN) is a unidirectional recurrent neural network composed of simple (ungated) RNN units, in which each hidden state is computed from the current input and the hidden state of the previous time step.
- AKA: Elman Network.
- Example(s):
- …
- Counter-Example(s):
- a Bidirectional Recurrent Neural Network;
- a GRU Network, with GRU units;
- a LSTM Network, with LSTM units;
- a Stacked Recurrent Neural Network.
- See: RNN Hidden State, Gated Recurrent Hidden State.
References
2018
- (Jurafsky & Martin, 2018) ⇒ Daniel Jurafsky, and James H. Martin (2018). "Chapter 9 -- Sequence Processing with Recurrent Networks". In: Speech and Language Processing (3rd ed. draft). Draft of September 23, 2018.
- QUOTE: The sequential nature of simple recurrent networks can be illustrated by unrolling the network in time as is shown in Fig. 9.4. In figures such as this, the various layers of units are copied for each time step to illustrate that they will have differing values over time. However the weights themselves are shared across the various timesteps. Finally, the fact that the computation at time [math]\displaystyle{ t }[/math] requires the value of the hidden layer from time [math]\displaystyle{ t-1 }[/math] mandates an incremental inference algorithm that proceeds from the start of the sequence to the end as shown in Fig. 9.5.
Figure 9.4 A simple recurrent neural network shown unrolled in time. Network layers are copied for each timestep, while the weights U, V and W are shared in common across all timesteps.
Figure 9.5 Forward inference in a simple recurrent network.
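The incremental inference procedure summarized in Fig. 9.5 can be sketched in a few lines of Python. The sketch below follows the U, V, W weight naming of Fig. 9.4; the tanh and softmax activations and the toy dimensions are illustrative assumptions, not details taken from the source.

```python
# Minimal sketch of forward inference in a simple recurrent network,
# using the shared weights U (hidden-to-hidden), W (input-to-hidden),
# and V (hidden-to-output) named in Fig. 9.4. Activations and sizes
# are assumptions for illustration.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def srn_forward(xs, U, W, V):
    """Run inference incrementally from the start of the sequence to the end."""
    h = np.zeros(U.shape[0])          # h_0: initial hidden state
    ys = []
    for x in xs:                      # the step at time t needs h from time t-1
        h = np.tanh(U @ h + W @ x)    # new hidden state
        ys.append(softmax(V @ h))     # output distribution at time t
    return ys

# Toy usage: 4-dimensional inputs, 3 hidden units, 2 output classes.
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(3, 3)), rng.normal(size=(3, 4)), rng.normal(size=(2, 3))
outputs = srn_forward([rng.normal(size=4) for _ in range(5)], U, W, V)
```

Because the same U, W, and V are reused at every step, the loop above makes the weight sharing across timesteps explicit.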
2017
- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). “Simple Recurrent Network.” In: Encyclopedia of Machine Learning and Data Mining.
- QUOTE: The simple recurrent network is a specific version of the Backpropagation neural network that makes it possible to process sequential input and output (Elman, 1990). It is typically a three-layer network where a copy of the hidden layer activations is saved and used (in addition to the actual input) as input to the hidden layer in the next time step. The previous hidden layer is fully connected to the hidden layer. Because the network has no recurrent connections per se (only a copy of the activation values), the entire network (including the weights from the previous hidden layer to the hidden layer) can be trained with the backpropagation algorithm as usual. It can be trained to read a sequence of inputs into a target output pattern, to generate a sequence of outputs from a given input pattern, or to map an input sequence to an output sequence (as in predicting the next input). ...
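The "copy of the hidden layer" mechanism described above can be sketched as follows; the weight names (W_xh, W_ch, W_hy), the tanh activation, and the layer sizes are assumptions chosen for illustration, not details from the encyclopedia entry.

```python
# Sketch of the Elman-style context layer: the previous hidden activations
# are copied and fed back as extra input to the hidden layer, so each time
# step looks like an ordinary feedforward pass that standard backpropagation
# can handle. Names and sizes are illustrative assumptions.
import numpy as np

def elman_step(x, context, W_xh, W_ch, W_hy):
    # Hidden layer sees the actual input plus the copied previous activations.
    h = np.tanh(W_xh @ x + W_ch @ context)
    y = W_hy @ h                       # output layer (linear here)
    return y, h                        # h becomes the next step's context

# Toy usage: 5 inputs, 3 hidden units, 2 outputs.
rng = np.random.default_rng(1)
W_xh, W_ch, W_hy = rng.normal(size=(3, 5)), rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
context = np.zeros(3)                  # empty memory at the first time step
for x in [rng.normal(size=5) for _ in range(4)]:
    y, context = elman_step(x, context, W_xh, W_ch, W_hy)
```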
2015
- (Hexahedria, 2015) ⇒ Daniel Johnson (2015). "Recurrent Neural Networks". In: Composing Music With Recurrent Neural Networks.
- QUOTE: Notice that in the basic feedforward network, there is a single direction in which the information flows: from input to output. But in a recurrent neural network, this direction constraint does not exist. There are a lot of possible networks that can be classified as recurrent, but we will focus on one of the simplest and most practical.
Basically, what we can do is take the output of each hidden layer, and feed it back to itself as an additional input. Each node of the hidden layer receives both the list of inputs from the previous layer and the list of outputs of the current layer in the last time step. (So if the input layer has 5 values, and the hidden layer has 3 nodes, each hidden node receives as input a total of 5+3=8 values.)
We can show this more clearly by unwrapping the network along the time axis:
In this representation, each horizontal line of layers is the network running at a single time step. Each hidden layer receives both input from the previous layer and input from itself one time step in the past.
The power of this is that it enables the network to have a simple version of memory, with very minimal overhead. This opens up the possibility of variable-length input and output: we can feed in inputs one-at-a-time, and let the network combine them using the state passed from each time step.
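The 5 + 3 = 8 arithmetic in the quote above corresponds to concatenating the current input with the hidden layer's own previous output. The sketch below shows that concatenation; the single weight matrix W_h, the tanh activation, and the sequence length are illustrative assumptions.

```python
# Sketch of the concatenation described above: with 5 input values and 3
# hidden nodes, each hidden node receives 5 + 3 = 8 values (the current
# inputs plus the hidden layer's outputs from the previous time step).
import numpy as np

n_in, n_hidden = 5, 3
rng = np.random.default_rng(2)
W_h = rng.normal(size=(n_hidden, n_in + n_hidden))   # each row holds 8 weights

def hidden_step(x, h_prev):
    combined = np.concatenate([x, h_prev])            # 5 + 3 = 8 values
    return np.tanh(W_h @ combined)                    # new hidden activations

# Feed a variable-length sequence in one element at a time; the state h
# carries the network's memory forward between steps.
h = np.zeros(n_hidden)
for x in [rng.normal(size=n_in) for _ in range(6)]:
    h = hidden_step(x, h)
```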
1990
- (Elman, 1990) ⇒ Jeffrey L. Elman. (1990). “Finding Structure in Time.” In: Cognitive Science, 14(2).
- QUOTE: Time underlies many interesting human behaviors. Thus, the question of how to represent time in connectionist models is very important. One approach is to represent time implicitly by its effects on processing rather than explicitly (as in a spatial representation). The current report develops a proposal along these lines first described by Jordan (1986) which involves the use of recurrent links in order to provide networks with a dynamic memory. In this approach, hidden unit patterns are fed back to themselves; the internal representations which develop thus reflect task demands in the context of prior internal states. A set of simulations is reported which range from relatively simple problems (temporal version of XOR) to discovering syntactic / semantic features for words. The networks are able to learn interesting internal representations which incorporate task demands with memory demands; indeed, in this approach the notion of memory is inextricably bound up with task processing. These representations reveal a rich structure, which allows them to be highly context-dependent, while also expressing generalizations across classes of items. These representations suggest a method for representing lexical categories and the type / token distinction.