Neural Network Hidden Unit
A Neural Network Hidden Unit is an artificial neuron in a hidden layer of an artificial neural network (i.e., a unit that is neither an input unit nor an output unit).
- AKA: Hidden Unit, Hidden Neuron.
- Context:
  - It can be characterized by a hidden state: [math]\displaystyle{ h = g(Wx_i + b) }[/math], where [math]\displaystyle{ g }[/math] is the unit's activation function, [math]\displaystyle{ W }[/math] its weight matrix, [math]\displaystyle{ x_i }[/math] its input vector, and [math]\displaystyle{ b }[/math] its bias term (a minimal computational sketch is given after the See list below).
- Example(s):
  - a Feedforward Neural Unit, such as:
    - a ReLU,
    - a Sigmoid Neural Unit,
    - a Softmax Neural Unit.
  - a Recurrent Neural Unit, such as an LSTM Unit or a GRU Unit.
  - a Convolutional Neural Unit.
  - …
- Counter-Example(s):
  - a Visible Neural Network Unit (e.g., an input unit or an output unit).
- See: Visible Neural Network Unit, Artificial Neural Network, Artificial Neuron, Fully-Connected Neural Network Layer, Neuron Activation Function, Neural Network Weight, Neural Network Connection, Neural Network Topology, Multi Hidden Layer NNet.
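The hidden-state characterization above can be made concrete with a minimal sketch (Python with NumPy is assumed here; the function names, activation choices, and shapes are illustrative, not part of the definition):

```python
import numpy as np

# Two common activation functions g for a hidden unit (illustrative choices).
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A hidden layer's state: h = g(W x_i + b), with one row of W and one entry of b per hidden unit.
def hidden_state(W, x_i, b, g=relu):
    return g(W @ x_i + b)

# Example: a hidden layer of 3 units over a 4-dimensional input.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weight matrix (3 hidden units, 4 inputs)
b = np.zeros(3)               # bias terms
x_i = rng.normal(size=4)      # an input vector

print(hidden_state(W, x_i, b, g=relu))     # ReLU hidden units
print(hidden_state(W, x_i, b, g=sigmoid))  # sigmoid hidden units
```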
References
2018
- (Jurafsky & Martin, 2018) ⇒ Daniel Jurafsky, and James H. Martin (2018). "Chapter 9 -- Sequence Processing with Recurrent Networks". In: Speech and Language Processing (3rd ed. draft). Draft of September 23, 2018.
- QUOTE: Long short-term memory (LSTM) networks divide the context management problem into two sub-problems: removing information no longer needed from the context, and adding information likely to be needed for later decision making. The key to the approach is to learn how to manage this context rather than hard-coding a strategy into the architecture. LSTMs accomplish this through the use of specialized neural units that make use of gates that control the flow of information into and out of the units that comprise the network layers. These gates are implemented through the use of additional sets of weights that operate sequentially on the context layer(...)
Figure 9.14 Basic neural units used in feed-forward, simple recurrent networks (SRN), long short-term memory (LSTM) and gated recurrent units (GRU)
The neural units used in LSTMs and GRUs are obviously much more complex than basic feed-forward networks. Fortunately, this complexity is largely encapsulated within the basic processing units, allowing us to maintain modularity and to easily experiment with different architectures. To see this, consider Fig. 9.14 which illustrates the inputs/outputs and weights associated with each kind of unit.
At the far left, (a) is the basic feed-forward unit [math]\displaystyle{ h = g(W x+b) }[/math]. A single set of weights and a single activation function determine its output, and when arranged in a layer there is no connection between the units in the layer. Next, (b) represents the unit in an SRN. Now there are two inputs and an additional set of weights to go with it. However, there is still a single activation function and output. When arranged as a layer, the hidden layer from each unit feeds in as an input to the next.
Fortunately, the increased complexity of the LSTM and GRU units is encapsulated within the units themselves. The only additional external complexity over the basic recurrent unit (b) is the presence of the additional context vector input and output. This modularity is key to the power and widespread applicability of LSTM and GRU units. Specifically, LSTM and GRU units can be substituted into any of the network architectures described in Section 9.3. And, as with SRNs, multi-layered networks making use of gated units can be unrolled into deep feed-forward networks and trained in the usual fashion with backpropagation.
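To make the quoted comparison concrete, here is a minimal NumPy sketch of the three kinds of units discussed above; the shapes, parameter names, and the specific LSTM gate equations are standard textbook forms assumed for illustration, not taken verbatim from the figure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# (a) Feed-forward unit: h = g(W x + b); a single set of weights, a single activation.
def feedforward_unit(x, W, b, g=np.tanh):
    return g(W @ x + b)

# (b) Simple recurrent (SRN) unit: two inputs (current x, previous h) and an
#     additional set of weights U; still a single activation and output.
def srn_unit(x, h_prev, W, U, b, g=np.tanh):
    return g(W @ x + U @ h_prev + b)

# (c) LSTM unit: an extra context vector c managed by forget/input/output gates,
#     each gate with its own additional set of weights (standard LSTM equations, assumed).
def lstm_unit(x, h_prev, c_prev, params):
    Wf, Uf, bf, Wi, Ui, bi, Wo, Uo, bo, Wc, Uc, bc = params
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)        # forget gate: remove stale context
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)        # input (add) gate: admit new information
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)        # output gate
    c_tilde = np.tanh(Wc @ x + Uc @ h_prev + bc)  # candidate context
    c = f * c_prev + i * c_tilde                  # updated context vector
    h = o * np.tanh(c)                            # hidden state exposed to the rest of the network
    return h, c

# Illustrative usage with small, randomly initialized parameters.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
x = rng.normal(size=d_in)
h_prev, c_prev = np.zeros(d_h), np.zeros(d_h)

W, U, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
print(feedforward_unit(x, W, b))
print(srn_unit(x, h_prev, W, U, b))

def gate_params():  # one (W, U, b) triple per gate
    return rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

params = [p for _ in range(4) for p in gate_params()]
h, c = lstm_unit(x, h_prev, c_prev, params)
print(h, c)
```

Externally, only (c) adds a second recurrent input/output (the context vector [math]\displaystyle{ c }[/math]); the rest of the complexity stays inside the unit, which is the modularity point made in the quoted passage.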