Neural Network Forward Pass
A Neural Network Forward Pass is a Forward Propagation Algorithm that feeds input values into a neural network and computes its predicted output values.
- AKA: Neural Network Forward State.
- Example(s):
- Counter-Example(s):
- See: Bidirectional Neural Network, Convolutional Neural Network, Recurrent Neural Network.
References
2018a
- (CS231n, 2018) ⇒ http://cs231n.github.io/neural-networks-1/#feedforward Retrieved: 2018-07-15.
- QUOTE: The full forward pass of this 3-layer neural network is then simply three matrix multiplications, interwoven with the application of the activation function:
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)
- In the above code, W1, W2, W3, b1, b2, and b3 are the learnable parameters of the network.
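- For completeness, a minimal runnable sketch of the same forward pass, assuming the 3-4-4-1 layer sizes implied by the shape comments above; the random parameter values are purely illustrative:
import numpy as np
np.random.seed(0) # illustrative seed for reproducibility
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1)) # input (3x1) -> first hidden layer (4x1)
W2, b2 = np.random.randn(4, 4), np.zeros((4, 1)) # first hidden (4x1) -> second hidden layer (4x1)
W3, b3 = np.random.randn(1, 4), np.zeros((1, 1)) # second hidden (4x1) -> output (1x1)
f = lambda x: 1.0/(1.0 + np.exp(-x)) # sigmoid activation function
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)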
2018b
- (Wikipedia, 2018) ⇒ https://www.wikiwand.com/en/Backpropagation#Pseudocode Retrieved: 2018-07-15.
- The following is pseudocode for a stochastic gradient descent algorithm for training a three-layer network (only one hidden layer):
do
    forEach training example named ex
        prediction = neural-net-output(network, ex) // forward pass
        actual = teacher-output(ex)
        compute error (prediction - actual) at the output units
        compute [math]\displaystyle{ \Delta w_h }[/math] for all weights from hidden layer to output layer // backward pass
        compute [math]\displaystyle{ \Delta w_i }[/math] for all weights from input layer to hidden layer // backward pass continued
        update network weights // input layer not modified by error estimate
until all examples classified correctly or another stopping criterion satisfied
return the network
- The lines labeled "backward pass" can be implemented using the backpropagation algorithm, which calculates the gradient of the error of the network with respect to the network's modifiable weights.[1]
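- The procedure can be sketched in NumPy for a one-hidden-layer network; the sigmoid activations, the delta formulas for squared error, the rounding-based stopping test, and the epoch cap below are illustrative assumptions rather than part of the quoted pseudocode:
import numpy as np

def sgd_train(examples, n_in, n_hidden, n_out, lr=0.5, max_epochs=1000):
    sig = lambda z: 1.0/(1.0 + np.exp(-z)) # sigmoid units
    W1 = np.random.randn(n_hidden, n_in) # input -> hidden weights
    W2 = np.random.randn(n_out, n_hidden) # hidden -> output weights
    for _ in range(max_epochs): # "do ... until" loop, capped at max_epochs
        all_correct = True
        for x, target in examples: # forEach training example named ex
            h = sig(W1 @ x) # forward pass: hidden activations
            prediction = sig(W2 @ h) # forward pass: neural-net-output(network, ex)
            error = prediction - target # error at the output units
            delta_out = error * prediction * (1 - prediction) # backward pass
            dW2 = np.outer(delta_out, h) # Delta w_h: hidden -> output weights
            delta_hid = (W2.T @ delta_out) * h * (1 - h) # backward pass continued
            dW1 = np.outer(delta_hid, x) # Delta w_i: input -> hidden weights
            W2 -= lr * dW2 # update network weights
            W1 -= lr * dW1 # (input layer has no modifiable weights)
            if not np.all(np.round(prediction) == target):
                all_correct = False
        if all_correct: # until all examples classified correctly
            break
    return W1, W2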
2018c
- (Raven Protocol, 2018) ⇒ "Forward propagation". In: https://medium.com/ravenprotocol/everything-you-need-to-know-about-neural-networks-6fcc7a15cb4 Retrieved: 2018-07-15.
- QUOTE: Forward propagation is a process of feeding input values to the neural network and getting an output which we call predicted value. Sometimes we refer forward propagation as inference. When we feed the input values to the neural network’s first layer, it goes without any operations. Second layer takes values from first layer and applies multiplication, addition and activation operations and passes this value to the next layer. Same process repeats for subsequent layers and finally we get an output value from the last layer.
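- The layer-by-layer process described in the quote can be sketched as a loop over (weight, bias) pairs; applying a sigmoid activation at every layer, including the last, is an illustrative simplification:
import numpy as np

def forward_propagate(x, layers, act=lambda z: 1.0/(1.0 + np.exp(-z))):
    a = x # the first layer passes the input values through without any operations
    for W, b in layers: # each subsequent layer: multiplication, addition, activation
        a = act(np.dot(W, a) + b)
    return a # the predicted value produced by the last layer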
2016
- (Wang, 2016) ⇒ Tingwu Wang (2016). "Recurrent Neural Network Tutorial". http://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf
- QUOTE: 1. Algorithm looks like this:
2015
- (Urtasun & Zemel, 2015) ⇒ Raquel Urtasun, and Rich Zemel (2015). "CSC 411: Lecture 10: Neural Networks I".
- QUOTE: We only need to know two algorithms:
- Forward pass: performs inference.
- Backward pass: performs learning
2005
- (Graves & Schmidhuber, 2005) ⇒ Alex Graves, and Jürgen Schmidhuber (2005). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures" (PDF). In: Neural Networks, 18(5-6), 602-610. doi:10.1016/j.neunet.2005.06.042
- QUOTE: Forward Pass
- Reset all activations to 0.
- Running forwards from time [math]\displaystyle{ \tau_0 }[/math] to time [math]\displaystyle{ \tau_1 }[/math], feed in the inputs and update the activations. Store all hidden layer and output activations at every timestep.
- For each LSTM block, the activations are updated as follows:
Input Gates: [math]\displaystyle{ x_\iota = \sum_{j \in N} w_{\iota j} y_j(\tau - 1) + \sum_{c \in C} w_{\iota c} s_c(\tau - 1) \;,\quad y_\iota = f(x_\iota) }[/math]
Forget Gates: [math]\displaystyle{ x_\phi = \sum_{j \in N} w_{\phi j} y_j(\tau - 1) + \sum_{c \in C} w_{\phi c} s_c(\tau - 1) \;,\quad y_\phi = f(x_\phi) }[/math]
Cells: [math]\displaystyle{ \forall c \in C: \; x_c = \sum_{j \in N} w_{c j} y_j(\tau - 1) \;,\quad s_c = y_\phi s_c(\tau - 1) + y_\iota g(x_c) }[/math]
Output Gates: [math]\displaystyle{ x_\omega = \sum_{j \in N} w_{\omega j} y_j(\tau - 1) + \sum_{c \in C} w_{\omega c} s_c(\tau) \;,\quad y_\omega = f(x_\omega) }[/math]
Cell Outputs: [math]\displaystyle{ \forall c \in C: \; y_c = y_\omega h(s_c) }[/math]
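- A minimal NumPy sketch of one forward step for a single LSTM block with C cells, following the quoted equations; the weight layout (one scalar input, forget, and output gate per block) and the choice f = sigmoid, g = h = tanh are illustrative assumptions:
import numpy as np

def lstm_block_forward(y_prev, s_prev, w, f=lambda z: 1.0/(1.0 + np.exp(-z)), g=np.tanh, h=np.tanh):
    # y_prev: source-unit activations y_j(tau-1), shape (N,); s_prev: cell states s_c(tau-1), shape (C,)
    # w: dict with w['iota_y'], w['phi_y'], w['omega_y'] of shape (N,), peephole weights w['iota_s'], w['phi_s'], w['omega_s'] of shape (C,), and w['c_y'] of shape (C, N)
    x_iota = w['iota_y'] @ y_prev + w['iota_s'] @ s_prev # input gate net input
    y_iota = f(x_iota)
    x_phi = w['phi_y'] @ y_prev + w['phi_s'] @ s_prev # forget gate net input
    y_phi = f(x_phi)
    x_c = w['c_y'] @ y_prev # cell net inputs, shape (C,)
    s_c = y_phi * s_prev + y_iota * g(x_c) # new cell states s_c(tau)
    x_omega = w['omega_y'] @ y_prev + w['omega_s'] @ s_c # output gate sees the new states s_c(tau)
    y_omega = f(x_omega)
    y_c = y_omega * h(s_c) # cell outputs
    return y_c, s_c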
1997
- (Schuster & Paliwal, 1997) ⇒ Mike Schuster, and Kuldip K. Paliwal. (1997). “Bidirectional Recurrent Neural Networks.” In: IEEE Transactions on Signal Processing, 45(11). doi:10.1109/78.650093
- QUOTE: To overcome the limitations of a regular RNN outlined in the previous section, we propose a bidirectional recurrent neural network (BRNN) that can be trained using all available input information in the past and future of a specific time frame.
1) Structure: The idea is to split the state neurons of a regular RNN in a part that is responsible for the positive time direction (forward states) and a part for the negative time direction (backward states). Outputs from forward states are not connected to inputs of backward states, and vice versa. This leads to the general structure that can be seen in Fig. 3, where it is unfolded over three time steps.
Fig. 3. General structure of the bidirectional recurrent neural network (BRNN) shown unfolded in time for three time steps.
(...) The training procedure for the unfolded bidirectional network over time can be summarized as follows.
- 1) FORWARD PASS
- Run all input data for one time slice [math]\displaystyle{ 1 \leq t \leq T }[/math] through the BRNN and determine all predicted outputs.
- a) Do forward pass just for forward states (from [math]\displaystyle{ t=1 }[/math] to [math]\displaystyle{ t=T }[/math]) and backward states (from [math]\displaystyle{ t=T }[/math] to [math]\displaystyle{ t=1 }[/math]).
- b) Do forward pass for output neurons.
- 2) BACKWARD PASS
- Calculate the part of the objective function derivative for the time slice [math]\displaystyle{ 1 \leq t \leq T }[/math] used in the forward pass.
- a) Do backward pass for output neurons.
- b) Do backward pass just for forward states (from [math]\displaystyle{ t=T }[/math] to [math]\displaystyle{ t=1 }[/math]) and backward states (from [math]\displaystyle{ t=1 }[/math] to [math]\displaystyle{ t=T }[/math]).
- 3) UPDATE WEIGHTS
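- A minimal NumPy sketch of step 1 (the forward pass) of this procedure; the weight names, the tanh state activation, and the linear output layer are illustrative assumptions:
import numpy as np

def brnn_forward(xs, Wf, Vf, Wb, Vb, Of, Ob, act=np.tanh):
    # xs: list of T input vectors; forward and backward hidden states start at zero
    T, H = len(xs), Wf.shape[0]
    hf, forward_states = np.zeros(H), [] # a) forward pass for the forward states (t = 1 .. T)
    for t in range(T):
        hf = act(Wf @ xs[t] + Vf @ hf)
        forward_states.append(hf)
    hb, backward_states = np.zeros(H), [None] * T # ... and for the backward states (t = T .. 1)
    for t in reversed(range(T)):
        hb = act(Wb @ xs[t] + Vb @ hb)
        backward_states[t] = hb
    # b) forward pass for the output neurons, combining both state parts at every t
    return [Of @ forward_states[t] + Ob @ backward_states[t] for t in range(T)]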