Backpropagation of Errors (BP)-based Training Algorithm

From GM-RKB
(Redirected from Generalized delta rule)
Jump to navigation Jump to search

A Backpropagation of Errors (BP)-based Training Algorithm is a feed-forward supervised neural network training algorithm that applies a delta rule to feed the intermediate NNets loss function gradient (which is used to update the neuron weights to minimize the loss function).



References

2018

2017a

2017b

[math]\displaystyle{ \begin{align}J(W,b; x,y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2.\end{align} }[/math]

This is a (one-half) squared-error cost function. Given a training set of [math]\displaystyle{ m }[/math] examples, we then define the overall cost function to be:

[math]\displaystyle{ \begin{align}J(W,b)&= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \\ &= \left[ \frac{1}{m} \sum_{i=1}^m \left( \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \end{align} }[/math]

The first term in the definition of [math]\displaystyle{ J(W,b) }[/math] is an average sum-of-squares error term. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights, and helps prevent overfitting.

2014

2014b

1986