One-half Squared-Error Cost Function
A One-half Squared-Error Cost Function is a Squared-Error Cost Function that, for a single training example, equals one-half of the squared error and, over a training set, is an average sum-of-squares error term plus a regularization term.
- Example(s):
- For a training set of $m$ examples, this cost function is:
$\displaystyle J(W,b) = \left[ \dfrac{1}{m} \sum_{i=1}^m \left( \dfrac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right] + \dfrac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2$,
- For a single training example $(x,y)$ this cost function is (see the sketch after this list):
$J(W,b; x,y) = \dfrac{1}{2} \left\| h_{W,b}(x) - y \right\|^2.$
- …
- Counter Example(s):
- See: Square Loss Function, Squared Error Function, Cross-Entropy Measure, Mean Absolute Error, Mean Squared Error, Learning Cost Function.
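Below is a minimal sketch (in Python/NumPy, not taken from the referenced source) of the single-example one-half squared-error cost; it assumes the network output $h_{W,b}(x)$ has already been computed and is passed in as `h_x`.

```python
import numpy as np

def half_squared_error(h_x, y):
    """One-half squared-error cost for a single example:
    J(W,b; x, y) = (1/2) * || h_{W,b}(x) - y ||^2."""
    residual = np.asarray(h_x, dtype=float) - np.asarray(y, dtype=float)
    return 0.5 * float(np.dot(residual, residual))

# Example: network output h_{W,b}(x) = [0.8, 0.1], target y = [1.0, 0.0]
print(half_squared_error([0.8, 0.1], [1.0, 0.0]))  # 0.5 * (0.04 + 0.01) = 0.025
```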
References
2014
- (DL, 2014) ⇒ http://deeplearning.stanford.edu/wiki/index.php/Backpropagation_Algorithm
- QUOTE: Suppose we have a fixed training set [math]\displaystyle{ \{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \} }[/math] of [math]\displaystyle{ m }[/math] training examples. We can train our neural network using batch gradient descent. In detail, for a single training example [math]\displaystyle{ (x,y) }[/math], we define the cost function with respect to that single example to be:
[math]\displaystyle{ \begin{align} J(W,b; x,y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2. \end{align} }[/math]
This is a (one-half) squared-error cost function. Given a training set of [math]\displaystyle{ m }[/math] examples, we then define the overall cost function to be:
[math]\displaystyle{ \begin{align} J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \\ &= \left[ \frac{1}{m} \sum_{i=1}^m \left( \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \end{align} }[/math]
The first term in the definition of [math]\displaystyle{ J(W,b) }[/math] is an average sum-of-squares error term. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights, and helps prevent overfitting.
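As a hedged illustration of the quoted overall cost $J(W,b)$ (a sketch under stated assumptions, not code from the referenced page), the function below averages the per-example one-half squared errors and adds the weight-decay term over the layer weight matrices $W^{(l)}$ only; the `predict` argument is a hypothetical stand-in for the hypothesis $h_{W,b}$, and the bias terms $b$ are not regularized, matching the quoted definition.

```python
import numpy as np

def overall_cost(predict, layer_weights, xs, ys, lam):
    """Overall cost J(W,b): average one-half squared error over m examples
    plus the weight-decay term (lambda/2) * sum_l sum_ij (W_ji^(l))^2."""
    m = len(xs)
    data_term = sum(0.5 * np.sum((predict(x) - y) ** 2) for x, y in zip(xs, ys)) / m
    decay_term = 0.5 * lam * sum(np.sum(W ** 2) for W in layer_weights)  # biases excluded
    return data_term + decay_term

# Hypothetical one-layer "network" h_{W,b}(x) = W x + b, used only to exercise the cost.
W = np.array([[0.5, -0.2]])
b = np.array([0.1])
predict = lambda x: W @ x + b
xs = [np.array([1.0, 2.0]), np.array([0.0, 1.0])]
ys = [np.array([0.3]), np.array([-0.1])]
print(overall_cost(predict, [W], xs, ys, lam=0.01))  # ≈ 0.00395
```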