1999 LinearNeuralNetworks
- (Orr, 1999a) ⇒ Genevieve Orr. (1999). “Linear Neural Networks.” In: CS-449: Neural Networks, Fall 1999.
Subject Headings: Course, Presentation Slides, Gradient Descent Algorithm.
Notes
Cited By
Quotes
- To find the gradient G for the entire data set, we sum at each weight the contribution given by equation 6 over all the data points. We can then subtract a small proportion µ (called the learning rate) of G from the weights to perform gradient descent.
- 1. Initialize all weights to small random values.
- 2. REPEAT until done
    - 1. For each weight wij set Δwij = 0
    - 2. For each data point (x, t)p
        - 1. set input units to x
        - 2. compute value of output units
        - 3. For each weight wij set Δwij = Δwij + (ti − yi)xj
    - 3. For each weight wij set wij = wij + µ Δwij
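- Taken together, these steps are the delta rule applied in batch mode: per-point gradient contributions are accumulated over the whole data set before a single weight update. Below is a minimal Python sketch of that procedure, assuming a single-layer linear network y = Wx trained on sum-squared error; the names (train_batch, learning_rate, epochs) and the use of a fixed epoch count in place of “REPEAT until done” are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def train_batch(X, T, learning_rate=0.01, epochs=100):
    """Batch gradient descent for a linear network y = W x.

    X: (n_points, n_inputs) inputs; T: (n_points, n_outputs) targets.
    Hypothetical sketch: names and epoch-based stopping are assumptions.
    """
    n_inputs, n_outputs = X.shape[1], T.shape[1]
    # 1. Initialize all weights to small random values.
    W = 0.01 * np.random.randn(n_outputs, n_inputs)
    # 2. REPEAT until done (here: a fixed number of epochs).
    for _ in range(epochs):
        # 2.1 For each weight, reset the accumulated change to zero.
        dW = np.zeros_like(W)
        # 2.2 For each data point (x, t): set inputs, compute outputs,
        #     and accumulate the per-point contribution (t - y) x^T.
        for x, t in zip(X, T):
            y = W @ x
            dW += np.outer(t - y, x)
        # 2.3 Subtract a small proportion of the gradient G from the
        #     weights; since G = -sum of (t - y) x^T, this is an addition.
        W += learning_rate * dW
    return W
```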
- An alternative approach is online learning, where the weights are updated immediately after seeing each data point. Since the gradient for a single data point can be considered a noisy approximation to the overall gradient G (Fig. 5), this is also called stochastic (noisy) gradient descent.
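- For contrast with the batch version, here is a similarly minimal sketch of the online variant, where the update is applied immediately after each data point; visiting the points in random order is a common choice that the original text does not specify, and the names (train_online, learning_rate, epochs) are again illustrative.

```python
import numpy as np

def train_online(X, T, learning_rate=0.01, epochs=100):
    """Online (stochastic) gradient descent: update after each data point.

    Hypothetical sketch: random visiting order and epoch count are assumptions.
    """
    n_inputs, n_outputs = X.shape[1], T.shape[1]
    W = 0.01 * np.random.randn(n_outputs, n_inputs)
    for _ in range(epochs):
        # Each single-point gradient is a noisy approximation to the
        # overall gradient G, hence "stochastic" gradient descent.
        for p in np.random.permutation(len(X)):
            y = W @ X[p]
            W += learning_rate * np.outer(T[p] - y, X[p])  # immediate update
    return W
```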
References
- (Orr, 1999a) ⇒ Genevieve Orr. (1999). “Linear Neural Networks.” In: CS-449: Neural Networks, Fall 1999. http://www.willamette.edu/~gorr/classes/cs449/linear2.html