Root Mean Square Propagation Algorithm (RMSprop)


A Root Mean Square Propagation Algorithm (RMSprop) is a Gradient Descent-based Learning Algorithm that adapts the learning rate of each parameter by dividing it by a running average of recent squared gradients, and is closely related to the Adagrad and Adadelta methods.



References

2018a

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp Retrieved:2018-4-29.
    • RMSProp (for Root Mean Square Propagation) is also a method in which the learning rate is adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. [1] So, first the running average is calculated in terms of the mean square,

      [math]\displaystyle{ v(w,t):=\gamma v(w,t-1)+(1-\gamma)(\nabla Q_i(w))^2 }[/math]

      where [math]\displaystyle{ \gamma }[/math] is the forgetting factor, and the parameters are updated as

      [math]\displaystyle{ w:=w-\frac{\eta}{\sqrt{v(w,t)}}\nabla Q_i(w) }[/math]

      RMSProp has shown excellent adaptation of learning rate in different applications. RMSProp can be seen as a generalization of Rprop and is capable of working with mini-batches as well, as opposed to only full-batches.
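
      A minimal NumPy sketch of this update rule, applied to the toy quadratic objective [math]\displaystyle{ Q(w)=\tfrac{1}{2}\|w\|^2 }[/math], is given below. The names (rmsprop_step, gamma, eta, eps) and the small eps term added for numerical stability are illustrative choices, not part of the quoted formula.

        import numpy as np

        def rmsprop_step(w, grad, v, gamma=0.9, eta=0.01, eps=1e-8):
            # v is the running average v(w,t) of squared gradients
            v = gamma * v + (1.0 - gamma) * grad ** 2
            # divide the learning rate by the root of the running average
            # (eps is an illustrative guard against division by zero)
            w = w - eta / (np.sqrt(v) + eps) * grad
            return w, v

        w = np.array([1.0, -2.0])   # initial parameters
        v = np.zeros_like(w)        # running average, initialized at zero
        for t in range(500):
            grad = w                # gradient of Q(w) = 0.5 * ||w||^2
            w, v = rmsprop_step(w, grad, v)
        print(w)                    # approaches the minimum at [0, 0]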


2015

  • (Misra, 2015) ⇒ Ishan Misra (2015). "Optimization for Deep Networks" (PDF)
    • QUOTE: RMSProp = Rprop + SGD.
      • Tieleman & Hinton et al., 2012 (Coursera slide 29, Lecture 6)
      • Scale updates similarly across mini-batches,
      • Scale by decaying average of squared gradient,
        • Rather than the sum of squared gradients in AdaGrad.

          [math]\displaystyle{ r_t=(1-\gamma)f'(\theta)^2+\gamma r_{t-1} }[/math]

          [math]\displaystyle{ v_{t+1}=\frac{\alpha}{\sqrt{r_t}}f'(\theta_t) }[/math],

          [math]\displaystyle{ \theta_{t+1}=\theta_t-v_{t+1} }[/math]
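
      The contrast drawn above, a decaying average [math]\displaystyle{ r_t }[/math] rather than AdaGrad's running sum of squared gradients, can be seen in the small sketch below, which feeds both accumulators the same constant gradient. The variable names and the constant gradient stream are illustrative only.

        import numpy as np

        gamma = 0.9
        grads = np.full(50, 0.5)   # pretend the gradient is 0.5 at every step

        adagrad_sum, rmsprop_avg = 0.0, 0.0
        for g in grads:
            adagrad_sum += g ** 2                                      # grows without bound
            rmsprop_avg = (1 - gamma) * g ** 2 + gamma * rmsprop_avg   # saturates near g^2

        print(adagrad_sum)   # 12.5 and still growing, so AdaGrad's effective step keeps shrinking
        print(rmsprop_avg)   # ~0.25, so RMSProp's effective step stays roughly constant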


  1. Tieleman, Tijmen, and Hinton, Geoffrey (2012). "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural Networks for Machine Learning.