Learning Rate Annealing Schedule Algorithm

A Learning Rate Annealing Schedule is a Learning Rate Schedule that is based on simulated annealing.
* <B>Example(s):</B>
** ...
* <B>Counter-Example(s):</B>
** ...
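To make the concept concrete, the following is a minimal illustrative sketch (not drawn from the cited sources; the function name and the decay-rate hyperparameter value are assumptions) of an epoch-based annealing schedule that exponentially decreases the learning rate as training progresses:

<syntaxhighlight lang="python">
import math

def exponential_annealing(initial_lr: float, epoch: int, decay_rate: float = 0.05) -> float:
    """Return the learning rate for a given epoch under exponential annealing.

    decay_rate is the extra hyperparameter that controls how quickly the
    learning rate decays toward zero (larger values anneal faster).
    """
    return initial_lr * math.exp(-decay_rate * epoch)

# Example: a starting rate of 0.1 annealed over training.
for epoch in (0, 10, 50, 100):
    print(f"epoch {epoch:3d}: lr = {exponential_annealing(0.1, epoch):.5f}")
</syntaxhighlight>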
----
== References ==
=== 2012 ===
* ([[2012_ADADELTAAnAdaptiveLearningRateM|Zeiler, 2012]]) &rArr; [[Matthew D. Zeiler]]. (2012). &ldquo;[https://arxiv.org/pdf/1212.5701.pdf ADADELTA: An Adaptive Learning Rate Method].&rdquo; In: e-print [https://arxiv.org/abs/1212.5701 arXiv:1212.5701].
** QUOTE: There have been several attempts to use [[heuristic]]s for [[estimating]] a good [[learning rate]] at each [[iteration]] of [[gradient descent]]. </s> These either attempt to [[speed up]] [[learning]] when suitable or to slow down [[learning]] near a [[local minima]]. </s> Here we consider the latter. </s> <P> When [[gradient descent]] nears a [[minima]] in the [[cost surface]], the [[parameter value]]s can oscillate back and forth around the [[minima]]. </s> One [[method]] to prevent this is to slow down the [[parameter update]]s by decreasing the [[learning rate]]. </s> This can be done manually when the [[validation]] [[accuracy]] appears to [[plateau]]. </s> Alternatively, [[learning rate schedule]]s have been proposed [[#1951_Robinds|Robinds & Monro (1951)]] to [[automatically anneal]] the [[learning rate]] based on how many [[epoch]]s through the [[data]] have been done. </s> These [[approach]]es typically add additional [[hyperparameter]]s to control how quickly the [[learning rate]] [[decay]]s. </s>
=== 1951 ===
* <span id="1951_Robinds">(Robbins & Monro, 1951)</span> &rArr; H. Robbins and S. Monro (1951). &ldquo;A Stochastic Approximation Method.&rdquo; In: Annals of Mathematical Statistics, 22(3), pp. 400-407.
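The Zeiler (2012) quote above describes schedules that automatically anneal the learning rate based on how many epochs have been completed, at the cost of additional hyperparameters that control the decay speed. A minimal sketch of one such schedule, an inverse-time (1/t) decay of the kind associated with Robbins & Monro, is given below; the function name and the default hyperparameter value are illustrative assumptions rather than definitions from the cited papers:

<syntaxhighlight lang="python">
def inverse_time_annealing(initial_lr: float, epoch: int, decay: float = 0.1) -> float:
    """Anneal the learning rate as lr_t = lr_0 / (1 + decay * t).

    decay is the additional hyperparameter controlling how quickly the
    learning rate falls as epochs accumulate.
    """
    return initial_lr / (1.0 + decay * epoch)

# The rate shrinks smoothly: 0.1 at epoch 0, 0.05 after 10 epochs, ~0.009 after 100.
print([round(inverse_time_annealing(0.1, e), 4) for e in (0, 10, 100)])
</syntaxhighlight>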


----
