2020 WhyGradientClippingAcceleratesT
- (Zhang et al., 2020) ⇒ Jingzhao Zhang, Tianxing He, Suvrit Sra, and Ali Jadbabaie. (2020). “Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity.” In: Proceedings of the 8th International Conference on Learning Representations (ICLR 2020).
Subject Headings: Gradient Descent Algorithm; Gradient Clipping Algorithm.
Notes
- Online Resource(s): https://openreview.net/forum?id=BJgnXpVYwS
Cited By
- Google Scholar: ~ 68 Citations.
Quotes
Author Keywords
Abstract
We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively correlates with the gradient norm, and contrary to standard assumptions in the literature, it can grow with the norm of the gradient. These empirical observations limit the applicability of existing theoretical analyses of algorithms that rely on a fixed bound on smoothness. These observations motivate us to introduce a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely, gradient clipping and normalized gradient, converge arbitrarily faster than gradient descent with fixed stepsize. We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings.
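The abstract contrasts fixed-stepsize gradient descent with clipped (adaptively scaled) updates under a relaxed smoothness condition in which smoothness can grow with the gradient norm. The sketch below is a minimal NumPy illustration of clip-by-norm gradient descent on a toy objective with that property (f(x) = x^4); the function names, stepsize, and clipping threshold are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def clip_by_norm(grad, threshold):
    """Standard clip-by-norm: rescale the gradient when its norm exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad

def clipped_gradient_descent(grad_fn, x0, lr=0.05, threshold=1.0, steps=500):
    """Gradient descent with clipped updates: the effective stepsize shrinks
    in regions where the gradient (and hence the local smoothness) is large."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        x = x - lr * clip_by_norm(g, threshold)
    return x

# Toy objective f(x) = x^4: its curvature grows with the gradient norm,
# so a fixed stepsize must be very small, while clipped GD stays stable.
if __name__ == "__main__":
    grad_fn = lambda x: 4.0 * x ** 3
    print(clipped_gradient_descent(grad_fn, x0=np.array([3.0])))
```

In this toy setting, plain gradient descent with the same stepsize takes a very large first step where the gradient is huge, whereas the clipped update caps the step length, which mirrors the intuition the paper formalizes.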
References
BibTeX
@inproceedings{2020_WhyGradientClippingAcceleratesT,
  author    = {Jingzhao Zhang and Tianxing He and Suvrit Sra and Ali Jadbabaie},
  title     = {Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity},
  booktitle = {Proceedings of the 8th International Conference on Learning Representations (ICLR 2020)},
  publisher = {OpenReview.net},
  year      = {2020},
  url       = {https://openreview.net/forum?id=BJgnXpVYwS},
}
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020 WhyGradientClippingAcceleratesT | Jingzhao Zhang; Tianxing He; Suvrit Sra; Ali Jadbabaie | | | Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity | | | https://openreview.net/forum?id=BJgnXpVYwS | | | 2020 |