Stochastic Variance Reduced Gradient (SVRG) Algorithm
A Stochastic Variance Reduced Gradient (SVRG) Algorithm is an SGD-based Algorithm designed to reduce the variance of SGD's stochastic gradient estimates (an illustrative sketch of the update loop is given below, before the references).
- Context:
- It was first developed by Johnson & Zhang (2013).
- Example(s):
- A Distributed Stochastic Variance Reduced Gradient (DSVRG) Algorithm (Lee et al., 2017),
- A Fast Stochastic Variance Reduction Gradient (FSVRG) Algorithm (Shang et al., 2017),
- A Stochastic Variance Reduced Gradient Langevin Dynamics (SVRG-LD) Algorithm (Zou et al., 2018),
- A Streaming Stochastic Variance Reduced Gradient (SSVRG) Algorithm (Frostig et al., 2015).
- …
- Counter-Example(s):
- See: Stochastic Optimization, Convex Optimization, Learning Rate, Gradient Descent, Adaptive Gradient (AdaGrad) Algorithm.
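The following minimal NumPy sketch illustrates the SVRG update loop on a simple least-squares objective. It is a sketch under stated assumptions, not a reference implementation; all names (svrg_least_squares, example_gradient, w_snapshot, step_size, inner_steps) are hypothetical and chosen for exposition.

```python
import numpy as np

def svrg_least_squares(X, y, step_size=0.01, num_epochs=20, inner_steps=None, seed=0):
    """Minimal SVRG sketch (after Johnson & Zhang, 2013) for the least-squares
    objective f(w) = (1/2n) * ||X w - y||^2. Names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = inner_steps if inner_steps is not None else 2 * n  # inner-loop length
    w = np.zeros(d)

    def example_gradient(w_vec, i):
        # Gradient of the i-th summand: (x_i^T w - y_i) * x_i
        return (X[i] @ w_vec - y[i]) * X[i]

    for _ in range(num_epochs):
        w_snapshot = w.copy()
        # Full-data gradient at the snapshot, computed once per outer iteration.
        mu = X.T @ (X @ w_snapshot - y) / n
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient: unbiased for the full gradient at w;
            # its variance shrinks as w and w_snapshot approach the optimum.
            g = example_gradient(w, i) - example_gradient(w_snapshot, i) + mu
            w -= step_size * g  # constant step size; no decay schedule needed
    return w
```

The sketch follows "Option I" of Johnson & Zhang (2013), where the snapshot is the last iterate of the previous inner loop; "Option II" instead draws the snapshot uniformly at random from the inner iterates.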
References
2018a
- (Jothimurugesan et al., 2018) ⇒ Ellango Jothimurugesan, Ashraf Tahmasbi, Phillip B. Gibbons, and Srikanta Tirthapura. (2018). “Variance-Reduced Stochastic Gradient Descent on Streaming Data.” In: Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS 2018).
- QUOTE: SAG [RSB12] was among the first variance reduction methods proposed and achieves linear convergence rate for smooth and strongly convex problems. SAG requires storage of the last gradient computed for each data point and uses their average in each update. SAGA [DBLJ14] improves on SAG by eliminating a bias in the update. Stochastic Variance-Reduced Gradient (SVRG) [JZ13] is another variance reduction method that does not store the computed gradients, but periodically computes a full-data gradient, requiring more computation than SAGA. Semi-Stochastic Gradient Descent (S2GD) [KR13] differs from SVRG by computing a random stochastic gradient at each iteration and a full gradient occasionally. The gap between two full gradient computations is determined by a geometric law. CHEAPSVRG [SAKS16] is another variant of SVRG. In contrast with SVRG, it estimates the gradient through computing the gradient on a subset of training data points rather than all the data points. However, all of the above variance-reduced methods require [math]\displaystyle{ O(n \log n) }[/math] iterations to guarantee convergence to statistical accuracy (to yield a good fit to the underlying data) for [math]\displaystyle{ n }[/math] data points. DYNASAGA [DLH16] achieves statistical accuracy in only [math]\displaystyle{ O(n) }[/math] iterations by using a gradually increasing sample set and running SAGA on it.
So far, all the algorithms we have discussed are offline, and assume the entire dataset is available beforehand. Streaming SVRG (SSVRG) [FGKS15] is an algorithm that handles streaming data arrivals, and processes them in a single pass through data, using limited memory. In our experimental study, we found STRSAGA to be significantly more accurate than SSVRG. Further, our analysis of STRSAGA shows that it handles arrival distributions which allow for burstiness in the stream, while SSVRG is not suited for this case. In many practical situations, restricting a streaming data algorithm to use limited memory is overly restrictive and as our results show, leads to worse accuracy.
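To make the storage trade-off described in the quote concrete, the hypothetical sketch below contrasts one SAGA step, which reads and refreshes a stored per-point gradient table, with one SVRG inner step, which stores no per-point gradients but relies on a snapshot iterate and its periodically recomputed full-data gradient. The function and argument names (saga_step, svrg_step, grad_i, grad_table, snapshot, mu) are illustrative only.

```python
def saga_step(w, i, grad_i, grad_table, step_size):
    """One SAGA step (Defazio et al., 2014): grad_table is an (n, d) NumPy array
    holding the last gradient computed for each data point (O(n d) memory)."""
    g_new = grad_i(w, i)
    # Unbiased estimator built from the previously stored gradients.
    g = g_new - grad_table[i] + grad_table.mean(axis=0)
    grad_table[i] = g_new  # refresh the stored gradient for point i
    return w - step_size * g

def svrg_step(w, i, grad_i, snapshot, mu, step_size):
    """One SVRG inner step (Johnson & Zhang, 2013): no per-point storage; instead,
    `snapshot` and its full-data gradient `mu` are recomputed in an outer loop."""
    g = grad_i(w, i) - grad_i(snapshot, i) + mu  # unbiased, variance-reduced estimator
    return w - step_size * g
```

In practice SAGA maintains the table average incrementally rather than recomputing it each step, but the memory contrast with SVRG's single snapshot is the same.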
2018b
- (Shang et al., 2018) ⇒ Fanhua Shang, Yuanyuan Liu, Kaiwen Zhou, James Cheng, Kelvin Kai Wing Ng, and Yuichi Yoshida. (2018). “Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization.” In: Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (AISTATS 2018), PMLR.
2018c
- (Zou et al., 2018) ⇒ Difan Zou, Pan Xu, and Quanquan Gu. (2018). “Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics.” In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI 2018).
2017a
- (Lee et al., 2017) ⇒ Jason D. Lee, Qihang Lin, Tengyu Ma, and Tianbao Yang. (2017). “Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement.” In: The Journal of Machine Learning Research, 18(1).
2017b
- (Shang et al., 2017) ⇒ Fanhua Shang, Yuanyuan Liu, James Cheng, and Jiacheng Zhuo. (2017). “Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning.” In: arXiv preprint arXiv:1703.07948.
2015
- (Frostig et al., 2015) ⇒ Roy Frostig, Rong Ge, Sham M. Kakade, and Aaron Sidford. (2015). “Competing with the Empirical Risk Minimizer in a Single Pass.” In: Proceedings of the Conference on Learning Theory (COLT 2015).
2013
- (Johnson & Zhang, 2013) ⇒ Rie Johnson, and Tong Zhang. (2013). “Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction.” In: Advances in Neural Information Processing Systems 26 (NIPS 2013).
- QUOTE: (...) We call this method stochastic variance reduced gradient (SVRG) because it explicitly reduces the variance of SGD. Unlike SGD, the learning rate [math]\displaystyle{ \eta_t }[/math] for SVRG does not have to decay, which leads to faster convergence as one can use a relatively large learning rate.
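In the notation of Johnson & Zhang (2013), the update that permits this constant learning rate [math]\displaystyle{ \eta }[/math] is the variance-reduced inner step
[math]\displaystyle{ w_t = w_{t-1} - \eta \left( \nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu} \right), \qquad \tilde{\mu} = \frac{1}{n} \sum_{i=1}^{n} \nabla \psi_i(\tilde{w}), }[/math]
where the snapshot [math]\displaystyle{ \tilde{w} }[/math] and its full gradient [math]\displaystyle{ \tilde{\mu} }[/math] are recomputed every [math]\displaystyle{ m }[/math] inner iterations. The correction term has zero expectation, so the estimator remains unbiased, while its variance shrinks as both [math]\displaystyle{ w_{t-1} }[/math] and [math]\displaystyle{ \tilde{w} }[/math] approach the optimum.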