Coordinate Descent Algorithm
A Coordinate Descent Algorithm is a mathematical optimization algorithm that successively minimizes an objective function along individual coordinate directions and can be applied as a non-derivative (derivative-free) algorithm.
- …
- Example(s):
- a Cyclic Coordinate Descent Algorithm (described in the references below).
- a coordinate descent Lasso solver, such as the one in sklearn.linear_model.
- Counter-Example(s):
- Mini-Batch Algorithm.
- Newton's method.
- Stochastic Gradient Descent, which uses one training example at a time rather than one coordinate at a time.
- See: Line Search, Coordinate System, Principal Component Analysis, Adaptive Coordinate Descent.
References
2018a
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/coordinate_descent Retrieved:2018-3-11.
- Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size. Coordinate descent is applicable in both differentiable and derivative-free contexts.
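As a concrete sketch of this description (not taken from the cited source; the objective function, starting point, and step-size schedule below are illustrative assumptions), the following Python snippet implements a derivative-free coordinate descent: a cyclic coordinate selection rule chooses the coordinate, and a simple shrinking line search along that coordinate direction sets the step size.
```python
import numpy as np

def coordinate_descent(F, x0, step=1.0, n_iters=200, shrink=0.5):
    """Derivative-free coordinate descent with a shrinking line search."""
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for k in range(n_iters):
        i = k % n                        # cyclic coordinate selection rule
        e = np.zeros(n)
        e[i] = 1.0                       # unit vector along coordinate i
        s = step
        while s > 1e-8:                  # shrink the step until it improves F (or give up)
            if F(x + s * e) < F(x):
                x = x + s * e
                break
            if F(x - s * e) < F(x):
                x = x - s * e
                break
            s *= shrink
    return x

# Illustrative objective with minimum at (1, -2); purely a demo assumption.
F = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
print(coordinate_descent(F, x0=[5.0, 5.0]))   # approximately [ 1. -2.]
```
Because only one coordinate changes per iteration, each line search is a one-dimensional problem, which is what makes the method attractive when full gradients are unavailable or expensive.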
2018b
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Coordinate_descent#Description Retrieved:2018-3-11.
- Coordinate descent is based on the idea that the minimization of a multivariable function [math]\displaystyle{ F(\mathbf{x}) }[/math] can be achieved by minimizing it along one direction at a time, i.e., solving univariate (or at least much simpler) optimization problems in a loop. In the simplest case of cyclic coordinate descent, one cycles through the coordinate directions in turn, minimizing the objective function with respect to one coordinate direction at a time. That is, starting with initial variable values : [math]\displaystyle{ \mathbf{x}^0 = (x^0_1, \ldots, x^0_n) }[/math],
round [math]\displaystyle{ k+1 }[/math] defines [math]\displaystyle{ \mathbf{x}^{k+1} }[/math] from [math]\displaystyle{ \mathbf{x}^k }[/math] by iteratively solving the single-variable optimization problems : [math]\displaystyle{ x^{k+1}_i = \underset{y\in\mathbb R}{\operatorname{arg\,min}}\; F(x^{k+1}_1, \dots, x^{k+1}_{i-1}, y, x^k_{i+1}, \dots, x^k_n) }[/math]
for each variable [math]\displaystyle{ x_i }[/math] of [math]\displaystyle{ \mathbf{x} }[/math], for [math]\displaystyle{ i }[/math] from 1 to [math]\displaystyle{ n }[/math].
Thus, one begins with an initial guess [math]\displaystyle{ \mathbf{x}^0 }[/math] for a local minimum of [math]\displaystyle{ F }[/math] , and gets a sequence [math]\displaystyle{ \mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2, \dots }[/math] iteratively.
By doing line search in each iteration, one automatically has : [math]\displaystyle{ F(\mathbf{x}^0)\ge F(\mathbf{x}^1)\ge F(\mathbf{x}^2)\ge \dots. }[/math] It can be shown that this sequence has convergence properties similar to those of steepest descent. If no improvement is obtained after one full cycle of line searches along the coordinate directions, a stationary point has been reached.
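The cyclic update above can be written compactly in code. The following is a minimal sketch (not from the cited source, assuming NumPy and SciPy are available and using an illustrative quadratic objective): each inner step solves the one-dimensional problem [math]\displaystyle{ x^{k+1}_i = \underset{y\in\mathbb R}{\operatorname{arg\,min}}\; F(x^{k+1}_1, \dots, x^{k+1}_{i-1}, y, x^k_{i+1}, \dots, x^k_n) }[/math] with scipy.optimize.minimize_scalar, and, as noted above, the recorded objective values are non-increasing.
```python
import numpy as np
from scipy.optimize import minimize_scalar

def cyclic_coordinate_descent(F, x0, n_rounds=20):
    """One round solves the single-variable problems for i = 1..n in turn."""
    x = np.asarray(x0, dtype=float).copy()
    history = [F(x)]                       # objective value after each round
    for _ in range(n_rounds):
        for i in range(x.size):
            # minimize F along coordinate i with all other coordinates held fixed
            def F_i(y, i=i):
                z = x.copy()
                z[i] = y
                return F(z)
            x[i] = minimize_scalar(F_i).x  # 1-D minimization (exact up to numerical tolerance)
        history.append(F(x))
    return x, history

# Illustrative coupled quadratic with its minimum at the origin (a demo assumption).
F = lambda x: x[0] ** 2 + x[1] ** 2 + x[0] * x[1]
x_star, history = cyclic_coordinate_descent(F, x0=[3.0, -4.0])
print(x_star)        # both coordinates close to 0
print(history[:4])   # non-increasing, roughly [13.0, 3.0, 0.1875, ...]
```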
2017a
- (scikit-learn, 2017) ⇒ http://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model Retrieved:2017-09-24
- The sklearn.linear_model module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms (...)
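As a brief usage sketch (assuming scikit-learn and NumPy are installed, with illustrative synthetic data), the Lasso estimator below is fitted by the module's coordinate descent solver; its selection parameter chooses between cyclic and random coordinate updates.
```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic sparse regression problem (demo assumption): only 3 of 10 features matter.
rng = np.random.RandomState(0)
X = rng.randn(100, 10)
coef = np.zeros(10)
coef[:3] = [2.0, -1.0, 0.5]
y = X @ coef + 0.01 * rng.randn(100)

# selection='cyclic' updates coefficients in order; 'random' picks one at random each step.
model = Lasso(alpha=0.05, selection='cyclic', max_iter=1000).fit(X, y)
print(model.coef_)   # sparse estimate: most entries near zero, first three near the truth
```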