Coordinate Descent Algorithm


A Coordinate Descent Algorithm is a mathematical optimization algorithm that minimizes an objective function along one coordinate direction (or coordinate block) at a time, and that can operate as a non-derivative algorithm.



References

2018a

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/coordinate_descent Retrieved:2018-3-11.
    • Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size. Coordinate descent is applicable in both differentiable and derivative-free contexts.
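
The coordinate selection rule and the per-coordinate line search described above can be made concrete with a short sketch. The following minimal Python illustration is an assumption for this page, not part of the cited text: it uses a uniformly random coordinate selection rule and an inexact, derivative-free line search over a fixed grid of candidate step sizes. The function name random_coordinate_descent and all parameter values are hypothetical.

    import numpy as np

    def random_coordinate_descent(f, x0, n_iters=200,
                                  step_sizes=(1.0, 0.3, 0.1, 0.03, 0.01), seed=0):
        # Hypothetical sketch (not from the cited text): derivative-free
        # coordinate descent with a random coordinate selection rule and an
        # inexact line search along the chosen axis.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float).copy()
        fx = f(x)
        for _ in range(n_iters):
            i = rng.integers(x.size)          # selection rule: uniform random coordinate
            best_step, best_val = 0.0, fx
            for s in step_sizes:              # inexact line search on axis i
                for step in (s, -s):
                    trial = x.copy()
                    trial[i] += step
                    val = f(trial)
                    if val < best_val:
                        best_step, best_val = step, val
            x[i] += best_step                 # accept the best move (or stay put)
            fx = best_val
        return x, fx

    # Example: minimize a smooth function using no derivative information.
    f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
    x_min, f_min = random_coordinate_descent(f, np.zeros(2))

Because only function values are compared, this variant works in the derivative-free setting mentioned in the quote; swapping the random rule for a cyclic sweep recovers cyclic coordinate descent.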

2018b

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Coordinate_descent#Description Retrieved:2018-3-11.
    • Coordinate descent is based on the idea that the minimization of a multivariable function [math]\displaystyle{ F(\mathbf{x}) }[/math] can be achieved by minimizing it along one direction at a time, i.e., solving univariate (or at least much simpler) optimization problems in a loop. In the simplest case of cyclic coordinate descent, one cyclically iterates through the directions, minimizing the objective function with respect to each coordinate direction in turn. That is, starting with initial variable values [math]\displaystyle{ \mathbf{x}^0 = (x^0_1, \ldots, x^0_n) }[/math],

      round [math]\displaystyle{ k+1 }[/math] defines [math]\displaystyle{ \mathbf{x}^{k+1} }[/math] from [math]\displaystyle{ \mathbf{x}^k }[/math] by iteratively solving the single-variable optimization problems [math]\displaystyle{ x^{k+1}_i = \underset{y\in\mathbb R}{\operatorname{arg\,min}}\; F(x^{k+1}_1, \dots, x^{k+1}_{i-1}, y, x^k_{i+1}, \dots, x^k_n) }[/math] [1]

      for each variable [math]\displaystyle{ x_i }[/math] of [math]\displaystyle{ \mathbf{x} }[/math], for [math]\displaystyle{ i }[/math] from 1 to [math]\displaystyle{ n }[/math].

      Thus, one begins with an initial guess [math]\displaystyle{ \mathbf{x}^0 }[/math] for a local minimum of [math]\displaystyle{ F }[/math] and obtains a sequence [math]\displaystyle{ \mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2, \dots }[/math] iteratively.

      By doing line search in each iteration, one automatically has [math]\displaystyle{ F(\mathbf{x}^0)\ge F(\mathbf{x}^1)\ge F(\mathbf{x}^2)\ge \dots. }[/math] It can be shown that this sequence has convergence properties similar to those of steepest descent. No improvement after one full cycle of line searches along the coordinate directions implies that a stationary point has been reached.

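A minimal Python sketch of this cyclic process follows; it is an illustration under stated assumptions, not the source's implementation. Each single-variable subproblem is solved with scipy.optimize.minimize_scalar, and the objective is recorded after every round so the monotone decrease [math]\displaystyle{ F(\mathbf{x}^0)\ge F(\mathbf{x}^1)\ge \dots }[/math] can be checked directly. The function name cyclic_coordinate_descent and the test objective are hypothetical.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def cyclic_coordinate_descent(F, x0, n_rounds=50):
        # Hypothetical sketch (not from the cited text): round k+1 updates each
        # coordinate in turn by minimizing F over that coordinate alone while
        # all other coordinates stay fixed.
        x = np.asarray(x0, dtype=float).copy()
        history = [F(x)]                      # F(x^0), then F after each round
        for _ in range(n_rounds):
            for i in range(x.size):
                # Single-variable subproblem: argmin over y of F with y in slot i.
                def F_i(y, i=i):
                    z = x.copy()
                    z[i] = y
                    return F(z)
                x[i] = minimize_scalar(F_i).x
            history.append(F(x))
        return x, history

    # Example: each round is a full cycle of exact line searches, so the
    # recorded objective values are non-increasing, matching F(x^0) >= F(x^1) >= ...
    F = lambda x: x[0] ** 2 + 3.0 * x[1] ** 2 + x[0] * x[1]
    x_star, hist = cyclic_coordinate_descent(F, np.array([2.0, -1.0]))
    assert all(a >= b - 1e-9 for a, b in zip(hist, hist[1:]))

If a round produces no improvement, the assertion's non-strict inequality still holds, consistent with the stationary-point remark above.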

2017a