Conjugate Gradient Optimization Algorithm

A Conjugate Gradient Optimization Algorithm is a batch function optimization algorithm that ...

AKA: CG.
- …
Counter-Example(s):
- L-BFGS Algorithm.
- Generalized Iterative Scaling.
See: Nonlinear Conjugate Gradient Algorithm, Biconjugate Gradient Algorithm.

References

2012

http://en.wikipedia.org/wiki/Conjugate_gradient_method
- QUOTE: In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is an iterative method, so it can be applied to sparse systems that are too large to be handled by direct methods such as the Cholesky decomposition. Such systems often arise when numerically solving partial differential equations.
  The conjugate gradient method can also be used to solve unconstrained optimization problems such as energy minimization. It was developed by Magnus Hestenes and Eduard Stiefel.^[1]
  The biconjugate gradient method provides a generalization to non-symmetric matrices. Various nonlinear conjugate gradient methods seek minima of nonlinear equations.

↑ Straeter, T. A. "On the Extension of the Davidon-Broyden Class of Rank One, Quasi-Newton Minimization Methods to an Infinite Dimensional Hilbert Space with Applications to Optimal Control Problems". NASA Technical Reports Server. NASA. http://hdl.handle.net/2060/19710026200. Retrieved 10 October 2011.
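
The linear conjugate gradient iteration described in the quote above can be sketched as follows. This is a minimal illustration in plain Python, not code from any of the cited sources: the function name `conjugate_gradient`, the helpers `dot` and `matvec`, the tolerance, the iteration cap, and the 2x2 example system are all assumptions chosen for the example. For a symmetric positive-definite matrix A, solving A x = b is equivalent to minimizing the quadratic f(x) = ½ xᵀA x − bᵀx, which is why the same iteration also serves as an unconstrained optimizer for such energies.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (lists of floats)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Approximately solve A x = b, assuming A is symmetric positive-definite.

    Hypothetical helper for illustration; A is a list of rows, b a list.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n          # assumed cap; exact arithmetic needs at most n steps
    x = [0.0] * n                  # start from the zero vector
    r = list(b)                    # residual b - A x, which equals b when x = 0
    p = list(r)                    # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:    # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                        # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # updated residual
        rs_new = dot(r, r)
        beta = rs_new / rs_old                             # keeps the new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive-definite example (assumed for illustration).
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)       # close to the exact solution [1/11, 7/11]
```

The sketch uses dense lists for readability. The method only needs matrix-vector products, so in the sparse setting mentioned in the quote `matvec` would be replaced by a sparse product and A would never be factorized.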

2006

(Vishwanathan et al., 2006) ⇒ S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, and Kevin P. Murphy. (2006). “Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods.” In: Proceedings of the 23rd International Conference on Machine Learning (ICML-2006). doi:10.1145/1143844.1143966
- QUOTE: … Current training methods for CRFs (In this paper, “training” specifically means penalized maximum likelihood parameter estimation) include generalized iterative scaling (GIS), conjugate gradient (CG), and limited-memory BFGS. These are all batch-only algorithms that do not work well in an online setting, and require many passes through the training data to converge. … Stochastic gradient methods, on the other hand, are online and scale sub-linearly with the amount of training data, making them very attractive for large data sets;

1994

(Hagan & Menhaj, 1994) ⇒ Martin T. Hagan, and Mohammad B. Menhaj. (1994). “Training Feedforward Networks with the Marquardt Algorithm.” In: IEEE Transactions on Neural Networks Journal, 5(6). doi:10.1109/72.329697
- QUOTE: The Marquardt algorithm for nonlinear least squares is presented and is incorporated into the backpropagation algorithm for training feedforward neural networks. The algorithm is tested on several function approximation problems, and is compared with a conjugate gradient algorithm and a variable learning rate algorithm.
