Regularized Regression Task
A Regularized Regression Task is a regression task whose objective adds a regularization function (penalty term) to the loss function.
- Context:
- It can be represented as: [math]\displaystyle{ \hat{\beta}(\lambda)=\arg\min_{\beta}\ L(\mathrm{y},X\beta)+\lambda J(\beta) }[/math], where [math]\displaystyle{ L }[/math] is the loss function, [math]\displaystyle{ J }[/math] is the regularization function, and [math]\displaystyle{ \lambda \ge 0 }[/math] weights the penalty.
- It can be solved by a Regularized Optimization System (that implements a regularized optimization algorithm); a minimal sketch of one such algorithm appears after the See list below.
- Example(s):
- an L1 Regularized Linear Regression Task (e.g., a LASSO task).
- an L2 Regularized Linear Regression Task (e.g., a ridge regression task).
- Counter-Example(s):
- an Unregularized Regression Task (e.g., an ordinary least squares regression task).
- See: L1 Regularized Linear Regression, Overfitting, Ill-Posed Problem.
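Below is a minimal, illustrative sketch (not from the source) of one way to solve the objective above for the LASSO case, where L is squared-error loss and J(β) is the ℓ1 norm of β, using proximal gradient descent (ISTA) in NumPy; the function names and iteration settings are assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: elementwise soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_prox_grad(X, y, lam, n_iter=500):
    # Minimize 0.5 * ||y - X @ beta||^2 + lam * ||beta||_1 via ISTA
    # (illustrative solver; lam plays the role of lambda in the formula above).
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the loss gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of the squared-error loss L
        beta = soft_threshold(beta - step * grad, step * lam)  # apply the penalty J
    return beta
```

Increasing lam drives more coefficients exactly to zero, which is the behavior the ℓ1 penalty is chosen for.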
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Regularization_(mathematics) Retrieved:2017-3-2.
- QUOTE: Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
2007
- (Rosset & Zhu, 2007) ⇒ Saharon Rosset, and Ji Zhu. (2007). “Piecewise Linear Regularized Solution Paths." The Annals of Statistics
- QUOTE: We consider the generic regularized optimization problem [math]\displaystyle{ \hat{\beta}(\lambda)=\arg\min_{\beta}\ L(\mathrm{y},X\beta)+\lambda J(\beta) }[/math]. Efron, Hastie, Johnstone and Tibshirani (Ann. Statist. 32 (2004) 407-499) have shown that for the LASSO - that is, if L is squared error loss and [math]\displaystyle{ J(\beta) = \|\beta\|_1 }[/math] is the [math]\displaystyle{ \ell_{1} }[/math] norm of [math]\displaystyle{ \beta }[/math] - the optimal coefficient path is piecewise linear, that is, [math]\displaystyle{ \partial\hat{\beta}(\lambda)/\partial\lambda }[/math] is piecewise constant. We derive a general characterization of the properties of (loss L, penalty J) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of efficient path-following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines.
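As a hedged illustration of the piecewise-linear coefficient paths described in this quote (not code from the paper), the sketch below traces the LASSO path with scikit-learn's lars_path, which implements the LARS-LASSO algorithm of Efron et al. (2004); the synthetic data and coefficient values are assumptions made for this demonstration.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic regression problem (an assumption for illustration):
# two strong predictors, one weak predictor, two irrelevant ones.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + 0.1 * rng.standard_normal(100)

# alphas holds the lambda values at the knots of the path; between
# consecutive knots each coefficient is linear in lambda, so the
# derivative of beta-hat with respect to lambda is piecewise constant.
alphas, active, coefs = lars_path(X, y, method="lasso")
for a, b in zip(alphas, coefs.T):
    print(f"lambda={a:.4f}  beta={np.round(b, 3)}")
```

Between consecutive printed lambda values each coefficient moves linearly in lambda, matching the quoted result that the full regularized path can be generated efficiently from a small number of knots.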