Linear Least-Squares L1-Regularized Regression System
A Linear Least-Squares L1-Regularized Regression System is a regularized least-squares linear regression system that implements a LASSO algorithm to solve a LASSO regression task.
- AKA: LASSO System.
- Example(s)
- Counter-Example(s)
- See: Support Vector Machine, Regularization (Mathematics), Statistics, Machine Learning, Levenberg–Marquardt Algorithm, Non-Linear Least Squares, Ordinary Least Squares, Well-Posed Problem, Overdetermined System, Over-Fitted.
References
2017b
- (Scikit-Learn, 2017) ⇒ "1.1.3. Lasso" http://scikit-learn.org/stable/modules/linear_model.html#lasso
- QUOTE: The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see Compressive sensing: tomography reconstruction with L1 prior (Lasso)). Mathematically, it consists of a linear model trained with [math]\displaystyle{ \ell_1 }[/math] prior as regularizer. The objective function to minimize is:
[math]\displaystyle{ \underset{w}{\min\,} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha ||w||_1} }[/math]
The lasso estimate thus solves the minimization of the least-squares penalty with [math]\displaystyle{ \alpha ||w||_1 }[/math] added, where [math]\displaystyle{ \alpha }[/math] is a constant and [math]\displaystyle{ ||w||_1 }[/math] is the [math]\displaystyle{ \ell_1 }[/math]-norm of the parameter vector.
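The objective quoted above can be minimized in several ways. The following is a minimal sketch using cyclic coordinate descent with soft-thresholding, written in plain Python. It is an illustration only and not scikit-learn's implementation: the function names `soft_threshold` and `lasso_coordinate_descent`, the fixed iteration count, the absence of an intercept term, and the toy data are all assumptions made for this example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Approximately minimize (1/(2n)) * ||Xw - y||_2^2 + alpha * ||w||_1.

    X is a list of rows, y a list of targets; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    r = list(y)  # residual y - Xw, kept in sync with w
    # Mean squared value of each column, used to scale each coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot reduce the loss
            # Correlation of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n)) / n
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                w[j] = w_new
    return w


# Toy data (hypothetical): y depends only on the first column, y = 2 * x1.
X = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.3], [4.0, -0.2]]
y = [2.0, 4.0, 6.0, 8.0]

w_ols = lasso_coordinate_descent(X, y, alpha=0.0)    # no penalty
w_mid = lasso_coordinate_descent(X, y, alpha=0.5)    # moderate penalty
w_zero = lasso_coordinate_descent(X, y, alpha=20.0)  # penalty dominates
print(w_ols, w_mid, w_zero)
```

On this toy data, the unpenalized run recovers the least-squares coefficients, the moderate penalty shrinks the first coefficient while keeping the irrelevant second coefficient at exactly zero, and the large penalty sets both coefficients to zero, which illustrates the sparsity described in the quote.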
2016
- (Jain, 2016) Aarshay Jain (2016) ⇒ https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/#four
- QUOTE:
LASSO stands for Least Absolute Shrinkage and Selection Operator. I know it doesn’t give much of an idea but there are 2 key words here – ‘absolute‘ and ‘selection‘.
Lasso regression performs L1 regularization, i.e. it adds a factor of sum of absolute value of coefficients in the optimization objective. Thus, lasso regression optimizes the following:
Objective = RSS + α * (sum of absolute value of coefficients)
Here, α (alpha) works similar to that of ridge and provides a trade-off between balancing RSS and magnitude of coefficients. Like that of ridge, α can take various values. Lets iterate it here briefly:
- α = 0: Same coefficients as simple linear regression
- α = ∞: All coefficients zero (same logic as before)
- 0 < α < ∞: coefficients between 0 and that of simple linear regression
Yes its appearing to be very similar to Ridge till now. But just hang on with me and you’ll know the difference by the time we finish. Like before, lets run lasso regression on the same problem as above. First we’ll define a generic function: (...).
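The effect of α described in the quote can be checked by hand in the simplest case. The sketch below assumes a single feature and no intercept, so that the objective RSS + α·|w| has a closed-form minimizer; the function name `lasso_1d` and the data values are hypothetical and are not taken from the quoted tutorial.

```python
def lasso_1d(x, y, alpha):
    """Minimize sum((y_i - w * x_i)^2) + alpha * |w| over one coefficient w."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Setting the subgradient to zero soft-thresholds sxy at alpha / 2.
    shrunk = max(abs(sxy) - alpha / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


# Hypothetical data, roughly y = 2 * x.
x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.0]

ols = lasso_1d(x, y, 0.0)       # alpha = 0: plain least-squares coefficient
mid = lasso_1d(x, y, 10.0)      # moderate alpha: shrunk toward zero
large = lasso_1d(x, y, 100.0)   # large alpha: coefficient is exactly zero
print(ols, mid, large)
```

In this one-feature setting the coefficient reaches exactly zero once α is at least twice the absolute value of Σxᵢyᵢ, so a finite α is already enough to remove the feature. This is the "selection" behaviour that distinguishes lasso from ridge.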