1996 RegressionShrinkageAndSelViaLasso
- (Tibshirani, 1996) ⇒ Robert Tibshirani. (1996). “Regression Shrinkage and Selection via the Lasso.” In: Journal of the Royal Statistical Society, Series B, 58(1).
Subject Headings: Lasso Algorithm.
Notes
- JSTOR Webpage http://www.jstor.org/pss/2346178
- Relevant presentation: http://www.stat.osu.edu/~yklee/882/yongganglasso.pdf
Cited By
- ~4158 http://scholar.google.com/scholar?q=%22Regression+Shrinkage+and+Selection+via+the+Lasso%22+1996
2004
- (Efron et al., 2004) ⇒ Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. (2004). “Least Angle Regression.” In: Annals of Statistics, 32(2). doi:10.1214/009053604000000067
Quotes
Author Keywords
Abstract
We propose a new method for estimation in linear models. The “lasso” minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
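In the notation of the introduction below, the constrained form described in the abstract can be restated (this is a paraphrase, not a quotation of the paper's own displayed equation) as the optimization problem
[math]\displaystyle{ (\hat{\alpha}, \hat{\beta}) = \arg\min \left\{ \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \right\} \text{ subject to } \sum_{j=1}^{p} |\beta_j| \leq t, }[/math]
where [math]\displaystyle{ t \geq 0 }[/math] is a tuning constant. Smaller values of t impose a tighter constraint, shrink the coefficients more strongly, and force more of them to be exactly 0.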
1. Introduction
Consider the usual regression situation: we have data [math]\displaystyle{ (\mathbf{x}^i, y_i),\ i=1,2,\ldots,N, }[/math] where [math]\displaystyle{ \mathbf{x}^i=(x_{i1},\ldots, x_{ip})^T }[/math] and [math]\displaystyle{ y_i }[/math] are the regressors and response for the ith observation. The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error. There are two reasons why the data analyst is often not satisfied with the OLS estimates. The first is prediction accuracy: the OLS estimates often have low bias but large variance; prediction accuracy can sometimes be improved by shrinking or setting to 0 some coefficients. By doing so we sacrifice a little bias to reduce the variance of the predicted values and hence may improve the overall prediction accuracy. The second reason is interpretation. With a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects.
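As a rough illustration of this bias-variance point (constructed for this note, not taken from the paper), the sketch below simulates repeated datasets and compares OLS with a crudely shrunken version of the OLS fit; the sample sizes, coefficients, and shrinkage factor are arbitrary choices:

```python
# Illustrative sketch (not from the paper): shrinking the OLS coefficients
# toward 0 adds a little bias but can reduce the variance of the predicted
# values enough to lower the overall prediction error.
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 10
beta_true = np.array([2.0, 1.0, 0.5] + [0.0] * 7)

ols_err, shrunk_err = [], []
for _ in range(500):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(scale=2.0, size=n)
    X_test = rng.normal(size=(1000, p))
    y_test = X_test @ beta_true + rng.normal(scale=2.0, size=1000)

    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_shrunk = 0.7 * beta_ols            # crude proportional shrinkage toward 0
    ols_err.append(np.mean((y_test - X_test @ beta_ols) ** 2))
    shrunk_err.append(np.mean((y_test - X_test @ beta_shrunk) ** 2))

print("mean test MSE, OLS    :", round(np.mean(ols_err), 3))
print("mean test MSE, shrunk :", round(np.mean(shrunk_err), 3))
```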
The two standard techniques for improving the OLS estimates, subset selection and ridge regression, both have drawbacks. Subset selection provides interpretable models but can be extremely variable because it is a discrete process: regressors are either retained or dropped from the model. Small changes in the data can result in very different models being selected and this can reduce its prediction accuracy. Ridge regression is a continuous process that shrinks coefficients and hence is more stable; however, it does not set any coefficients to 0 and hence does not give an easily interpretable model.
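The instability of a discrete selection rule, contrasted with the smoother behaviour of ridge shrinkage, can be seen in a toy simulation like the following (an illustration constructed for this note, not taken from the paper; all numbers are arbitrary):

```python
# Illustrative sketch (not from the paper): with two nearly collinear
# regressors, a discrete "keep the single best predictor" rule flips between
# them under changes in the data, while ridge regression shrinks both
# coefficients and changes only gradually.
import numpy as np

rng = np.random.default_rng(1)
n = 50
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)        # nearly a copy of x1
X = np.column_stack([x1, x2])

picks, ridge_coefs = [], []
for _ in range(20):
    y = 2.0 * z + rng.normal(size=n)     # fresh noise in the response each run
    # discrete selection: retain whichever single regressor gives the lower RSS
    rss = [np.sum((y - X[:, [j]] @ np.linalg.lstsq(X[:, [j]], y, rcond=None)[0]) ** 2)
           for j in range(2)]
    picks.append("x1" if np.argmin(rss) == 0 else "x2")
    # continuous shrinkage: ridge solution with an arbitrary penalty
    ridge_coefs.append(np.linalg.solve(X.T @ X + 10.0 * np.eye(2), X.T @ y))

print("selected regressor per run:", picks)                        # flips back and forth
print("ridge coefficient std     :", np.std(ridge_coefs, axis=0).round(3))
```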
We propose a new technique, called the lasso, for 'least absolute shrinkage and selection operator'. It shrinks some coefficients and sets others to 0, and hence tries to retain the good features of both subset selection and ridge regression.
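A minimal sketch of this behaviour, assuming scikit-learn is available and using its Lasso and Ridge estimators as stand-ins (the paper itself solves the constrained problem via quadratic programming rather than with these implementations); the data and penalty values are arbitrary:

```python
# Minimal sketch (assumes scikit-learn): the lasso's L1 constraint sets some
# coefficients exactly to 0, whereas the ridge L2 penalty only shrinks them.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# alpha is the penalty weight; a larger alpha corresponds to a smaller bound t
lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("lasso coefficients:", lasso.coef_.round(2))   # several are exactly 0
print("ridge coefficients:", ridge.coef_.round(2))   # small but nonzero
```

With a large enough penalty, the lasso fit drops the irrelevant regressors entirely, which is exactly the mix of shrinkage and selection described above.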
…
…