2013 A Risk Comparison of Ordinary Least Squares Vs Ridge Regression
- (Dhillon et al., 2013) ⇒ Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, and Lyle H. Ungar. (2013). “A Risk Comparison of Ordinary Least Squares Vs Ridge Regression.” In: The Journal of Machine Learning Research, 14(1).
Subject Headings: Ordinary Least Squares Estimate.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222013%22+A+Risk+Comparison+of+Ordinary+Least+Squares+Vs+Ridge+Regression
- http://dl.acm.org/citation.cfm?id=2567709.2567711&preflayout=flat#citedby
Quotes
Abstract
We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a principal component analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method (PCA-OLS) is within a constant factor (namely 4) of the risk of ridge regression (RR).
1. Introduction
Consider the fixed design setting where we have a set of [math]\displaystyle{ n }[/math] vectors [math]\displaystyle{ \{X_i\} }[/math], and let [math]\displaystyle{ X }[/math] denote the matrix whose [math]\displaystyle{ i }[/math]th row is [math]\displaystyle{ X_i }[/math]. The observed label vector is [math]\displaystyle{ Y \in \mathbb{R}^n }[/math].
Suppose that: [math]\displaystyle{ Y = X\beta + \epsilon }[/math], where [math]\displaystyle{ \epsilon }[/math] is independent noise in each coordinate, with the variance of [math]\displaystyle{ \epsilon_i }[/math] being [math]\displaystyle{ \sigma^2 }[/math]. The objective is to learn [math]\displaystyle{ E[Y] = X\beta }[/math]. The expected loss of an estimator [math]\displaystyle{ \beta }[/math] is: [math]\displaystyle{ L(\beta) = \frac{1}{n} E_Y[\|Y - X\beta\|^2] }[/math]. Let [math]\displaystyle{ \hat{\beta} }[/math] be an estimator of [math]\displaystyle{ \beta }[/math] (constructed with a sample [math]\displaystyle{ Y }[/math]). Denoting [math]\displaystyle{ \Sigma := \frac{1}{n} X^T X }[/math],
we have that the risk (i.e., expected excess loss) is: [math]\displaystyle{ \mathrm{Risk}(\hat{\beta}) := E_{\hat{\beta}}[L(\hat{\beta}) - L(\beta)] = E_{\hat{\beta}} \|\hat{\beta} - \beta\|^2_\Sigma }[/math], where [math]\displaystyle{ \|x\|^2_\Sigma = x^T \Sigma x }[/math] and where the expectation is with respect to the randomness in [math]\displaystyle{ Y }[/math].
We show that a simple variant of ordinary (un-regularized) least squares always compares favorably to ridge regression (as measured by the risk). This observation is based on the following bias-variance decomposition:
[math]\displaystyle{ \mathrm{Risk}(\hat{\beta}) = \underbrace{E\|\hat{\beta} - \bar{\beta}\|^2_\Sigma}_{\text{Variance}} + \underbrace{\|\bar{\beta} - \beta\|^2_\Sigma}_{\text{Prediction Bias}} \quad (1) }[/math]
where [math]\displaystyle{ \bar{\beta} = E[\hat{\beta}] }[/math].
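The decomposition in Equation (1) can be checked numerically. The following is a minimal Monte Carlo sketch (not from the paper; the Gaussian design, the choice of [math]\displaystyle{ \beta }[/math], and the noise level are arbitrary illustration choices) that estimates the risk of the OLS estimator together with its variance and prediction-bias terms:

```python
# Monte Carlo check of the risk decomposition in Eq. (1).
# Assumed setup: isotropic Gaussian fixed design, arbitrary beta and sigma.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 5, 1.0
X = rng.normal(size=(n, p))              # fixed design
beta = rng.normal(size=p)                # true parameter
Sigma = X.T @ X / n                      # Sigma := (1/n) X^T X

def ols(Y):
    """Ordinary least squares estimate for the fixed design X."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Draw many label vectors Y = X beta + eps and compute the estimator for each.
draws = np.array([ols(X @ beta + sigma * rng.normal(size=n)) for _ in range(5000)])
beta_bar = draws.mean(axis=0)            # empirical E[beta_hat]

def sq_norm_Sigma(v):
    """||v||^2_Sigma = v^T Sigma v."""
    return float(v @ Sigma @ v)

risk = np.mean([sq_norm_Sigma(b - beta) for b in draws])
variance = np.mean([sq_norm_Sigma(b - beta_bar) for b in draws])
bias = sq_norm_Sigma(beta_bar - beta)
print(risk, variance + bias)             # the two agree up to Monte Carlo error
```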
1.1 The Risk of Ridge Regression (RR)
Ridge regression, or Tikhonov regularization (Tikhonov, 1963), penalizes the [math]\displaystyle{ l_2 }[/math] norm of a parameter vector [math]\displaystyle{ \beta }[/math] and “shrinks” it towards zero, penalizing large values more. The estimator is: [math]\displaystyle{ \hat{\beta}_\lambda = \arg\min_\beta \left\{ \tfrac{1}{n}\|Y - X\beta\|^2 + \lambda \|\beta\|^2 \right\} }[/math]. The closed form estimate is then: [math]\displaystyle{ \hat{\beta}_\lambda = (\Sigma + \lambda I)^{-1} \left( \tfrac{1}{n} X^T Y \right) }[/math].
Note that [math]\displaystyle{ \hat{\beta}_0 = \hat{\beta}_{\lambda=0} = \arg\min_\beta \|Y - X\beta\|^2 }[/math] is the ordinary least squares estimator. Without loss of generality, rotate [math]\displaystyle{ X }[/math] such that: [math]\displaystyle{ \Sigma = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p) }[/math], where the [math]\displaystyle{ \lambda_i }[/math]'s are ordered in decreasing order.
To see the nature of this shrinkage, observe that: [math]\displaystyle{ [\hat{\beta}_\lambda]_j = \frac{\lambda_j}{\lambda_j + \lambda} [\hat{\beta}_0]_j }[/math], where [math]\displaystyle{ \hat{\beta}_0 }[/math] is the ordinary least squares estimator.
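To make this shrinkage concrete, here is a small numpy sketch (again not from the paper; the design, noise, and the penalty lam are arbitrary illustration choices) that computes the ridge estimate from the closed form and, equivalently, by shrinking the OLS coordinates in the PCA basis by [math]\displaystyle{ \lambda_j / (\lambda_j + \lambda) }[/math]:

```python
# Ridge as coordinate-wise shrinkage of OLS in the PCA basis (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 200, 5, 0.1
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=p) + rng.normal(size=n)
Sigma = X.T @ X / n

# Closed form: beta_lam = (Sigma + lam * I)^{-1} (1/n) X^T Y
beta_lam = np.linalg.solve(Sigma + lam * np.eye(p), X.T @ Y / n)

# Same estimate via shrinkage in the eigenbasis, Sigma = V diag(l_1, ..., l_p) V^T:
evals, V = np.linalg.eigh(Sigma)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)           # OLS estimate
shrunk = (evals / (evals + lam)) * (V.T @ beta_ols)    # [b_lam]_j = l_j/(l_j+lam) [b_0]_j
print(np.allclose(beta_lam, V @ shrunk))               # True
```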
…
2. Ordinary Least Squares with PCA (PCA-OLS)
Now let us construct a simple estimator based on [math]\displaystyle{ \lambda }[/math]. Note that our rotated coordinate system, where [math]\displaystyle{ \Sigma = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p) }[/math], corresponds to the PCA coordinate system.
Consider the following ordinary least squares estimator on the “top” PCA subspace, which uses the least squares estimate on coordinate [math]\displaystyle{ j }[/math] if [math]\displaystyle{ \lambda_j \geq \lambda }[/math] and [math]\displaystyle{ 0 }[/math] otherwise:
[math]\displaystyle{ [\hat{\beta}_{\mathrm{PCA},\lambda}]_j = \begin{cases} [\hat{\beta}_0]_j & \text{if } \lambda_j \geq \lambda \\ 0 & \text{otherwise} \end{cases} }[/math]
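In code, the PCA-OLS estimator is a hard threshold in the same eigenbasis rather than a smooth shrinkage. A minimal sketch under the same illustrative assumptions as above (not the authors' code; it assumes [math]\displaystyle{ X^T X }[/math] is invertible):

```python
# PCA-OLS: keep the OLS coordinate where the eigenvalue is at least lam, zero it otherwise.
import numpy as np

def pca_ols(X, Y, lam):
    n, p = X.shape
    Sigma = X.T @ X / n
    evals, V = np.linalg.eigh(Sigma)                  # eigenbasis (PCA coordinate system)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)      # OLS estimate
    coords = V.T @ beta_ols                           # OLS coordinates in the PCA basis
    kept = np.where(evals >= lam, coords, 0.0)        # hard threshold at lam
    return V @ kept                                   # rotate back to original coordinates

# Example use on arbitrary synthetic data; coordinates whose eigenvalue
# falls below lam are simply dropped.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=5) + rng.normal(size=200)
print(pca_ols(X, Y, lam=1.0))
```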
…
3. Experiments
First, we generated synthetic data with [math]\displaystyle{ p = 100 }[/math] and varying values of [math]\displaystyle{ n \in \{20, 50, 80, 110\} }[/math]. …
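The excerpt does not spell out the rest of the protocol, so the sketch below only reproduces its flavor: the design distribution, the true [math]\displaystyle{ \beta }[/math], the noise level, and the single value of [math]\displaystyle{ \lambda }[/math] are assumptions, not the paper's settings. It estimates the risk of ridge regression and PCA-OLS by Monte Carlo for each [math]\displaystyle{ n }[/math]:

```python
# Hedged reproduction sketch: p = 100 and n in {20, 50, 80, 110} come from the
# text; everything else (Gaussian design, Gaussian beta, sigma, lam, trials)
# is an assumption made for illustration.
import numpy as np

rng = np.random.default_rng(3)
p, sigma, lam, trials = 100, 1.0, 0.1, 200

def ridge_and_pca_ols(X, Y, lam):
    n, p = X.shape
    Sigma = X.T @ X / n
    evals, V = np.linalg.eigh(Sigma)
    # Ridge: (Sigma + lam * I)^{-1} (1/n) X^T Y
    ridge = np.linalg.solve(Sigma + lam * np.eye(p), X.T @ Y / n)
    # OLS coordinates in the PCA basis: [b_0]_j = (XV)_j^T Y / (n l_j) for l_j > 0,
    # which also works when n < p (rank-deficient Sigma).
    proj = (X @ V).T @ Y
    ols_coords = np.divide(proj, n * evals, out=np.zeros(p), where=evals > 1e-12)
    pca_ols = V @ np.where(evals >= lam, ols_coords, 0.0)
    return ridge, pca_ols

for n in (20, 50, 80, 110):
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p)
    Sigma = X.T @ X / n
    rr_risk, pca_risk = [], []
    for _ in range(trials):
        Y = X @ beta + sigma * rng.normal(size=n)
        rr, po = ridge_and_pca_ols(X, Y, lam)
        rr_risk.append((rr - beta) @ Sigma @ (rr - beta))
        pca_risk.append((po - beta) @ Sigma @ (po - beta))
    print(f"n={n}: RR risk ~ {np.mean(rr_risk):.3f}, PCA-OLS risk ~ {np.mean(pca_risk):.3f}")
```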
…
4. Conclusion
We showed that the risk inflation of a particular ordinary least squares estimator (on the “top” PCA subspace) is within a factor of 4 of the risk of the ridge estimator. It turns out the converse is not true: this PCA estimator may be arbitrarily better than the ridge one.
References
- 1. D. P. Foster and E. I. George. The Risk Inflation Criterion for Multiple Regression. The Annals of Statistics, Pages 1947-1975, 1994.
- 2. A. N. Tikhonov. Solution of Incorrectly Formulated Problems and the Regularization Method. Soviet Math Dokl 4, Pages 501-504, 1963.
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar | 14(1) | 2013 | A Risk Comparison of Ordinary Least Squares Vs Ridge Regression | | The Journal of Machine Learning Research | | | | 2013