2004 FeatureSelectionL1VsL2Regulariz
- (Ng, 2004) ⇒ Andrew Y. Ng. (2004). “Feature Selection, [math]\displaystyle{ L_1 }[/math] vs. [math]\displaystyle{ L_2 }[/math] Regularization, and Rotational Invariance.” In: Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004). doi:10.1145/1015330.1015435
Subject Headings: Feature Selection, L1 Regularization, L2 Regularization, Regularized Logistic Regression, Sample Complexity, Rotational Invariance.
Notes
- Presentation slides: http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf
Cited By
- http://scholar.google.com/scholar?q=%22Feature+selection%2C+L+1+vs.+L+2+regularization%2C+and+rotational+invariance%22+2004
- http://dl.acm.org/citation.cfm?id=1015330.1015435&preflayout=flat#citedby
Quotes
Abstract
We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using [math]\displaystyle{ L_1 }[/math] regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that [math]\displaystyle{ L_1 }[/math] regularized logistic regression can be effective even if there are exponentially more irrelevant features than training examples. We also give a lower bound showing that any rotationally invariant algorithm --- including logistic regression with [math]\displaystyle{ L_2 }[/math] regularization, SVMs, and neural networks trained by backpropagation --- has a worst-case sample complexity that grows at least linearly in the number of irrelevant features.
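The abstract's contrast can be illustrated empirically. The following is a minimal sketch, not the paper's experimental setup: it fits [math]\displaystyle{ L_1 }[/math]- and [math]\displaystyle{ L_2 }[/math]-regularized logistic regression on synthetic data in which only a handful of many features are relevant. The dataset sizes, the regularization strength C, and the use of scikit-learn are illustrative assumptions.
```python
# Sketch (not the paper's experiments): compare L1- vs. L2-regularized logistic
# regression when most features are irrelevant. Feature counts, C, and the use of
# scikit-learn are assumptions chosen for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_relevant, n_irrelevant = 200, 5, 1000

# Only the first n_relevant features determine the label; the rest are pure noise.
X = rng.normal(size=(n_samples, n_relevant + n_irrelevant))
true_w = np.zeros(n_relevant + n_irrelevant)
true_w[:n_relevant] = 1.0
y = (X @ true_w + 0.1 * rng.normal(size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    clf = LogisticRegression(penalty=penalty, C=1.0, solver=solver, max_iter=5000)
    clf.fit(X_train, y_train)
    n_nonzero = np.sum(np.abs(clf.coef_) > 1e-6)
    print(f"{penalty}: test accuracy = {clf.score(X_test, y_test):.3f}, "
          f"nonzero coefficients = {n_nonzero}")
```
With far more irrelevant features than training examples, the L1-penalized model will typically recover a sparse weight vector and generalize better than the L2-penalized one, consistent with the logarithmic vs. at-least-linear sample-complexity bounds stated in the abstract.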
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2004 FeatureSelectionL1VsL2Regulariz | Andrew Y. Ng | | | Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance | | | http://www-robotics.stanford.edu/~ang/papers/icml04-l1l2.pdf | 10.1145/1015330.1015435 | | 2004 |