Robustness Regression System
A Robustness Regression System is a Regression System that implements a Robust Regression Algorithm to solve a Robust Regression Task.
- AKA: Robust Regression System, Robust Regressor, Robust Regression Estimator.
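- Example(s): the scikit-learn robust regression estimators (RANSAC, Theil Sen, and HuberRegressor) referenced below.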
- Counter-Example(s):
- See: Cross-Validation Task, Regression Analysis Task, Bayesian Inference, Parametric Model.
References
2017
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/linear_model.html#robustness-regression-outliers-and-modeling-errors Retrieved: 2017-09-17.
- QUOTE: Robust regression is interested in fitting a regression model in the presence of corrupt data: either outliers, or error in the model.
(...)
An important notion of robust fitting is that of breakdown point: the fraction of data that can be outlying for the fit to start missing the inlying data.
Note that in general, robust fitting in high-dimensional setting (large n_features) is very hard. The robust models here will probably not work in these settings.
Trade-offs: which estimator?
Scikit-learn provides 3 robust regression estimators: RANSAC, Theil Sen and HuberRegressor.
- HuberRegressor should be faster than RANSAC and Theil Sen unless the number of samples is very large, i.e. n_samples >> n_features. This is because RANSAC and Theil Sen fit on smaller subsets of the data. However, both Theil Sen and RANSAC are unlikely to be as robust as HuberRegressor for the default parameters.
- RANSAC is faster than Theil Sen and scales much better with the number of samples.
- RANSAC will deal better with large outliers in the y direction (most common situation).
- Theil Sen will cope better with medium-size outliers in the X direction, but this property will disappear in large dimensional settings.
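The trade-offs quoted above can be explored with a small experiment. The following is a minimal sketch, not part of the quoted documentation: it assumes a synthetic one-feature dataset (true slope 3.0, true intercept 1.0, chosen here purely for illustration) in which the 10% of samples with the largest X values have their targets shifted far downward, and it compares ordinary least squares against the three scikit-learn robust estimators. The data sizes, noise level, and outlier magnitude are all assumptions; results on real data may differ.

```python
import numpy as np
from sklearn.linear_model import (
    HuberRegressor,
    LinearRegression,
    RANSACRegressor,
    TheilSenRegressor,
)

# Synthetic data (illustrative assumption): y = 3*x + 1 plus Gaussian noise.
rng = np.random.RandomState(0)
n_samples = 200
true_slope, true_intercept = 3.0, 1.0
X = rng.uniform(-5, 5, size=(n_samples, 1))
y = true_slope * X.ravel() + true_intercept + rng.normal(scale=0.5, size=n_samples)

# Corrupt the targets of the 20 samples with the largest x values
# (large outliers in the y direction that drag a least-squares slope down).
outlier_idx = np.argsort(X.ravel())[-20:]
y[outlier_idx] -= 60.0

estimators = {
    "OLS": LinearRegression(),
    "Huber": HuberRegressor(max_iter=1000),
    "RANSAC": RANSACRegressor(random_state=0),
    "TheilSen": TheilSenRegressor(random_state=0),
}

slopes = {}
for name, est in estimators.items():
    est.fit(X, y)
    # RANSACRegressor wraps a base estimator; the final fitted
    # linear model is exposed through the `estimator_` attribute.
    model = est.estimator_ if name == "RANSAC" else est
    slopes[name] = float(model.coef_[0])
    print(f"{name:>8}: estimated slope = {slopes[name]:.3f} (true slope = {true_slope})")
```

In this setup the non-robust least-squares slope should be pulled well away from the true value, while the three robust estimators should stay much closer to it. All estimators are used with their default settings apart from `max_iter` and `random_state`.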