Theil-Sen Regression System
A Theil-Sen Regression System is a Robust Regression System that implements a Theil-Sen Algorithm to solve a Theil-Sen Regression Task.
- AKA: Theil-Sen Regressor, Theil-Sen Regression Estimator.
- Context:
- It is a Nonparametric Regression System.
- Example(s):
- Counter-Example(s):
- See: Regression Analysis Task, Random Variable, L2-norm.
References
2017
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/linear_model.html#theil-sen-estimator-generalized-median-based-estimator Retrieved:2017-09-17
- QUOTE: The TheilSenRegressor estimator uses a generalization of the median in multiple dimensions. It is thus robust to multivariate outliers. Note however that the robustness of the estimator decreases quickly with the dimensionality of the problem. It loses its robustness properties and becomes no better than an ordinary least squares in high dimension. (...)
TheilSenRegressor is comparable to the Ordinary Least Squares (OLS) in terms of asymptotic efficiency and as an unbiased estimator. In contrast to OLS, Theil-Sen is a non-parametric method, which means it makes no assumption about the underlying distribution of the data. Since Theil-Sen is a median-based estimator, it is more robust against corrupted data, aka outliers. In a univariate setting, Theil-Sen has a breakdown point of about 29.3% in the case of a simple linear regression, which means that it can tolerate arbitrarily corrupted data of up to 29.3%. The implementation of TheilSenRegressor in scikit-learn follows a generalization to a multivariate linear regression model [8] using the spatial median, which is a generalization of the median to multiple dimensions [9].
In terms of time and space complexity, Theil-Sen scales according to [math]\displaystyle{ \binom{n_{samples}}{n_{subsamples}} }[/math], which makes it infeasible to be applied exhaustively to problems with a large number of samples and features. Therefore, the magnitude of a subpopulation can be chosen to limit the time and space complexity by considering only a random subset of all possible combinations.
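The univariate case described in the quote can be illustrated with a minimal sketch in plain Python. This is not the scikit-learn implementation (which uses the spatial median and random subsampling); it is the classical median-of-pairwise-slopes estimator, with the intercept taken as the median of the residuals y - slope * x. The helper names theil_sen_fit and ols_fit and the toy data are hypothetical and exist only for this illustration. The sketch also counts the pairs that an exhaustive fit must enumerate and evaluates the breakdown point 1 - 1/sqrt(2), which is the value behind the quoted 29.3%.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_fit(xs, ys):
    """Univariate Theil-Sen: the slope is the median of all pairwise slopes."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # pairs with equal x have no defined slope
    ]
    slope = median(slopes)
    # One common intercept choice: the median of the residuals y - slope * x.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


def ols_fit(xs, ys):
    """Ordinary least squares for one feature, used here only for comparison."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (hypothetical): y = 2x + 1, with 2 of the 10 points corrupted.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[8] = 100.0
ys[9] = 120.0

ts_slope, ts_intercept = theil_sen_fit(xs, ys)
ols_slope, ols_intercept = ols_fit(xs, ys)

# Number of point pairs an exhaustive univariate fit enumerates: C(n, 2).
n_pairs = math.comb(len(xs), 2)

# Breakdown point of the univariate estimator: 1 - 1/sqrt(2).
breakdown = 1 - 1 / math.sqrt(2)

print(f"Theil-Sen: slope={ts_slope}, intercept={ts_intercept}")
print(f"OLS:       slope={ols_slope:.3f}, intercept={ols_intercept:.3f}")
print(f"pairs enumerated: {n_pairs}, breakdown point: {breakdown:.3f}")
```

With 20% of the points corrupted, which is below the breakdown point, the median of pairwise slopes still recovers the true line, while the least-squares slope is pulled far away from it. The pair count grows quadratically with the number of samples even in this simplest case, which is why the quoted text recommends limiting the fit to a random subset of combinations for larger problems.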
- QUOTE: The