Huber Regression System


A Huber Regression System is a Robust Regression System that implements a Huber Regression Algorithm to solve a Huber Regression Task.



References

2017

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/linear_model.html#huber-regression Retrieved:2017-09-17
    • QUOTE: The HuberRegressor is different to Ridge because it applies a linear loss to samples that are classified as outliers. A sample is classified as an inlier if the absolute error of that sample is lesser than a certain threshold. It differs from TheilSenRegressor and RANSACRegressor because it does not ignore the effect of the outliers but gives a lesser weight to them.

      The loss function that HuberRegressor minimizes is given by

      [math]\displaystyle{ \underset{w, \sigma}{min\,} {\sum_{i=1}^n\left(\sigma + H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \alpha {||w||_2}^2} }[/math]

      where

      [math]\displaystyle{ H_m(z) = \begin{cases} z^2, & \text {if } |z| \lt \epsilon, \\ 2\epsilon|z| - \epsilon^2, & \text{otherwise} \end{cases} }[/math]

      (...)

      It is advised to set the parameter epsilon to 1.35 to achieve 95% statistical efficiency.

      The HuberRegressor differs from using SGDRegressor with loss set to huber in the following ways:

      • HuberRegressor is scaling invariant. Once epsilon is set, scaling X and y down or up by different values would produce the same robustness to outliers as before, as compared to SGDRegressor where epsilon has to be set again when X and y are scaled.

      • HuberRegressor should be more efficient to use on data with a small number of samples, while SGDRegressor needs a number of passes on the training data to produce the same robustness.
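
The following is a minimal, illustrative sketch (not part of the quoted scikit-learn documentation) of the piecewise Huber term [math]H_m[/math] defined above, together with a HuberRegressor fit using the epsilon value of 1.35 recommended in the quote. The synthetic data, the injected outliers, and the Ridge comparison are arbitrary choices made here only to show the robustness behavior described above.

    import numpy as np
    from sklearn.linear_model import HuberRegressor, Ridge

    def huber_loss_term(z, epsilon=1.35):
        """Piecewise Huber term H_m(z): quadratic for |z| < epsilon, linear otherwise."""
        z = np.asarray(z, dtype=float)
        is_quadratic = np.abs(z) < epsilon
        return np.where(is_quadratic, z ** 2, 2 * epsilon * np.abs(z) - epsilon ** 2)

    # Synthetic linear data with a few gross outliers (values chosen only for illustration).
    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=100)
    y[:5] += 25.0  # inject outliers

    # epsilon=1.35 is the value the quote recommends for roughly 95% statistical efficiency.
    huber = HuberRegressor(epsilon=1.35, alpha=0.0001).fit(X, y)
    ridge = Ridge(alpha=0.0001).fit(X, y)

    print("Huber coef:", huber.coef_, "intercept:", huber.intercept_)
    print("Ridge coef:", ridge.coef_, "intercept:", ridge.intercept_)
    print("Samples flagged as outliers:", huber.outliers_.sum())

Because the outlying samples receive the linear (rather than squared) part of the loss, the Huber coefficients stay close to the true slope and intercept, whereas the Ridge fit is pulled toward the outliers.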