Stepwise Regression Algorithm
A Stepwise Regression Algorithm is a regression algorithm in which the selection of predictor variables is carried out by an automatic procedure.
- Context:
- It can range from being a Forward Selection Stepwise Regression to being a Backward Selection Stepwise Regression to being a Bidirectional Stepwise Regression.
- It can range from being a Linear Stepwise Regression Algorithm to being a Non-Linear Stepwise Regression Algorithm.
- Example(s):
- Counter-Example(s):
- See: F-Test, t-Test, R-Square, Akaike Information Criterion, Bayesian Information Criterion, PRESS Statistic, False Discovery Rate, Parameter, Approximation, Sample Size, Regression Analysis, Residual Sum of Squares.
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Stepwise_regression Retrieved:2020-3-13.
- In statistics, stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. [1] [2] [3] [4] In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion. Usually, this takes the form of a sequence of F-tests or t-tests, but other techniques are possible, such as adjusted R², Akaike information criterion, Bayesian information criterion, Mallows's Cp, PRESS, or false discovery rate. The frequent practice of fitting the final selected model followed by reporting estimates and confidence intervals without adjusting them to take the model building process into account has led to calls to stop using stepwise model building altogether [5] [6] or to at least make sure model uncertainty is correctly reflected. [7] [8]
- (Figure caption) When planning an experiment, computer simulation, or survey to collect data for this model, one must keep in mind the number of parameters, P, to estimate and adjust the sample size accordingly. For K variables, P = 1 (start) + K (stage I) + (K² − K)/2 (stage II) + 3K (stage III) = 0.5K² + 3.5K + 1. For K < 17, an efficient design of experiments exists for this type of model, a Box–Behnken design, [9] augmented with positive and negative axial points of length min(2, (int(1.5 + K/4))^(1/2)), plus point(s) at the origin. There are more efficient designs, requiring fewer runs, even for K > 16.
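To make the parameter count in the caption concrete, here is the formula worked out for K = 10 candidate variables:

```latex
P = 1 + K + \frac{K^2 - K}{2} + 3K
  = 1 + 10 + 45 + 30
  = 86
  = 0.5 \cdot 10^2 + 3.5 \cdot 10 + 1
```

So even a ten-variable screening model implies 86 parameters to estimate, which is why the sample size must be adjusted accordingly.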
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/stepwise_regression#Main_approaches Retrieved:2015-1-27.
- The main approaches are (see the code sketch after this list):
- Forward selection, which involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until none improves the model.
- Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.
- Bidirectional elimination, a combination of the above, testing at each step for variables to be included or excluded.
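A minimal NumPy sketch of the first two approaches, assuming an AIC criterion for Gaussian least squares; the helper names ols_rss, aic, and design are illustrative, not from any cited source:

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def aic(X, y):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2p."""
    n, p = X.shape
    return n * np.log(ols_rss(X, y) / n) + 2 * p

def design(X, cols):
    """Design matrix: an intercept column plus the selected columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])

def forward_selection(X, y):
    """Repeatedly add the variable that lowers AIC the most."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(design(X, selected), y)
    while remaining:
        score, j = min((aic(design(X, selected + [j]), y), j) for j in remaining)
        if score >= best:              # no candidate improves the model
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected

def backward_elimination(X, y):
    """Repeatedly drop the variable whose removal lowers AIC the most."""
    selected = list(range(X.shape[1]))
    best = aic(design(X, selected), y)
    while selected:
        score, j = min((aic(design(X, [k for k in selected if k != j]), y), j)
                       for j in selected)
        if score >= best:              # every deletion would hurt the model
            break
        best = score
        selected.remove(j)
    return selected

# Toy check on synthetic data: only columns 0 and 3 actually drive y,
# so both procedures typically return [0, 3] (in some order).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(forward_selection(X, y), backward_elimination(X, y))
```

Any of the other criteria named above (adjusted R², BIC, Mallows's Cp) could be swapped in for the aic function; the loop structure is the same.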
- A widely used algorithm was first proposed by Efroymson (1960). [10] This is an automatic procedure for statistical model selection in cases where there is a large number of potential explanatory variables, and no underlying theory on which to base the model selection. The procedure is used primarily in regression analysis, though the basic approach is applicable in many forms of model selection. This is a variation on forward selection. At each stage in the process, after a new variable is added, a test is made to check if some variables can be deleted without appreciably increasing the residual sum of squares (RSS). The procedure terminates when the measure is (locally) maximized, or when the available improvement falls below some critical value.
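A sketch of the Efroymson-style variation under the same assumptions, reusing ols_rss and design from the sketch above. The partial-F thresholds f_in and f_out are illustrative defaults (with f_out < f_in, a standard guard against cycling), not values from the 1960 paper:

```python
def efroymson_stepwise(X, y, f_in=4.0, f_out=3.9):
    """Forward selection with a deletion check after each addition."""
    n = X.shape[0]
    selected, remaining = [], list(range(X.shape[1]))

    def partial_f(small, big):
        # Partial F statistic for the variables in `big` but not in `small`.
        rss_s = ols_rss(design(X, small), y)
        rss_b = ols_rss(design(X, big), y)
        return (rss_s - rss_b) / (rss_b / (n - len(big) - 1))

    added = True
    while added and remaining:
        added = False
        # Forward step: add the candidate with the largest partial F.
        f_best, j_best = max((partial_f(selected, selected + [j]), j)
                             for j in remaining)
        if f_best > f_in:
            selected.append(j_best)
            remaining.remove(j_best)
            added = True
            # Deletion check: drop any variable whose contribution, measured
            # by partial F, has fallen below f_out (RSS barely increases).
            while len(selected) > 1:
                f_min, j_min = min((partial_f([k for k in selected if k != j],
                                              selected), j) for j in selected)
                if f_min < f_out and j_min != j_best:
                    selected.remove(j_min)
                    remaining.append(j_min)
                else:
                    break
    return selected
```

The procedure terminates when no remaining candidate clears f_in, matching the description above of stopping when the available improvement falls below a critical value.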
- ↑ Efroymson, M. A. (1960) "Multiple regression analysis," in Ralston, A. and Wilf, H. S. (eds.), Mathematical Methods for Digital Computers, Wiley, New York.
- ↑ Hocking, R. R. (1976) "The Analysis and Selection of Variables in Linear Regression," Biometrics, 32.
- ↑ Draper, N. and Smith, H. (1981) Applied Regression Analysis, 2nd Edition, New York: John Wiley & Sons, Inc.
- ↑ SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC: SAS Institute Inc.
- ↑ Flom, P. L. and Cassell, D. L. (2007) "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use," NESUG 2007.
- ↑ Harrell, F. E. (2001) "Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis," Springer-Verlag, New York.
- ↑ Chatfield, C. (1995) "Model uncertainty, data mining and statistical inference," J. R. Statist. Soc. A 158, Part 3, pp. 419–466.
- ↑ Efron, B. and Tibshirani, R. J. (1998) "An introduction to the bootstrap," Chapman & Hall/CRC
- ↑ Box–Behnken designs from a handbook on engineering statistics at NIST
- ↑ Efroymson, M. A. (1960) "Multiple regression analysis," in Ralston, A. and Wilf, H. S. (eds.), Mathematical Methods for Digital Computers, Wiley, New York.
2003
- (Nature Reviews Genetics Glossary, 2003) ⇒ http://www.nature.com/nrg/journal/v4/n9/glossary/nrg1155_glossary.html
- STEPWISE REGRESSION: The step-by-step build-up of a regression model, which represents a dependent variable as a weighted sum (linear combination) of independent (risk) variables.
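In symbols, the glossary's "weighted sum" is the familiar linear model (this notation is an assumption added for illustration, not part of the glossary entry):

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_K x_K
```

where stepwise regression decides which of the K candidate variables x_i enter the sum and estimates their weights.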