Least-Squares Function Fitting Task
A Least-Squares Function Fitting Task is a function fitting task whose fitness function is a sum-of-squares function of the point residuals (a minimal worked sketch appears below, just before the References).
- AKA: Least Squares Minimization.
- Context:
- Input: a Function Fitting Task Input, with a sum-of-squares loss function.
- Output: a Least-Squares Solution (one that minimizes the fitness function).
- It can be solved by a Least-Squares Function Fitting System (that implements a least squares function fitting algorithm).
- It can range from being a Linear Least-Squares Optimization Task to being a Nonlinear Least-Squares Optimization Task.
- It can range from being a Regularized Least-Squares Function Fitting Task to being an Ordinary Least-Squares Function Fitting Task.
- It can range from being an Exact Least-Squares Function Fitting Task to being an Approximate Least-Squares Function Fitting Task.
- Example(s):
- “use a least-squares method to solve regression problem X.”
- Ordinary Least-Squares Estimation, such as ordinary linear least-squares regression.
- Non-Negative Least Squares (NNLS) Task.
- …
- Counter-Example(s):
- See: Supervised Model-based Regression, Overdetermined System, Regression Analysis, Closed-Form Expression, Maximum Likelihood, Method of Moments, Least Squares Classification, Least Squares Ranking, Conditional Distribution, Bayesian Statistics, Gauss–Markov Theorem.
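To make the task concrete, below is a minimal sketch (referenced in the definition above) of an ordinary linear least-squares fitting task, assuming NumPy and synthetic data; the variable names (X, y, beta_hat), the noise level, and the problem sizes are illustrative choices, not part of any particular Least-Squares Function Fitting System.
```python
import numpy as np

# Synthetic overdetermined fitting problem: N observations, p unknowns (N > p).
rng = np.random.default_rng(0)
N, p = 50, 2
X = np.column_stack([np.ones(N), rng.uniform(-1.0, 1.0, N)])  # design matrix [1, x]
true_beta = np.array([2.0, -3.0])
y = X @ true_beta + 0.1 * rng.standard_normal(N)              # noisy observations

# Least-squares solution: minimizes the sum of squared residuals ||y - X b||^2.
beta_hat, rss, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients:", beta_hat)
print("sum of squared residuals:", rss)
```
A regularized variant of the task would add a penalty term to this objective; the sketch above corresponds to the ordinary (unregularized) case.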
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Regression_analysis#History Retrieved:2017-8-20.
- The earliest form of regression was the method of least squares, which was published by Legendre in 1805,[1] and by Gauss in 1809.[2] Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun (mostly comets, but also later the then newly discovered minor planets). Gauss published a further development of the theory of least squares in 1821,[3] including a version of the Gauss–Markov theorem.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean). For Galton, regression had only this biological meaning, [4] [5] but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and 1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's formulation of 1821. In the 1950s and 1960s, economists used electromechanical desk calculators to calculate regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one regression. [6]
Regression methods continue to be an area of active research. In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.
- ↑ A.M. Legendre. Nouvelles méthodes pour la détermination des orbites des comètes, Firmin Didot, Paris, 1805. “Sur la Méthode des moindres quarrés” appears as an appendix.
- ↑ C.F. Gauss. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum. (1809)
- ↑ C.F. Gauss. Theoria combinationis observationum erroribus minimis obnoxiae. (1821/1823)
- ↑ Francis Galton. “Typical laws of heredity", Nature 15 (1877), 492–495, 512–514, 532–533. (Galton uses the term "reversion" in this paper, which discusses the size of peas.)
- ↑ Francis Galton. Presidential address, Section H, Anthropology. (1885) (Galton uses the term "regression" in this paper, which discusses the height of humans.)
- ↑ Rodney Ramcharan. Regressions: Why Are Economists Obsessed with Them? March 2006. Accessed 2011-12-03.
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Least_squares_(function_approximation) Retrieved:2015-6-14.
- In mathematics, the idea of least squares can be applied to approximating a given function by a weighted sum of other functions. The best approximation can be defined as that which minimises the difference between the original function and the approximation; for a least-squares approach the quality of the approximation is measured in terms of the squared differences between the two.
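As a sketch of this function-approximation view (assuming a monomial basis and a uniform sample grid, which are illustrative choices), the code below approximates sin(x) on [0, π] by a weighted sum of the basis functions 1, x, x², x³, choosing the weights that minimize the sum of squared differences at the sample points.
```python
import numpy as np

# Approximate f(x) = sin(x) on [0, pi] by a weighted sum of the basis
# functions 1, x, x^2, x^3 (a degree-3 polynomial).
x = np.linspace(0.0, np.pi, 200)
f = np.sin(x)

# Least-squares weights: minimize sum_k ( f(x_k) - sum_j w_j * x_k^j )^2.
A = np.vander(x, N=4, increasing=True)          # columns: 1, x, x^2, x^3
w, *_ = np.linalg.lstsq(A, f, rcond=None)

approx = A @ w
print("basis weights:", w)
print("max absolute error:", np.max(np.abs(f - approx)))
```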
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/least_squares Retrieved:2015-6-14.
- The method of least squares is a standard approach in regression analysis to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. “Least squares” means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation.
The most important application is in data fitting. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model. When the problem has substantial uncertainties in the independent variable (the x variable), then simple regression and least squares methods have problems; in such cases, the methodology required for fitting errors-in-variables models may be considered instead of that for least squares.
Least squares problems fall into two categories: linear or ordinary least squares and non-linear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. The non-linear problem is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, and thus the core calculation is similar in both cases.
Polynomial least squares describes the variance in a prediction of the dependent variable as a function of the independent variable and the deviations from the fitted curve.
When the observations come from an exponential family and mild conditions are satisfied, least-squares estimates and maximum-likelihood estimates are identical. The method of least squares can also be derived as a method of moments estimator. The following discussion is mostly presented in terms of linear functions but the use of least-squares is valid and practical for more general families of functions. Also, by iteratively applying local quadratic approximation to the likelihood (through the Fisher information), the least-squares method may be used to fit a generalized linear model. For the topic of approximating a function by a sum of others using an objective function based on squared distances, see least squares (function approximation). The least-squares method is usually credited to Carl Friedrich Gauss (1795),[1] but it was first published by Adrien-Marie Legendre.
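To illustrate the iterative linearization described above for the non-linear case, here is a minimal Gauss-Newton sketch; the exponential-decay model a·exp(−b·x), the synthetic data, the starting point, and the fixed iteration count are all assumptions made for illustration, not a reference implementation.
```python
import numpy as np

# Non-linear model y = a * exp(-b * x); unknowns theta = (a, b).
def model(theta, x):
    a, b = theta
    return a * np.exp(-b * x)

def jacobian(theta, x):
    a, b = theta
    # Partial derivatives of the model with respect to a and b.
    return np.column_stack([np.exp(-b * x), -a * x * np.exp(-b * x)])

rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 30)
y = model((2.5, 1.3), x) + 0.02 * rng.standard_normal(x.size)

theta = np.array([1.0, 1.0])          # starting guess
for _ in range(10):                   # Gauss-Newton: linearize, solve, update
    r = y - model(theta, x)           # current residuals
    J = jacobian(theta, x)
    step, *_ = np.linalg.lstsq(J, r, rcond=None)   # linear least-squares subproblem
    theta = theta + step

print("estimated (a, b):", theta)
```
At each iteration the non-linear problem is replaced by the linear least-squares subproblem on the Jacobian, which is exactly the sense in which "the core calculation is similar in both cases."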
1996
- (Tibshirani, 1996) ⇒ Robert Tibshirani. (1996). “Regression Shrinkage and Selection via the Lasso.” In: Journal of the Royal Statistical Society, Series B, 58(1).
- QUOTE: Consider the usual regression situation: we have data [math]\displaystyle{ (\mathbf{x}^i, y_i),\ i=1,2,\ldots,N, }[/math] where [math]\displaystyle{ \mathbf{x}^i=(x_{i1},\ldots, x_{ip})^T }[/math] and [math]\displaystyle{ y_i }[/math] are the regressors and response for the ith observation. The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error.
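Spelled out in the notation above (a standard statement of the OLS criterion with an explicit intercept [math]\displaystyle{ \beta_0 }[/math], not a verbatim formula from the paper), the estimates minimize the residual sum of squares:
[math]\displaystyle{ \hat{\beta} = \underset{\beta}{\arg\min} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 }[/math]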
2003
- (Myung, 2003) ⇒ In Jae Myung. (2003). “Tutorial on Maximum Likelihood Estimation.” In: Journal of Mathematical Psychology, 47. doi:10.1016/S0022-2496(02)00028-7
- QUOTE: There are two general methods of parameter estimation. They are least-squares estimation (LSE) and maximum likelihood estimation (MLE). The former has been a popular choice of model fitting in …
1805
- (Legendre, 1805) ⇒ Adrien-Marie Legendre. (1805). “Nouvelles méthodes pour la détermination des orbites des comètes” (appendix: “Sur la Méthode des moindres quarrés”).
- https://archive.org/stream/sourcebookinmath00smit#page/576/mode/2up
- QUOTE: … Of all the principles which can be proposed for [making estimates from a sample], I think there is none more general, more exact, and more easy of application, than that of which we have made use… which consists of rendering the sum of the squares of the errors a minimum. …