Locally Weighted Regression Task
A Locally Weighted Regression Task is a Nonparametric Regression Task that uses a weighting function and a distance function.
- AKA: Local Regression, Locally Weighted Scatterplot Smoothing Task, LOESS Task, LOWESS Task, Locally Weighted Learning Task.
- Context:
- Task Input:
an N-observed Numerically-Labeled Training Dataset [math]\displaystyle{ D=\{(x_1,y_1,z_1,\ldots),(x_2,y_2,z_2,\ldots),\cdots,(x_n,y_n,z_n,\ldots)\} }[/math] that can be represented by
- [math]\displaystyle{ \mathbf{Y} }[/math], a continuous dataset of response variable(s).
- [math]\displaystyle{ \mathbf{X} }[/math], a continuous dataset of predictor variable(s).
- [math]\displaystyle{ K(x_i,x) }[/math], Kernel Function.
- Task Output:
- [math]\displaystyle{ \hat{w}(x)^\top x }[/math], regression prediction function
- Task Requirements:
- It requires fitting the locally weighted regression [math]\displaystyle{ \hat{w}(x) = \arg\min_w \sum_{i=1}^{n} K(x, x_i) (w^\top x_i - y_{i})^2 }[/math] at each query point [math]\displaystyle{ x }[/math] (see the illustrative sketch below, after the See: list).
- It may require a regression diagnostic test to determine goodness of fit of the regression model.
- It can be solved by a Locally Weighted Regression System that implements a Locally Weighted Regression Algorithm.
- Example(s):
- Counter-Example(s):
- See: Initialism, Scattergram, Non-Parametric Regression, k-Nearest Neighbor Algorithm, Classical Statistics, Least Squares Regression, Nonlinear Regression.
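As a concrete illustration of the Task Requirements above, the following minimal sketch solves the weighted least-squares objective and returns the prediction [math]\displaystyle{ \hat{w}(x)^\top x }[/math]. It is not drawn from the cited sources; it assumes NumPy, a Gaussian kernel for [math]\displaystyle{ K(x, x_i) }[/math], and input vectors that already include an intercept feature; gaussian_kernel, locally_weighted_fit, and predict are hypothetical helper names.

```python
import numpy as np

def gaussian_kernel(x_query, X, bandwidth=1.0):
    # K(x, x_i): one weight per training point, decaying with distance to the query.
    sq_dists = np.sum((X - x_query) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def locally_weighted_fit(x_query, X, y, bandwidth=1.0, ridge=1e-8):
    # Solve w_hat(x) = argmin_w sum_i K(x, x_i) (w^T x_i - y_i)^2
    # via the weighted normal equations (X^T W X) w = X^T W y.
    w = gaussian_kernel(x_query, X, bandwidth)
    A = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])  # small ridge term only for numerical stability
    b = (X * w[:, None]).T @ y
    return np.linalg.solve(A, b)

def predict(x_query, X, y, bandwidth=1.0):
    # Task output: the regression prediction w_hat(x)^T x at the query point.
    return locally_weighted_fit(x_query, X, y, bandwidth) @ x_query
```

For a 1-D predictor one would, for example, build the design matrix as np.column_stack([np.ones(len(x_raw)), x_raw]) so that each local fit is a weighted linear model; the bandwidth plays the role of the smoothing parameter.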
References
2017a
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Local_regression Retrieved:2017-9-3.
- LOESS and LOWESS (locally weighted scatterplot smoothing) are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. "LOESS" is a later generalization of LOWESS; although it is not a true initialism, it may be understood as standing for "LOcal regrESSion". [1]
LOESS and LOWESS thus build on "classical" methods, such as linear and nonlinear least squares regression. They address situations in which the classical procedures do not perform well or cannot be effectively applied without undue labor. LOESS combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression. It does this by fitting simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data.
The trade-off for these features is increased computation. Because it is so computationally intensive, LOESS would have been practically impossible to use in the era when least squares regression was being developed. Most other modern methods for process modeling are similar to LOESS in this respect. These methods have been consciously designed to use our current computational ability to the fullest possible advantage to achieve goals not easily achieved by traditional approaches.
A smooth curve through a set of data points obtained with this statistical technique is called a Loess Curve, particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the y-axis scattergram criterion variable. When each smoothed value is given by a weighted linear least squares regression over the span, this is known as a Lowess curve; however, some authorities treat Lowess and Loess as synonyms.
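As an illustration of the quoted description (not part of the Wikipedia article), a Lowess curve can be computed with an off-the-shelf smoother. The sketch below assumes the statsmodels package is available; the synthetic data and the frac value (the span, i.e., the fraction of points entering each local fit) are illustrative choices.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Noisy nonlinear data: no global functional form is specified up front.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# LOWESS: each smoothed value comes from a weighted linear fit over a local span.
smoothed = lowess(y, x, frac=0.25)      # returns columns: sorted x, smoothed y
x_s, y_s = smoothed[:, 0], smoothed[:, 1]
```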
2017b
- (Ting et al., 2017) ⇒ Jo-Anne Ting, Franziska Meier, Sethu Vijayakumar, and Stefan Schaal (2017). "Locally Weighted Regression for Control". In: "Encyclopedia of Machine Learning and Data Mining", pp. 759-772.
- QUOTE: Locally weighted regression refers to supervised learning of continuous functions (otherwise known as function approximation or regression) by means of spatially localized algorithms, which are often discussed in the context of kernel regression, nearest neighbor methods, or lazy learning (Atkeson et al. 1997[1]). Most regression algorithms are global learning systems. For instance, many algorithms can be understood in terms of minimizing a global loss function such as the expected sum squared error:
(...)
Returning to Eqs. (1) to (3), the main differences between global methods that directly solve Eq. (1) and local methods that solve either Eqs. (2) or (3) are listed below:
- (i) A weight [math]\displaystyle{ w_{i,k} }[/math] is introduced that focuses:
- either the function approximation fit in Eq. (2)
- or a local model's contribution toward the global function fit in Eq. (3)
- (ii) The learning problem is split into [math]\displaystyle{ K }[/math] independent optimization problems.
- (iii) Due to the restricted scope of the function approximation problem, we do not need a nonlinear basis function expansion and can, instead, work with simple local functions or local polynomials (Hastie and Loader, 1993).
The weights [math]\displaystyle{ w_{k,i} }[/math] in Eq. (2) are typically computed from some kernel function (Atkeson et al., 1997) such as a squared exponential kernel:
[math]\displaystyle{ w_{k,i} =\exp \left (-\frac{1} {2}\left (\mathbf{x}_{i} -\mathbf{c}_{k}\right )^{T}\mathbf{D}_{ k}\left (\mathbf{x}_{i} -\mathbf{c}_{k}\right )\right ) \quad\quad }[/math](4)
with [math]\displaystyle{ \mathbf{D}_k }[/math] denoting a positive semidefinite distance metric and [math]\displaystyle{ \mathbf{c}_k }[/math] the center of the kernel. The number of kernels [math]\displaystyle{ K }[/math] is not finite. In many local learning algorithms, the kernels are never maintained in memory. Instead, for every query point [math]\displaystyle{ \mathbf{x}_q }[/math], a new kernel is centered at [math]\displaystyle{ \mathbf{c}_k= \mathbf{x}_q }[/math], and the localized function approximation is solved with weighted regression techniques (Atkeson et al., 1997).
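An illustrative reading of Eq. (4) as code (not from the cited chapter): the sketch below assumes NumPy, a user-supplied positive semidefinite distance metric D, and that the resulting weights are then fed into a weighted regression such as the earlier locally_weighted_fit sketch; sq_exp_weights is a hypothetical helper name.

```python
import numpy as np

def sq_exp_weights(X, center, D):
    # Eq. (4): w_{k,i} = exp(-0.5 * (x_i - c_k)^T D_k (x_i - c_k))
    diffs = X - center                                        # (n, d) offsets from the kernel center c_k
    return np.exp(-0.5 * np.einsum('ij,jk,ik->i', diffs, D, diffs))

# For a query point x_q, center a new kernel at c_k = x_q and reuse the weights
# in a weighted regression (e.g., the locally_weighted_fit sketch above).
# Example of a diagonal metric: D = np.diag(1.0 / lengthscales**2).
```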
1997
- (Atkeson et al., 1997) ⇒ Atkeson, C. G., Moore, A. W., & Schaal, S. (1997). "Locally Weighted Learning". Artificial Intelligence Review, 11:11-73. DOI:10.1023/A:1006559212014
- ABSTRACT: This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
1988
- (Cleveland et al., 1988) ⇒ Cleveland, W. S., Devlin, S. J., & Grosse, E. (1988). "Regression by Local Fitting: Methods, Properties, and Computational Algorithms". Journal of Econometrics, 37(1), 87-114. DOI:10.1016/0304-4076(88)90077-2
- ABSTRACT: Local regression is a procedure for estimating regression surfaces by the local fitting of linear or quadratic functions of the independent variables in a moving fashion that is analogous to how a moving average is computed for a time series. The advantage of the methodology over the global fitting of parametric functions of the independent variables by least squares, the current paradigm in regression studies, is that a much wider class of regression functions can be estimated without distortion. In this paper, we discuss the methods, their statistical properties, and computational algorithms.
- ↑ Atkeson C, Moore A, Schaal S (1997) Locally weighted learning. AI Rev 11:11–73