Memory-Based Locally Weighted Regression Task
A Memory-Based Locally Weighted Regression Task is a Nonparametric Regression Task that, for each query point, fits a local regression model to training data held in memory, with each data point weighted by a kernel centered at the query point.
- AKA: LWR, Memory-Based Locally Weighted Regression.
- Example(s):
- ...
- Counter-Example(s):
- See: Initialism, Scattergram, Non-Parametric Regression, k-Nearest Neighbor Algorithm, Classical Statistics, Least Squares Regression, Nonlinear Regression.
References
2017
- (Ting et al., 2017) ⇒ Jo-Anne Ting, Franziska Meier, Sethu Vijayakumar, and Stefan Schaal (2017). "Locally Weighted Regression for Control". In: Encyclopedia of Machine Learning and Data Mining, pp. 759-772.
- QUOTE: Memory-Based Locally Weighted Regression (LWR)
The original locally weighted regression algorithm was introduced by Cleveland (1979) and popularized in the machine learning and learning control community by Atkeson (1989). The algorithm – categorized as a “lazy” approach – can be summarized as follows (for algorithmic pseudo-code, see Schaal et al. 2002); a minimal code sketch of these steps is given after the list:
- All training data are collected in the rows of the matrix [math]\displaystyle{ \mathbf{X} }[/math] and in the vector [math]\displaystyle{ \mathbf{t} }[/math]. (For simplicity, only functions with a scalar output are addressed; vector-valued outputs can be learned either by fitting a separate learning system for each output or by modifying the algorithms to fit multiple outputs, similar to multi-output linear regression.)
- For every query point [math]\displaystyle{ \mathbf{x}_q }[/math], the weighting kernel is centered at the query point.
- The weights are computed with Eq. (4), and all data points’ weights are collected in the diagonal weight matrix [math]\displaystyle{ \mathbf{W}_q }[/math].
- The local regression coefficients are computed as [math]\displaystyle{ \boldsymbol{\beta }_{q} = \left (\mathbf{X}^{T}\mathbf{W}_{ q}\mathbf{X}\right )^{-1}\mathbf{X}^{T}\mathbf{W}_{ q}\mathbf{t} \quad\quad }[/math](5)
- A prediction is formed with [math]\displaystyle{ y_{q} = \left [\mathbf{x}_{q}^{T}\;1\right ]\boldsymbol{\beta }_{q} }[/math].
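The steps above translate almost directly into code. Below is a minimal NumPy sketch, assuming a Gaussian weighting kernel parameterized by the distance metric [math]\displaystyle{ \mathbf{D} }[/math] (the exact kernel of Eq. (4) is not reproduced in this excerpt); the function name lwr_predict and the augmented-input layout [x^T 1] are illustrative choices, not taken from the source.

```python
import numpy as np

def lwr_predict(X, t, x_q, D):
    """Memory-based LWR prediction at a single query point.

    X   : (N, d) training inputs (rows of the matrix X)
    t   : (N,)   training targets (the vector t)
    x_q : (d,)   query point
    D   : (d, d) distance metric of the weighting kernel
    """
    # Center the weighting kernel at the query point; a Gaussian kernel
    # w_i = exp(-0.5 * (x_i - x_q)^T D (x_i - x_q)) is assumed here.
    diff = X - x_q
    w = np.exp(-0.5 * np.einsum('nd,de,ne->n', diff, D, diff))
    W_q = np.diag(w)  # diagonal weight matrix W_q

    # Augment inputs with a constant 1 so the local model has an intercept,
    # matching the prediction y_q = [x_q^T 1] beta_q.
    X_a = np.hstack([X, np.ones((X.shape[0], 1))])

    # Local regression coefficients, Eq. (5):
    # beta_q = (X^T W_q X)^{-1} X^T W_q t
    beta_q = np.linalg.solve(X_a.T @ W_q @ X_a, X_a.T @ W_q @ t)

    # Prediction at the query point
    return np.append(x_q, 1.0) @ beta_q


# Example: predict a noisy sine curve locally at x_q = 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(lwr_predict(X, t, np.array([0.5]), D=4.0 * np.eye(1)))
```

Note that the kernel is re-centered and the coefficients re-estimated for every query, which is what makes the approach "lazy": no global model is fit ahead of time.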
As in all kernel methods, it is important to optimize the kernel parameters in order to obtain optimal function-fitting quality. For LWR, the critical parameter determining the bias-variance trade-off is the distance metric [math]\displaystyle{ \mathbf{D}_q }[/math]. If the kernel is too narrow, it starts fitting noise; if it is too broad, oversmoothing will occur. [math]\displaystyle{ \mathbf{D}_q }[/math] can be optimized with leave-one-out cross-validation to obtain a globally optimal value, i.e., the same [math]\displaystyle{ \mathbf{D}_q=\mathbf{D} }[/math] is used throughout the entire input space of the data. Alternatively, [math]\displaystyle{ \mathbf{D}_q }[/math] can be optimized locally as a function of the query point, i.e., a separate [math]\displaystyle{ \mathbf{D}_q }[/math] is obtained for each query (as indicated by the subscript “q”). In the recent machine learning literature (in particular, work related to kernel methods), such input-dependent kernels are referred to as nonstationary kernels.
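A hedged sketch of the global (stationary) variant of this tuning step: the leave-one-out error is computed by predicting each stored point with that point removed from memory, and a global metric is chosen over a grid of kernel widths. It assumes the lwr_predict function and the (X, t) data from the previous sketch are in scope; the isotropic form D = I / h^2 and the candidate grid are illustrative assumptions, not from the source.

```python
import numpy as np

def loocv_error(X, t, D):
    """Leave-one-out CV error of memory-based LWR for a fixed, global metric D."""
    n = X.shape[0]
    errors = []
    for i in range(n):
        mask = np.arange(n) != i          # leave the i-th point out of memory
        y_i = lwr_predict(X[mask], t[mask], X[i], D)
        errors.append((y_i - t[i]) ** 2)
    return float(np.mean(errors))

# Choose the kernel width h (isotropic metric D = I / h^2) that minimizes
# the leave-one-out error; the candidate grid is purely illustrative.
widths = [0.1, 0.25, 0.5, 1.0, 2.0]
best_h = min(widths, key=lambda h: loocv_error(X, t, np.eye(X.shape[1]) / h**2))
```

A locally optimized (nonstationary) metric would instead repeat this kind of search per query point, at correspondingly higher cost.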