Gaussian Process-based Regression (GPR) Algorithm
A Gaussian Process-based Regression (GPR) Algorithm is a Gaussian Process-based algorithm that is a model-based supervised regression algorithm.
- Context:
- It can be implemented by a Gaussian Process Regression System (that solves a Gaussian Process regression task).
- …
- Counter-Example(s):
- See: Interpolation, Manifold Learning, Prior Probability Distribution, Bayesian Inference, Multivariate Gaussian, Gram Matrix, Stochastic Kernel, Kriging.
References
2016
- (Wikipedia, 2016) ⇒ http://wikipedia.org/wiki/Gaussian_process#Applications Retrieved:2016-4-9.
- … Inference of continuous values with a Gaussian process prior is known as Gaussian process regression, or kriging; extending Gaussian process regression to multiple target variables is known as cokriging. Gaussian processes are thus useful as a powerful non-linear multivariate interpolation and out-of-sample extension[1] tool. Gaussian process regression can be further extended to address learning tasks in both supervised (e.g. probabilistic classification) and unsupervised (e.g. manifold learning) learning frameworks.
- ↑ Barkan, O., Weill, J., & Averbuch, A. (2016). "Gaussian Process Regression for Out-of-Sample Extension". arXiv preprint arXiv:1603.02194.
2016
- (Wikipedia, 2016) ⇒ http://wikipedia.org/wiki/Gaussian_process#Gaussian_process_prediction Retrieved:2016-4-9.
- When concerned with a general Gaussian process regression problem, it is assumed that for a Gaussian process f observed at coordinates x, the vector of values [math]\displaystyle{ f(x) }[/math] is just one sample from a multivariate Gaussian distribution of dimension equal to the number of observed coordinates |x|. Therefore, under the assumption of a zero-mean distribution, [math]\displaystyle{ f(x) \sim N(0, K(\theta,x,x')) }[/math], where [math]\displaystyle{ K(\theta,x,x') }[/math] is the covariance matrix between all possible pairs [math]\displaystyle{ (x,x') }[/math] for a given set of hyperparameters θ.
As such the log marginal likelihood is: : [math]\displaystyle{ \log p(f(x)|\theta,x) = -\frac{1}{2}f(x)^T K(\theta,x,x')^{-1} f(x) -\frac{1}{2} \log \det(K(\theta,x,x')) - \frac{|x|}{2} \log 2\pi }[/math] and maximizing this marginal likelihood towards θ provides the complete specification of the Gaussian process f. One can briefly note at this point that the first term corresponds to a penalty term for a model's failure to fit observed values and the second term to a penalty term that increases proportionally to a model's complexity. Having specified θ, making predictions about unobserved values [math]\displaystyle{ f(x^*) }[/math] at coordinates x* is then only a matter of drawing samples from the predictive distribution [math]\displaystyle{ p(y^*|x^*,f(x),x) = N(y^*|A,B) }[/math] where the posterior mean estimate A is defined as: : [math]\displaystyle{ A = K(\theta,x^*,x) K(\theta,x,x')^{-1} f(x) }[/math] and the posterior variance estimate B is defined as: : [math]\displaystyle{ B = K(\theta,x^*,x^*) - K(\theta,x^*,x) K(\theta,x,x')^{-1} K(\theta,x^*,x)^T }[/math] where [math]\displaystyle{ K(\theta,x^*,x) }[/math] is the covariance between the new coordinate of estimation x* and all other observed coordinates x for a given hyperparameter vector θ, [math]\displaystyle{ K(\theta,x,x') }[/math] and [math]\displaystyle{ f(x) }[/math] are defined as before, and [math]\displaystyle{ K(\theta,x^*,x^*) }[/math] is the variance at point x* as dictated by θ. It is important to note that practically the posterior mean estimate A (the "point estimate") is just a linear combination of the observations [math]\displaystyle{ f(x) }[/math]; in a similar manner the variance B is actually independent of the observations [math]\displaystyle{ f(x) }[/math]. A known bottleneck in Gaussian process prediction is that the computational complexity of prediction is cubic in the number of points |x| and as such can become unfeasible for larger data sets. Works on sparse Gaussian processes, that usually are based on the idea of building a representative set for the given process f, try to circumvent this issue.
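The log-marginal-likelihood formula above can be written down directly in code. The following is a minimal NumPy sketch, assuming a zero-mean GP and a squared-exponential covariance function with hyperparameters θ = (signal variance, length-scale); the kernel choice and the names sq_exp_kernel and log_marginal_likelihood are illustrative assumptions, not part of the quoted text. Maximizing this quantity with respect to θ (e.g. by running a generic numerical optimizer on its negative) corresponds to the hyperparameter fitting described above.

```python
import numpy as np

def sq_exp_kernel(theta, a, b):
    """Illustrative squared-exponential covariance K(theta, a, b).
    theta = (signal_variance, length_scale); the quoted text leaves the
    covariance function unspecified, so this choice is an assumption."""
    signal_variance, length_scale = theta
    sq_dists = (a[:, None] - b[None, :]) ** 2           # pairwise squared distances
    return signal_variance * np.exp(-0.5 * sq_dists / length_scale ** 2)

def log_marginal_likelihood(theta, x, fx):
    """log p(f(x) | theta, x) for a zero-mean GP, matching the formula above:
    a data-fit term, a complexity penalty, and a normalization constant."""
    K = sq_exp_kernel(theta, x, x)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(x)))    # jitter for numerical stability
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, fx))  # K^{-1} f(x)
    data_fit = -0.5 * fx @ alpha                          # -1/2 f(x)^T K^{-1} f(x)
    complexity = -np.sum(np.log(np.diag(L)))              # -1/2 log det K
    constant = -0.5 * len(x) * np.log(2.0 * np.pi)        # -|x|/2 log 2*pi
    return data_fit + complexity + constant
```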
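The two prediction formulas (posterior mean A and posterior variance B) translate just as directly. The sketch below is again a hedged illustration: it reuses the same assumed squared-exponential kernel and solves against K(θ,x,x') rather than forming its inverse explicitly; that solve is the cubic-in-|x| step the quoted passage identifies as the bottleneck of GP prediction.

```python
import numpy as np

def sq_exp_kernel(theta, a, b):
    """Same illustrative squared-exponential covariance as in the previous sketch."""
    signal_variance, length_scale = theta
    return signal_variance * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_predict(theta, x, fx, x_star):
    """Posterior mean A and covariance B at new coordinates x_star,
    following the two formulas quoted above."""
    K = sq_exp_kernel(theta, x, x)               # K(theta, x, x')
    K_s = sq_exp_kernel(theta, x_star, x)        # K(theta, x*, x)
    K_ss = sq_exp_kernel(theta, x_star, x_star)  # K(theta, x*, x*)
    K_inv_f = np.linalg.solve(K, fx)             # K^{-1} f(x); cubic in |x|
    A = K_s @ K_inv_f                            # mean: a linear combination of the observations
    B = K_ss - K_s @ np.linalg.solve(K, K_s.T)   # variance: independent of f(x)
    return A, B

# Usage sketch on hypothetical data: observe f at 6 coordinates, predict at 50.
x = np.linspace(-3.0, 3.0, 6)
fx = np.sin(x)
x_star = np.linspace(-3.0, 3.0, 50)
A, B = gp_predict((1.0, 1.0), x, fx, x_star)
```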
2005
- (Quiñonero-Candela & Rasmussen, 2005) ⇒ Joaquin Quiñonero-Candela, and Carl Edward Rasmussen. (2005). “A Unifying View of Sparse Approximate Gaussian Process Regression.” In: The Journal of Machine Learning Research, 6.
- QUOTE: Gaussian process (GP) regression is a Bayesian approach which assumes a GP prior[1] over functions, i.e. assumes a priori that function values behave according to [math]\displaystyle{ p(\mathbf{f}|x_1, x_2,..., x_n) = \mathcal{N}(0, K), }[/math] where [math]\displaystyle{ \mathbf{f} = [f_1, f_2,..., f_n]^T }[/math] is a vector of latent function values, [math]\displaystyle{ f_i = f(\mathbf{x}_i) }[/math], and K is a covariance matrix, whose entries are given by the covariance function, [math]\displaystyle{ K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) }[/math]. Note that the GP treats the latent function values [math]\displaystyle{ f_i }[/math] as random variables, indexed by the corresponding input. In the following, for simplicity we will always neglect the explicit conditioning on the inputs; the GP model and all expressions are always conditional on the corresponding inputs. The GP model is concerned only with the conditional of the outputs given the inputs; we do not model anything about the inputs themselves.
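To make the quoted prior concrete, the short sketch below builds the covariance matrix entrywise as K_ij = k(x_i, x_j) and draws one sample of the latent vector f from N(0, K). The particular covariance function k and the input grid are hypothetical choices for illustration; the quoted passage does not fix either.

```python
import numpy as np

def k(x_i, x_j, length_scale=1.0):
    """Hypothetical covariance function k(x_i, x_j) (squared-exponential)."""
    return np.exp(-0.5 * (x_i - x_j) ** 2 / length_scale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)                          # inputs x_1, ..., x_n
K = np.array([[k(xi, xj) for xj in x] for xi in x])    # K_ij = k(x_i, x_j)

# One draw of the latent function values f = [f_1, ..., f_n]^T from the
# GP prior N(0, K); the small jitter keeps K numerically positive definite.
f = rng.multivariate_normal(np.zeros(len(x)), K + 1e-10 * np.eye(len(x)))
```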