Gaussian Process Model

From GM-RKB

A Gaussian process model is a stochastic process model in which every finite collection of the process's random variables has a (consistent) joint Gaussian distribution; equivalently, every finite linear combination of those random variables is normally distributed.



References

2017a

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Gaussian_process Retrieved:2017-12-4.
    • In probability theory and statistics, a Gaussian process is a particular kind of statistical model where observations occur in a continuous domain, e.g. time or space. In a Gaussian process, every point in some continuous input space is associated with a normally distributed random variable. Moreover, every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.

      Viewed as a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value for an unseen point from training data. The prediction is not just an estimate for that point, but also has uncertainty information — it is a one-dimensional Gaussian distribution (which is the marginal distribution at that point).

      For some kernel functions, matrix algebra can be used to calculate the predictions using the technique of kriging. When a parameterised kernel is used, optimisation software is typically used to fit a Gaussian process model.

      The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of the Gaussian distribution (normal distribution). Gaussian processes can be seen as an infinite-dimensional generalization of multivariate normal distributions.

      Gaussian processes are useful in statistical modelling, benefiting from properties inherited from the normal distribution. For example, if a random process is modelled as a Gaussian process, the distributions of various derived quantities can be obtained explicitly. Such quantities include the average value of the process over a range of times and the error in estimating the average using sample values at a small set of times.
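The prediction step described above (a kernel as the similarity measure, matrix algebra for the prediction, and a one-dimensional Gaussian as the result) can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes a zero-mean prior, scalar inputs, a squared-exponential kernel with fixed hyperparameters, and a small noise term for numerical stability. All function names (`rbf_kernel`, `cholesky`, `gp_predict`, and the solvers) are hypothetical helpers introduced here, not part of any library.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential similarity between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_from_lower(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance at x_new under a zero-mean GP prior."""
    n = len(x_train)
    # Gram matrix of the training inputs, with the noise term on the diagonal.
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(x, x_new) for x in x_train]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, y_train))  # K^{-1} y
    v = solve_lower(L, k_star)                                  # L^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_new, x_new) - sum(t * t for t in v)
    return mean, var

x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
mean_at_train, var_at_train = gp_predict(x_train, y_train, 1.0)
mean_far, var_far = gp_predict(x_train, y_train, 10.0)
```

At a training input the predicted mean stays close to the observed value and the variance is small. Far from all training data the prediction falls back to the prior (mean near 0, variance near the kernel variance), which is the uncertainty information mentioned above.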

2017b

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Gaussian_process#Definition Retrieved:2017-12-4.
    • A time-continuous stochastic process is Gaussian if and only if for every finite set of indices [math]\displaystyle{ t_1,\ldots,t_k }[/math] in the index set [math]\displaystyle{ T }[/math], [math]\displaystyle{ \mathbf{X}_{t_1, \ldots, t_k} = (\mathbf{X}_{t_1}, \ldots, \mathbf{X}_{t_k}) }[/math] is a multivariate Gaussian random variable.[1] That is the same as saying every linear combination of [math]\displaystyle{ (\mathbf{X}_{t_1}, \ldots, \mathbf{X}_{t_k}) }[/math] has a univariate normal (or Gaussian) distribution. Using characteristic functions of random variables, the Gaussian property can be formulated as follows: [math]\displaystyle{ \left\{\mathbf{X}_t ; t\in T\right\} }[/math] is Gaussian if and only if, for every finite set of indices [math]\displaystyle{ t_1,\ldots,t_k }[/math], there are real-valued [math]\displaystyle{ \sigma_{\ell j} }[/math], [math]\displaystyle{ \mu_\ell }[/math] with [math]\displaystyle{ \sigma_{jj} \gt 0 }[/math] such that the following equality holds for all [math]\displaystyle{ s_1,s_2,\ldots,s_k\in\mathbb{R} }[/math]: [math]\displaystyle{ \operatorname{E}\left(\exp\left(i \ \sum_{\ell=1}^k s_\ell \ \mathbf{X}_{t_\ell}\right)\right) = \exp \left(-\frac{1}{2} \, \sum_{\ell, j} \sigma_{\ell j} s_\ell s_j + i \sum_\ell \mu_\ell s_\ell\right), }[/math] where [math]\displaystyle{ i }[/math] denotes the imaginary unit [math]\displaystyle{ \sqrt{-1} }[/math].

      The numbers [math]\displaystyle{ \sigma_{\ell j} }[/math] and [math]\displaystyle{ \mu_\ell }[/math] can be shown to be the covariances and means of the variables in the process.
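The characteristic-function identity above can be checked numerically for a small case. The sketch below takes k = 2 and compares the closed-form right-hand side with a Monte Carlo estimate of the left-hand side. The particular means, covariances, arguments, sample size, and seed are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
import cmath
import math
import random

mu = [0.5, -1.0]                      # means mu_l (illustrative values)
sigma = [[1.0, 0.6], [0.6, 2.0]]      # covariances sigma_lj (illustrative values)
s = [0.7, -0.3]                       # arguments s_l

def closed_form_cf(s, mu, sigma):
    """Right-hand side: exp(-1/2 * sum_{l,j} sigma_lj s_l s_j + i * sum_l mu_l s_l)."""
    k = len(s)
    quad = sum(sigma[l][j] * s[l] * s[j] for l in range(k) for j in range(k))
    lin = sum(mu[l] * s[l] for l in range(k))
    return cmath.exp(-0.5 * quad + 1j * lin)

def sample_pair(rng):
    """Draw (X_1, X_2) with the means and covariances above via a 2x2 Cholesky factor."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2

def monte_carlo_cf(s, n_samples=20000, seed=0):
    """Left-hand side: sample average of exp(i * sum_l s_l X_l)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_samples):
        x1, x2 = sample_pair(rng)
        total += cmath.exp(1j * (s[0] * x1 + s[1] * x2))
    return total / n_samples

exact = closed_form_cf(s, mu, sigma)
estimate = monte_carlo_cf(s)
```

With these values the quadratic form is 0.418 and the linear term is 0.65, so the closed form has modulus exp(-0.209). The Monte Carlo estimate should agree with it up to sampling error of order 1/sqrt(n_samples).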

2017c

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Gaussian_process#Applications Retrieved:2017-12-4.
    • A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference. Given any set of N points in the desired domain of the functions, take a multivariate Gaussian whose covariance matrix parameter is the Gram matrix of the N points with some desired kernel, and sample from that Gaussian. Inference of continuous values with a Gaussian process prior is known as Gaussian process regression, or kriging; extending Gaussian process regression to multiple target variables is known as cokriging. Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool. Gaussian process regression can be further extended to address learning tasks in both supervised (e.g. probabilistic classification) and unsupervised (e.g. manifold learning) learning frameworks. Gaussian processes can also be used in the context of mixture of experts models (e.g., [2] [3]). The underlying rationale of such a learning framework is the assumption that the mapping of independent to dependent variables cannot be sufficiently captured by a single Gaussian process model. Instead, the observation space is considered to be naturally divided into subspaces, each of which is characterized by a significantly different mapping function; each of these is learned via a different Gaussian process component in the postulated mixture.
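The sampling recipe described above (build the Gram matrix of N points under a kernel, then sample from the multivariate Gaussian with that covariance) can be sketched as follows. This is an illustration under stated assumptions: a zero-mean prior, scalar inputs, a squared-exponential kernel, and a small diagonal jitter added so that the Cholesky factorization is numerically stable. The function names are hypothetical.

```python
import math
import random

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalar inputs (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def gram_matrix(points, kernel, jitter=1e-6):
    """Gram matrix K[i][j] = kernel(x_i, x_j), with jitter added on the diagonal."""
    n = len(points)
    return [[kernel(points[i], points[j]) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def sample_from_factor(L, rng):
    """One draw f = L z with z standard normal, so that Cov(f) = L L^T."""
    n = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

points = [0.5 * i for i in range(6)]   # N = 6 input locations
K = gram_matrix(points, rbf_kernel)
L = cholesky(K)
rng = random.Random(0)                 # fixed seed so the draw is repeatable
draw = sample_from_factor(L, rng)      # function values at the N points
```

Each call to `sample_from_factor` returns the values of one random function at the chosen points. A shorter `length_scale` makes neighbouring values less correlated, so the sampled functions vary more quickly.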

2016

2009a

2009b

2006

2005

2004


  1. MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. p. 540. ISBN 9780521642989. http://www.inference.phy.cam.ac.uk/itprnn/book.pdf. "The probability distribution of a function [math]\displaystyle{ y(\mathbf{x}) }[/math] is a Gaussian process if for any finite selection of points [math]\displaystyle{ \mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(N)} }[/math], the density [math]\displaystyle{ P(y(\mathbf{x}^{(1)}),y(\mathbf{x}^{(2)}),\ldots,y(\mathbf{x}^{(N)})) }[/math] is a Gaussian."
  2. Emmanouil A. Platanios and Sotirios P. Chatzis, “Gaussian Process-Mixture Conditional Heteroscedasticity,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 5, pp. 888–900, May 2014.
  3. Sotirios P. Chatzis, “A Latent Variable Gaussian Process Model with Pitman-Yor Process Priors for Multiclass Classification,” Neurocomputing, vol. 120, pp. 482–489, Nov. 2013.