Kernel Density Estimation (KDE) Algorithm
A Kernel Density Estimation (KDE) Algorithm is a probability density estimation algorithm that approximates an unknown probability density function by centering a kernel function at each observed data point and averaging the scaled kernel values.
- Example(s):
- a Gaussian KDE.
- …
- See: Nonparametric Model, Median Absolute Deviation.
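As a concrete illustration of the Gaussian KDE example above, here is a minimal sketch using SciPy's scipy.stats.gaussian_kde; the bimodal sample data is synthetic and chosen only for illustration:

```python
# Minimal Gaussian KDE sketch using SciPy (synthetic data for illustration).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# A bimodal sample: two Gaussian clusters of 200 and 300 points.
sample = np.concatenate([rng.normal(-2.0, 0.5, 200),
                         rng.normal(1.0, 1.0, 300)])

kde = gaussian_kde(sample)        # bandwidth set by Scott's rule by default
xs = np.linspace(-5.0, 5.0, 101)  # evaluation grid
density = kde(xs)                 # estimated density at each grid point

print(xs[np.argmax(density)])     # location of the estimated density's peak
```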
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Kernel_density_estimation Retrieved:2020-12-16.
- QUOTE: In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy.
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Kernel_density_estimation#Definition Retrieved:2020-12-16.
- QUOTE: Let (x1, x2, …, xn) be a univariate independent and identically distributed sample drawn from some distribution with an unknown density ƒ at any given point x. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is: [math]\displaystyle{ \widehat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big), }[/math] where K is the kernel — a non-negative function — and [math]\displaystyle{ h \gt 0 }[/math] is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as [math]\displaystyle{ K_h(x) = \frac{1}{h} K\Big(\frac{x}{h}\Big) }[/math]. Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.
A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means [math]\displaystyle{ K(x) = \phi(x) }[/math], where ϕ is the standard normal density function. The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at the data point locations xi. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion map).
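To make the quoted definition concrete, here is a minimal NumPy sketch of the estimator [math]\displaystyle{ \widehat{f}_h }[/math] with a Gaussian kernel; the function name kde_gaussian, the toy sample, and the bandwidth h=0.5 are illustrative choices, not part of the quoted definition:

```python
import numpy as np

def kde_gaussian(x, data, h):
    """Evaluate f_hat_h(x) = (1/(n*h)) * sum_i K((x - x_i)/h)
    with the Gaussian kernel K(u) = exp(-u**2/2) / sqrt(2*pi)."""
    u = (x - data[:, None]) / h                     # shape (n, len(x))
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # kernel value per (i, x) pair
    return K.sum(axis=0) / (len(data) * h)          # average of scaled kernels

data = np.array([-1.2, -0.5, 0.1, 0.4, 1.3])        # toy sample
xs = np.linspace(-3.0, 3.0, 7)                      # evaluation points
print(kde_gaussian(xs, data, h=0.5))
```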
2008
- http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xlghtmlnode33.html
- QUOTE: The goal of density estimation is to approximate the probability density function [math]\displaystyle{ f(\bullet) }[/math] of a random variable [math]\displaystyle{ X }[/math]. Assume we have [math]\displaystyle{ n }[/math] independent observations [math]\displaystyle{ x_1,\ldots,x_n }[/math] from the random variable [math]\displaystyle{ X }[/math]. The kernel density estimator [math]\displaystyle{ \widehat{f}_{h}(x) }[/math] for the estimation of the density value [math]\displaystyle{ f(x) }[/math] at point [math]\displaystyle{ x }[/math] is defined as: [math]\displaystyle{ \displaystyle \widehat{f}_{h}(x)= \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x_i-x}{h}\right), \qquad (6.1) }[/math] with [math]\displaystyle{ K(\bullet) }[/math] denoting a so-called kernel function, and [math]\displaystyle{ h }[/math] denoting the bandwidth. A number of possible kernel functions is listed in the following table.
Table 6.1: Kernel functions.
| Kernel | [math]\displaystyle{ K(u) }[/math] |
|--------|------------------------------------|
| Uniform | [math]\displaystyle{ \frac{1}{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] |
| Triangle | [math]\displaystyle{ (1-\vert u \vert) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] |
| Epanechnikov | [math]\displaystyle{ \frac{3}{4}(1-u^{2}) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] |
| Quartic | [math]\displaystyle{ \frac{15}{16}(1-u^{2})^{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] |
| Triweight | [math]\displaystyle{ \frac{35}{32}(1-u^{2})^{3} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] |
| Gaussian | [math]\displaystyle{ \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) }[/math] |
| Cosine | [math]\displaystyle{ \frac{\pi}{4}\cos(\frac{\pi}{2}u) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math] |
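The kernels in Table 6.1 translate directly into code. The sketch below is my transcription of the table: the function names and the numerical integration check are additions, assuming only the formulas quoted above. Since each kernel is a density, its integral over the real line should come out close to 1:

```python
import numpy as np

# Indicator I(|u| <= 1) used by all compactly supported kernels in Table 6.1.
def ind(u):
    return (np.abs(u) <= 1).astype(float)

kernels = {
    "uniform":      lambda u: 0.5 * ind(u),
    "triangle":     lambda u: (1 - np.abs(u)) * ind(u),
    "epanechnikov": lambda u: 0.75 * (1 - u**2) * ind(u),
    "quartic":      lambda u: (15 / 16) * (1 - u**2) ** 2 * ind(u),
    "triweight":    lambda u: (35 / 32) * (1 - u**2) ** 3 * ind(u),
    "gaussian":     lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi),
    "cosine":       lambda u: (np.pi / 4) * np.cos(np.pi * u / 2) * ind(u),
}

# Riemann-sum check: each kernel should integrate to approximately 1.
u = np.linspace(-5.0, 5.0, 100_001)
du = u[1] - u[0]
for name, K in kernels.items():
    print(f"{name:13s} integral = {(K(u) * du).sum():.4f}")
```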
2008
- (Upton & Cook, 2008) ⇒ Graham Upton, and Ian Cook. (2008). “A Dictionary of Statistics, 2nd edition revised." Oxford University Press. ISBN:0199541450
- QUOTE:
- Kernel Method (Kernel Density Estimation): A method for the estimation of probability density functions. Suppose [math]\displaystyle{ X }[/math] is a continuous random variable with unknown probability density function f. A random sample of observations of [math]\displaystyle{ X }[/math] is taken. If the sample values are denoted by [math]\displaystyle{ x_1, x_2, ..., x_n }[/math] the estimate of f is given by
[math]\displaystyle{ f(x) = A \sum_{j=1}^{n}K(x, x_j), }[/math]
where [math]\displaystyle{ K }[/math] is a kernel function and the constant A is chosen so that [math]\displaystyle{ \int_{-\infty}^{\infty} f(x)dx = 1 }[/math]. The observation [math]\displaystyle{ x_j }[/math] may be regarded as being spread out between [math]\displaystyle{ x_j - a }[/math] and [math]\displaystyle{ x_j + b }[/math] (usually with [math]\displaystyle{ a = b }[/math]). The result is that the naive estimate of f as a function capable of taking values only at [math]\displaystyle{ x_1, x_2, ..., x_n }[/math] is replaced by a continuous function having peaks where the data are densest. Examples of kernel functions are the Gaussian kernel,
- [math]\displaystyle{ K(x, t) = \exp \bigg\{ - \frac{(x-t)^2}{2h^2} \bigg\}, \quad -\infty \lt x \lt \infty }[/math]
- [math]\displaystyle{ K(x, t) = \left\{ \begin{array}{l l} 1 - \dfrac{(x-t)^2}{5h^2}, & -\sqrt{5}h \lt x - t \lt \sqrt{5}h,\\ 0, & \text{otherwise}\\ \end{array} \right. }[/math]
The constant h is the window width or bandwidth. The choice of h is critical: small values may retain the spikes of the naive estimate, and large values may oversmooth so that important aspects of f are lost.
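The quoted entry stresses that the choice of h is critical but does not prescribe one. A widely used heuristic for Gaussian kernels is Silverman's rule of thumb; the sketch below assumes that rule, which is my addition and not part of the dictionary entry:

```python
import numpy as np

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for 1-D data and a Gaussian kernel:
    h = 0.9 * min(sample std, IQR / 1.34) * n**(-1/5).
    A heuristic, not from the quoted sources; it can oversmooth multimodal data."""
    data = np.asarray(data, dtype=float)
    n = data.size
    std = data.std(ddof=1)                         # sample standard deviation
    q75, q25 = np.percentile(data, [75, 25])       # interquartile range endpoints
    return 0.9 * min(std, (q75 - q25) / 1.34) * n ** (-0.2)

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 1.0, 500)
print(silverman_bandwidth(sample))   # ~0.9 * 1 * 500**(-0.2), roughly 0.26
```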
----