Epanechnikov Kernel
An Epanechnikov Kernel is a kernel function of quadratic (parabolic) form.
- AKA: Parabolic Kernel Function.
- Context:
- It can be expressed as [math]\displaystyle{ K(u) = \frac{3}{4}(1-u^2) }[/math] for [math]\displaystyle{ |u|\leq 1 }[/math].
- It can be used in a Multivariate Density Estimation.
- It can be optimal with respect to Mean Square Error: among non-negative kernels it minimizes the asymptotic mean integrated squared error of the resulting kernel density estimator.
- …
- Example(s):
- [math]\displaystyle{ K(x) = \frac{3}{4}(1-x^2) }[/math] for [math]\displaystyle{ -1\leq x \leq 1 }[/math] (see the Python sketch after this list).
- …
- Counter-Example(s):
- a Gaussian Kernel Function.
- a Uniform Kernel Function.
- See: Epanechnikov Distribution, Density Estimation, Kernel Density Estimation, Multivariate Kernel Density Estimation, Kernel Smoother, Kernel Regression.
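The kernel is straightforward to implement directly. Below is a minimal Python sketch (the function name epanechnikov and the NumPy-based implementation are illustrative choices, not from any source cited here); it also numerically checks that the kernel integrates to one and has mean zero, as a kernel must.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov (parabolic) kernel: K(u) = 3/4 * (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

# Numerical sanity checks on a fine grid over the support [-1, 1]:
grid = np.linspace(-1.0, 1.0, 200_001)
dx = grid[1] - grid[0]
print((epanechnikov(grid) * dx).sum())         # ~1.0: integrates to one
print((grid * epanechnikov(grid) * dx).sum())  # ~0.0: mean zero, by symmetry
```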
References
2017a
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use Retrieved:2017-7-16.
- Several types of kernel functions are commonly used: uniform, triangle, Epanechnikov, [1] quartic (biweight), tricube, triweight, Gaussian, quadratic and cosine.
In the table below, if [math]\displaystyle{ K }[/math] is given with a bounded support, then [math]\displaystyle{ K(u) = 0 }[/math] for values of u lying outside the support.
Epanechnikov (parabolic): [math]\displaystyle{ K(u) = \frac{3}{4}(1-u^2) }[/math], with support [math]\displaystyle{ |u|\leq 1 }[/math].
...
2017b
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Kernel_density_estimation#Definition Retrieved:2017-7-16.
- Let (x1, x2, …, xn) be an independent and identically distributed sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is: [math]\displaystyle{ \hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big), }[/math] where K(•) is the kernel — a non-negative function that integrates to one and has mean zero — and [math]\displaystyle{ h \gt 0 }[/math] is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as [math]\displaystyle{ K_h(x) = \frac{1}{h}K\big(\frac{x}{h}\big) }[/math]. Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.
A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously,[2] and due to its convenient mathematical properties, the normal kernel is often used, which means [math]\displaystyle{ K(x) = \phi(x) }[/math], where ϕ is the standard normal density function.
The construction of a kernel density estimate finds interpretations in fields outside of density estimation.[3] For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point location xi. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning.
Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. To see this, we compare the construction of histogram and kernel density estimators, using these 6 data points: x1 = −2.1, x2 = −1.3, x3 = −0.4, x4 = 1.9, x5 = 5.1, x6 = 6.2. For the histogram, first the horizontal axis is divided into sub-intervals or bins which cover the range of the data. In this case, we have 6 bins each of width 2. Whenever a data point falls inside this interval, we place a box of height 1/12. If more than one data point falls inside the same bin, we stack the boxes on top of each other. For the kernel density estimate, we place a normal kernel with variance 2.25 on each of the data points xi. The kernels are summed to make the kernel density estimate. The smoothness of the kernel density estimate is evident compared to the discreteness of the histogram, as kernel density estimates converge faster to the true underlying density for continuous random variables.
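As an illustration of this definition, here is a hedged Python sketch that evaluates [math]\displaystyle{ \hat{f}_h }[/math] with the Epanechnikov kernel on the six data points quoted above. The bandwidth value h = 1.5 and the function names are arbitrary illustrative choices, not taken from the source.

```python
import numpy as np

def epanechnikov(u):
    # K(u) = 3/4 * (1 - u^2) on |u| <= 1, zero outside the bounded support.
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kde(x, data, h):
    # f_hat_h(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)
    x = np.asarray(x, dtype=float)[:, None]        # evaluation points, shape (m, 1)
    data = np.asarray(data, dtype=float)[None, :]  # sample, shape (1, n)
    return epanechnikov((x - data) / h).sum(axis=1) / (data.size * h)

sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]  # the six data points from the quote
grid = np.linspace(-6.0, 10.0, 1601)
f_hat = kde(grid, sample, h=1.5)            # h = 1.5 is an arbitrary illustrative value
print(f_hat.sum() * (grid[1] - grid[0]))    # ~1.0: the estimate integrates to one
```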
1992
- (Scott, 1992) ⇒ David W. Scott. (1992). “Multivariate Density Estimation: Theory, Practice, and Visualization.” Wiley. ISBN:0471547700
- BOOK PREVIEW: Density estimation has long been recognized as an important tool when used with univariate and bivariate data. But the computer revolution of recent years has provided access to data of unprecedented complexity in ever-growing volume. New tools are required to detect and summarize the multivariate structure of these difficult data. Multivariate Density Estimation: Theory, Practice, and Visualization demonstrates that density estimation retains its explicative power even when applied to trivariate and quadrivariate data. By presenting the major ideas in the context of the classical histogram, the text simplifies the understanding of advanced estimators and develops links between the intuitive histogram and other methods that are more statistically efficient. The theoretical results covered are those particularly relevant to application and understanding. The focus is on methodology, new ideas, and practical advice. A hierarchical approach draws attention to the similarities among different estimators. Also, detailed discussions of nonparametric dimension reduction, nonparametric regression, additive modeling, and classification are included. Because visualization is a key element in effective multivariate nonparametric analysis, more than 100 graphic illustrations supplement the numerous problems and examples presented in the text. In addition, sixteen four-color plates help to convey an intuitive feel for both the theory and practice of density estimation in several dimensions. Ideal as an introductory textbook, Multivariate Density Estimation is also an indispensable professional reference for statisticians, biostatisticians, electrical engineers, econometricians, and other scientists involved in data analysis.
1969
- (Epanechnikov, 1969) ⇒ Epanechnikov, V. A. (1969). “Non-parametric Estimation of a Multivariate Probability Density.” Theory of Probability & Its Applications, 14(1), 153-158. DOI:10.1137/1114019
- QUOTE: Introduction
Let
- [math]\displaystyle{ X_i= X(x_1^{(i)},x_2^{(i)},\cdots,x_k^{(i)}),\quad i=1,\cdots, n }[/math],
- be a given sample of [math]\displaystyle{ n }[/math] independent realizations of a k-dimensional random variable [math]\displaystyle{ X(x_1,x_2,\cdots,x_k) }[/math] from a population characterized by a continuous k-variate probability density [math]\displaystyle{ f(x_1,x_2,\cdots,x_k) }[/math]. We define the multivariate empirical probability density [math]\displaystyle{ f_n(x_1,x_2,\cdots,x_k) }[/math] to be the function of sample values [math]\displaystyle{ X_i }[/math] given by
- [math]\displaystyle{ (1) \quad f_n(x_1,x_2,\cdots,x_k)=\frac{1}{n}\sum_{i=1}^n\prod_{\ell=1}^k\frac{1}{h_\ell(n)}K_\ell\Big(\frac{x_\ell-x_\ell^{(i)}}{h_\ell(n)}\Big) }[/math]
- Each “kernel” [math]\displaystyle{ K_\ell(y) }[/math] has the following properties (conditions (a)-(e) are collectively labeled (2) in the original):
- (a) [math]\displaystyle{ 0 \leq K_\ell(y)\lt C \lt \infty }[/math]
- (b) [math]\displaystyle{ K_\ell(y)=K_\ell(-y) }[/math]
- (c) [math]\displaystyle{ \int_{-\infty}^{+\infty} K_\ell(y)dy=1 }[/math]
- (d) [math]\displaystyle{ \int_{-\infty}^{+\infty} K_\ell(y)y^2dy=1 }[/math]
- (e) [math]\displaystyle{ \int_{-\infty}^{+\infty} K_\ell(y)y^mdy\lt \infty }[/math] for [math]\displaystyle{ 0\leq m \lt \infty }[/math]
- and the "spreading" coefficients [math]\displaystyle{ h_\ell(n) }[/math] of the kernels depend in general on the sample size [math]\displaystyle{ n }[/math] and tend to zero as [math]\displaystyle{ n\rightarrow \infty }[/math].
- ↑ Named for Epanechnikov, V. A. (1969). “Non-Parametric Estimation of a Multivariate Probability Density". Theory Probab. Appl. 14 (1): 153–158. doi:10.1137/1114019.
- ↑ Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman & Hall/CRC. ISBN 0-412-55270-1.
- ↑ Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. (2010). “Kernel density estimation via diffusion". Annals of Statistics. 38 (5): 2916–2957. doi:10.1214/10-AOS799.