Kernel Function
A Kernel Function is a distance function that evaluates the similarity between two structures (in some metric space).
- Context:
- It can be used by Supervised Learning Algorithms on Nonlinear Relationships because it can transform the Input Space into a new (higher-dimensional) Feature Space in which a Linear Relationship exists.
- It can be a Linear Kernel Function, a Polynomial Kernel Function, a Gaussian Kernel Function, and others.
- It can be an Iterating Kernel Function (Courant and Hilbert, 1953).
- It can be used by a Kernel-based Predictive Classifier and a Kernel-based Supervised Learning Algorithm.
- It can represent a 'shortcut' in that the feature map into a high-dimensional feature space does not need to be calculated explicitly.
- Its Computational Complexity is not necessarily proportional to the number of Features.
- It can incorporate Prior Knowledge of the problem domain.
- It contains all of the information about the relative positions of the inputs in the Feature Space.
- It can be, depending on the function input: a Tuple Kernel, a Vector Kernel, a Tree Kernel, a Graph Kernel, ...
- It can have Closure Properties (illustrated in the sketch after this list), such as:
- K(x,z) = K1(x,z) + c (for a constant c ≥ 0)
- K(x,z) = c*K1(x,z) (for a constant c ≥ 0)
- K(x,z) = K1(x,z) * K2(x,z)
- K(x,z) = K1(x,z) + K2(x,z)
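These closure properties can be checked numerically. The following is a minimal sketch (not from the source; NumPy, the choice of a linear and a Gaussian RBF base kernel, the constant c, and the tolerance are illustrative assumptions) verifying that each combination leaves the Gram matrix positive semi-definite:

```python
# Numerically illustrate the closure properties: combining two valid kernels
# with +, *, and a non-negative constant c keeps the Gram matrix PSD.
import numpy as np

def linear_kernel(X):
    return X @ X.T                          # K1(x, z) = <x, z>

def rbf_kernel(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))     # K2(x, z) = exp(-||x-z||^2 / 2 sigma^2)

def is_psd(K, tol=1e-8):
    # PSD iff all eigenvalues are (numerically) non-negative.
    return np.linalg.eigvalsh(K).min() >= -tol

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K1, K2, c = linear_kernel(X), rbf_kernel(X), 2.5   # c >= 0 is required

combos = {"K1 + c": K1 + c, "c * K1": c * K1,
          "K1 * K2": K1 * K2, "K1 + K2": K1 + K2}  # * is the elementwise (Schur) product
for name, K in combos.items():
    print(name, "PSD:", is_psd(K))
```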
- Example(s):
- Uniform Kernel [math]\displaystyle{ \frac{1}{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Triangle Kernel [math]\displaystyle{ (1-\vert u \vert) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Linear Kernel Function.
- K(x,z) = [math]\displaystyle{ (\lt x,z\gt +1)^d }[/math] (polynomial kernel function of degree [math]\displaystyle{ d }[/math])
- K(x,z) = [math]\displaystyle{ \exp(-||x-z||^2/2\sigma^2) }[/math] (Gaussian RBF)
- K(x,z) = [math]\displaystyle{ \tanh(\lt x,z\gt - a) }[/math] (two-layer sigmoidal neural network; see the sketch after this list)
- Positive-Definite Kernel Function.
- Radial Basis Kernel Function.
- Fisher Kernel.
- Epanechnikov Kernel [math]\displaystyle{ \frac{3}{4}(1-u^{2}) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Quartic Kernel Function [math]\displaystyle{ \frac{15}{16}(1-u^{2})^{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Triweight Kernel Function [math]\displaystyle{ \frac{35}{32}(1-u^{2})^{3} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Gaussian Kernel Function [math]\displaystyle{ \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) }[/math]
- Cosine Kernel Function [math]\displaystyle{ \frac{\pi}{4}\cos(\frac{\pi}{2}u) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Spectral-Mixture Kernel Function.
- …
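The following minimal sketch (NumPy and the toy vectors are assumptions, not part of the source) evaluates three of the kernel functions listed above:

```python
# Evaluate three of the listed kernel functions on a pair of input vectors.
import numpy as np

def polynomial_kernel(x, z, d=2):
    return (np.dot(x, z) + 1) ** d             # (<x, z> + 1)^d

def gaussian_rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma**2))

def sigmoid_kernel(x, z, a=1.0):
    return np.tanh(np.dot(x, z) - a)           # two-layer sigmoidal network kernel

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, z), gaussian_rbf_kernel(x, z), sigmoid_kernel(x, z))
```

Note that the sigmoidal kernel is not positive semi-definite for all parameter choices, unlike the polynomial and Gaussian RBF kernels.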
- Counter-Example(s):
- …
- See: Relational Data Kernel Function, Kernel-based Learning Algorithm, Canonical Dot Product, Positive Definite Function, Kernel Space, Kernel Density Estimator, Convolutional Kernel Function.
References
- tbd
- If [math]\displaystyle{ X }[/math] is the instance space, a kernel function is a mapping [math]\displaystyle{ K: X \times X \rightarrow [0,\infty) }[/math] such that, given two instances [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], [math]\displaystyle{ K(x,y) = \sum_i t_i(x)\, t_i(y) = t(x) \cdot t(y) }[/math], where [math]\displaystyle{ t_i(x) }[/math] is some feature function over the instance [math]\displaystyle{ x }[/math].
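A minimal sketch of this definition (not from the cited reference; NumPy, 2-D inputs, and the homogeneous degree-2 polynomial kernel are illustrative assumptions): an explicit feature function t whose dot product t(x)·t(y) coincides with a kernel that never computes t, the 'shortcut' mentioned in the Context section.

```python
# K(x, y) = t(x) . t(y) for an explicit feature map t, yet K can be
# evaluated without ever forming t.
import numpy as np

def feature_map(x):
    # t(x) = (x1^2, x2^2, sqrt(2) * x1 * x2) for the degree-2 polynomial kernel
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def kernel(x, y):
    return np.dot(x, y) ** 2                   # the shortcut: no feature map needed

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(feature_map(x), feature_map(y)))  # 1.0
print(kernel(x, y))                            # 1.0, identical by construction
```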
2008
- http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xlghtmlnode33.html
- QUOTE: Assume we have [math]\displaystyle{ n }[/math] independent observations [math]\displaystyle{ x_1,\ldots,x_n }[/math] from the random variable [math]\displaystyle{ X }[/math]. The kernel density estimator [math]\displaystyle{ \widehat{f}_{h}(x) }[/math] for the estimation of the density value [math]\displaystyle{ f(x) }[/math] at point [math]\displaystyle{ x }[/math] is defined as [math]\displaystyle{ \widehat{f}_{h}(x)= \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x_i-x}{h}\right), \qquad (6.1) }[/math] with [math]\displaystyle{ K(\bullet) }[/math] denoting a so-called kernel function, and [math]\displaystyle{ h }[/math] denoting the bandwidth. A number of possible kernel functions is listed in the following table.
Table 6.1: Kernel functions [math]\displaystyle{ K(u) }[/math].
- Uniform: [math]\displaystyle{ \frac{1}{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Triangle: [math]\displaystyle{ (1-\vert u \vert) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Epanechnikov: [math]\displaystyle{ \frac{3}{4}(1-u^{2}) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Quartic: [math]\displaystyle{ \frac{15}{16}(1-u^{2})^{2} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Triweight: [math]\displaystyle{ \frac{35}{32}(1-u^{2})^{3} {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
- Gaussian: [math]\displaystyle{ \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) }[/math]
- Cosinus: [math]\displaystyle{ \frac{\pi}{4}\cos(\frac{\pi}{2}u) {\boldsymbol{I}}(\vert u \vert \le 1) }[/math]
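As an illustration of equation (6.1), here is a minimal sketch (NumPy, the toy 1-D sample, and the hand-picked bandwidth h are assumptions, not part of the source) that evaluates the kernel density estimator with two kernels from Table 6.1:

```python
# Kernel density estimation per equation (6.1):
#   f_hat(x) = (1 / (n h)) * sum_i K((x_i - x) / h)
import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)   # indicator zeroes out |u| > 1

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, sample, h, K):
    return np.sum(K((sample - x) / h)) / (len(sample) * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)                      # draws from a standard normal
for K in (epanechnikov, gaussian):
    # true density at 0 is 1/sqrt(2*pi) ~ 0.399; both estimates should be close
    print(K.__name__, kde(0.0, sample, h=0.5, K=K))
```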
2004
- (Lanckriet et al., 2004a) ⇒ Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan. (2004). “Learning the Kernel Matrix with Semidefinite Programming.” In: The Journal of Machine Learning Research, 5.
- QUOTE: Kernel-based learning algorithms (see, for example, Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004) work by embedding the data into a Hilbert space, and searching for linear relations in such a space. The embedding is performed implicitly, by specifying the inner product between each pair of points rather than by giving their coordinates explicitly. This approach has several advantages, the most important deriving from the fact that the inner product in the embedding space can often be computed much more easily than the coordinates of the points themselves. …
… Definition 1: A kernel is a function [math]\displaystyle{ k }[/math], such that [math]\displaystyle{ k(\mathbf{x}, \mathbf{z}) = \lt \Phi(\mathbf{x}), \Phi(\mathbf{z})\gt }[/math] for all [math]\displaystyle{ \mathbf{x}, \mathbf{z} \in X }[/math], where [math]\displaystyle{ \Phi }[/math] is a mapping from [math]\displaystyle{ X }[/math] to an (inner product) feature space [math]\displaystyle{ F }[/math]. …
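To make the quoted idea concrete, here is a minimal sketch (not from the paper; NumPy, a Gaussian RBF kernel, toy 1-D data, and the ridge parameter lam are assumptions) of searching for a linear relation in the implicit feature space via kernel ridge regression, touching only inner products (the Gram matrix), never explicit coordinates:

```python
# Kernel ridge regression: fit alpha so that predictions are linear in the
# implicitly embedded features, using only the kernel (Gram) matrix.
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

lam = 1e-2
K = rbf(X, X)                                  # pairwise inner products in feature space
alpha = np.linalg.solve(K + lam * np.eye(50), y)

X_test = np.array([[0.0], [1.5]])
print(rbf(X_test, X) @ alpha)                  # predictions, approximately sin(x)
```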
1997
- (Mitchell, 1997) ⇒ Tom M. Mitchell. (1997). “Machine Learning.” McGraw-Hill.
- Much of the literature on nearest-neighbor methods and weighted local regression uses a terminology that has arisen from the field of statistical pattern recognition....
- Regression means approximating a real-valued target function.
- Residual is the error [math]\displaystyle{ \hat{f}(x) - f(x) }[/math] in approximating the target function.
- Kernel function is the function of distance that is used to determine the weight of each training example. In other words, the kernel function is the function [math]\displaystyle{ K }[/math] such that [math]\displaystyle{ w_i = K(d(x_i, x_q)) }[/math].
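A minimal sketch of Mitchell's usage (NumPy, a Gaussian kernel of distance, and the toy data are assumptions, not from the book): the kernel converts the distance d(x_i, x_q) into a weight w_i for a locally weighted prediction.

```python
# Distance-weighted regression: w_i = K(d(x_i, x_q)), prediction is the
# weighted average of the training targets.
import numpy as np

def gaussian_of_distance(d, sigma=0.5):
    return np.exp(-d**2 / (2 * sigma**2))

def weighted_prediction(X, y, x_query, K=gaussian_of_distance):
    d = np.linalg.norm(X - x_query, axis=1)    # d(x_i, x_q)
    w = K(d)                                   # w_i = K(d(x_i, x_q))
    return np.sum(w * y) / np.sum(w)           # locally weighted average

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 1))
y = X[:, 0] ** 2 + 0.05 * rng.normal(size=30)
print(weighted_prediction(X, y, np.array([1.0])))  # locally weighted estimate near f(1.0)
```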