Mahalanobis Distance Measure
A Mahalanobis Distance Measure is a unitless, scale-invariant distance measure between a point P and a distribution D.
- Context:
- It can be a multi-dimensional generalization measuring how many standard deviations away P is from the mean of D.
- …
- Counter-Example(s):
- See: Principal Component, Scale Invariance, Correlations, Distance Function, L1 Distance, Covariance Matrix, Bhattacharyya Distance.
References
2016
- (Wikipedia, 2016) ⇒ http://wikipedia.org/wiki/Mahalanobis_distance Retrieved: 2016-2-7.
- The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936.
It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D.
This distance is zero if P is at the mean of D, and grows as P moves away from the mean: along each principal component axis, it measures the number of standard deviations from P to the mean of D. If each of these axes is rescaled to have unit variance, then Mahalanobis distance corresponds to standard Euclidean distance in the transformed space.
Mahalanobis distance is thus unitless and scale-invariant, and takes into account the correlations of the data set.
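The rescaling property described above (Mahalanobis distance equals Euclidean distance once each principal axis has unit variance) can be illustrated with a short sketch. This is a minimal illustration assuming NumPy; the sample data, point, and the use of the Cholesky factor of the covariance matrix to whiten the space are choices made here, not taken from the source.

```python
import numpy as np

# Synthetic correlated 2-D sample (illustrative data, not from the source).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([1.0, 2.0], [[2.0, 0.8], [0.8, 1.0]], size=500)
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

p = np.array([3.0, 4.0])  # the point P

# Mahalanobis distance of p from the sample distribution.
d = p - mu
D_M = np.sqrt(d @ np.linalg.inv(S) @ d)

# Whiten: transform so each axis has unit variance and no correlation.
L = np.linalg.cholesky(S)   # S = L L^T
z = np.linalg.solve(L, d)   # z = L^{-1} (p - mu), coordinates in whitened space
D_E = np.linalg.norm(z)     # ordinary Euclidean distance in whitened space

print(np.isclose(D_M, D_E))  # True: the two distances coincide
```

The equality follows from d^T S^{-1} d = (L^{-1} d)^T (L^{-1} d) whenever S = L L^T, so any whitening transform yields the same correspondence.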
2012
- http://en.wikipedia.org/wiki/Mahalanobis_distance#Definition
- QUOTE: Formally, the Mahalanobis distance of a multivariate vector [math]\displaystyle{ x = (x_1, x_2, x_3, \dots, x_N )^T }[/math] from a group of values with mean [math]\displaystyle{ \mu = (\mu_1, \mu_2, \mu_3, \dots , \mu_N )^T }[/math] and covariance matrix [math]\displaystyle{ S }[/math] is defined as:
:[math]\displaystyle{ D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x-\mu)}.\, }[/math][1]
Mahalanobis distance (or "generalized squared interpoint distance" for its squared value[2]) can also be defined as a dissimilarity measure between two random vectors [math]\displaystyle{ \vec{x} }[/math] and [math]\displaystyle{ \vec{y} }[/math] of the same distribution with the covariance matrix [math]\displaystyle{ S }[/math]:
:[math]\displaystyle{ d(\vec{x},\vec{y})=\sqrt{(\vec{x}-\vec{y})^T S^{-1} (\vec{x}-\vec{y})}.\, }[/math]
If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, then the resulting distance measure is called the normalized Euclidean distance:
:[math]\displaystyle{ d(\vec{x},\vec{y})= \sqrt{\sum_{i=1}^N {(x_i - y_i)^2 \over s_{i}^2}}, }[/math]
where [math]\displaystyle{ s_{i} }[/math] is the standard deviation of the [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ y_i }[/math] over the sample set.
- ↑ De Maesschalck, Roy; Jouan-Rimbaud, Delphine; and Massart, Désiré L. (2000); The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems 50:1–18
- ↑ Gnanadesikan, Ramanathan; and Kettenring, John R. (1972); Robust estimates, residuals, and outlier detection with multiresponse data, Biometrics 28:81–124
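The formal definition quoted above can be sketched directly in code. This is a minimal NumPy illustration (the function name, example point, mean, and covariance matrix are chosen here for demonstration); it also checks the two special cases from the quote: a diagonal covariance gives the normalized Euclidean distance, and the identity covariance gives the ordinary Euclidean distance.

```python
import numpy as np

def mahalanobis(x, mu, S):
    """D_M(x) = sqrt((x - mu)^T S^{-1} (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

# Example: 2-D point, mean, and covariance matrix (illustrative values).
x = np.array([2.0, 3.0])
mu = np.array([0.0, 0.0])
S = np.array([[4.0, 0.0],
              [0.0, 9.0]])  # diagonal covariance: s_1^2 = 4, s_2^2 = 9

# Diagonal S gives the normalized Euclidean distance:
# sqrt((2/2)^2 + (3/3)^2) = sqrt(2)
print(mahalanobis(x, mu, S))          # ≈ 1.4142

# Identity covariance reduces to the ordinary Euclidean distance:
# sqrt(2^2 + 3^2) = sqrt(13)
print(mahalanobis(x, mu, np.eye(2)))  # ≈ 3.6056
```

In practice S is estimated from a sample (e.g. with `np.cov`), and for numerical stability one would typically solve the linear system `np.linalg.solve(S, d)` rather than form the explicit inverse.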
2008
- (Xiang et al., 2008) ⇒ Shiming Xiang, Feiping Nie, and Changshui Zhang. (2008). “Learning a Mahalanobis Distance Metric for Data Clustering and Classification.” In: Pattern Recognition 41.