Maximum Mean Discrepancy (MMD)
A Maximum Mean Discrepancy (MMD) is a kernel-based statistical distance that quantifies the difference between two probability distributions as the distance between their mean embeddings in a reproducing kernel Hilbert space (RKHS); when the kernel is characteristic, MMD is zero if and only if the two distributions are identical.
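In symbols, writing μ_P and μ_Q for the kernel mean embeddings of distributions P and Q under a kernel k, the MMD and its standard kernel-expectation form (cf. Gretton et al., 2007 below) can be written as:

```latex
\mathrm{MMD}(P,Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}},
\qquad
\mathrm{MMD}^2(P,Q)
  = \mathbb{E}_{x,x'\sim P}[k(x,x')]
  - 2\,\mathbb{E}_{x\sim P,\, y\sim Q}[k(x,y)]
  + \mathbb{E}_{y,y'\sim Q}[k(y,y')].
```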
- AKA: Kernel Mean Discrepancy, RKHS Distance Metric.
- Context:
- It can be used to measure the difference between embedded distributions, for example between a source domain with limited labeled data and a target domain with only unlabeled data (Chen et al., 2009).
- It can be used to compare distributions without requiring density estimation, making it suitable for high-dimensional data.
- It can be applied in two-sample hypothesis testing to determine if two datasets are from the same distribution.
- It can serve as a loss function in training generative models like GANs to encourage generated data to match the target distribution.
- It can be utilized in domain adaptation to align feature distributions between source and target domains.
- It can be estimated empirically from finite samples using kernel functions such as Gaussian or polynomial kernels (see the sketch after this list).
- It can be integrated into machine learning pipelines for tasks like anomaly detection, transfer learning, and model evaluation.
- It can be extended to measure discrepancies in structured data, including time series and graphs.
- ...
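As a minimal sketch of the empirical estimation mentioned above, the following computes the biased estimate of squared MMD under a Gaussian (RBF) kernel using NumPy; the names `gaussian_kernel` and `mmd2_biased` and the `bandwidth` default are illustrative choices, not part of any specific library.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    # Biased empirical estimate of squared MMD:
    # mean k(x, x') - 2 * mean k(x, y) + mean k(y, y').
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() - 2.0 * k_xy.mean() + k_yy.mean()

# Toy usage: samples from two Gaussians with shifted means.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(300, 2))
y = rng.normal(0.5, 1.0, size=(300, 2))
print(mmd2_biased(x, y))  # clearly above 0, reflecting the mean shift
```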
- Example(s):
- Using MMD to evaluate the performance of a GAN by measuring the similarity between generated and real data distributions.
- Applying MMD in domain adaptation to reduce the distribution gap between labeled source data and unlabeled target data.
- Employing MMD in permutation-based two-sample tests to detect changes in data distributions over time (see the sketch after this list).
- ...
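As a hedged illustration of the two-sample-test example above, a permutation test can be built directly on the `mmd2_biased` estimator from the previous sketch: under the null hypothesis that both samples come from the same distribution, randomly reassigning the pooled samples should yield MMD values comparable to the observed one. The helper name `mmd_permutation_test` and its defaults are illustrative.

```python
import numpy as np  # reuses mmd2_biased and the samples x, y from the sketch above

def mmd_permutation_test(x, y, bandwidth=1.0, n_permutations=200, seed=0):
    # Under H0 (both samples drawn from the same distribution), randomly
    # reassigning the pooled samples should produce MMD statistics
    # comparable to the observed one.
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        if mmd2_biased(pooled[perm[:n]], pooled[perm[n:]], bandwidth) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_permutations + 1)  # add-one smoothing
    return observed, p_value

stat, p = mmd_permutation_test(x, y)
print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")  # small p-value: distributions differ
```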
- Counter-Example(s):
- Kullback-Leibler Divergence, which requires density estimation and may be undefined if distributions have non-overlapping support.
- Wasserstein Distance, which considers the cost of transporting mass between distributions and may be computationally intensive.
- Euclidean Distance, which measures point-wise differences and does not capture distributional discrepancies.
- ...
- See: Kernel Methods, Reproducing Kernel Hilbert Space, Generative Adversarial Networks, Domain Adaptation, Two-Sample Hypothesis Testing.
References
2025
- (Wikipedia, 2025) ⇒ "Kernel embedding of distributions". In: Wikipedia. Retrieved: 2025-05-25.
- QUOTE: The kernel embedding of distributions (also called the kernel mean or mean map) is a nonparametric method representing a probability distribution as an element of a reproducing kernel Hilbert space (RKHS). This framework enables comparison and manipulation of distributions using Hilbert space operations, such as inner products and distances, and can preserve all statistical features of arbitrary distributions if a characteristic kernel is used. The maximum mean discrepancy (MMD) is a distance measure between distributions defined as the distance between their RKHS embeddings, and is widely used for two-sample tests and domain adaptation.
2022
- (Machine Learning Note, 2022) ⇒ Machine Learning Note. (2022). "Maximum Mean Discrepancy (MMD)".
- QUOTE: Maximum mean discrepancy (MMD) is a statistical test for measuring the difference between two distributions based on their embeddings in a reproducing kernel Hilbert space. MMD is used for two-sample testing, domain adaptation, and generative model evaluation.
2019
- (Tunali, 2019) ⇒ Onur Tunali. (2019). "Maximum Mean Discrepancy in Machine Learning".
- QUOTE: MMD computes the distance between the means of two distributions in a kernel-induced feature space. It is a nonparametric method that does not require density estimation and is widely used for domain adaptation and distribution comparison in machine learning.
2009
- (Chen et al., 2009) ⇒ Bo Chen, Wai Lam, Ivor Tsang, and Tak-Lam Wong. (2009). “Extracting Discriminative Concepts for Domain Adaptation in Text Mining.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557045
- QUOTE: … Maximum Mean Discrepancy (MMD) [5] is adopted to measure the embedded distribution difference between the source domain with sufficient but finite labeled data and the target domain with sufficient unlabeled data.
2007
- (Gretton et al., 2007) ⇒ Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. (2007). “A Kernel Method for the Two-Sample Problem.” In: Advances in Neural Information Processing Systems, 19.
- QUOTE: … We call this statistic the Maximum Mean Discrepancy (MMD). …
2006
- (Borgwardt et al., 2006) ⇒ Karsten M. Borgwardt, Arthur Gretton, Malte J. Rasch, Hans-Peter Kriegel, Bernhard Schölkopf, and Alex J. Smola. (2006). “Integrating Structured Biological Data by Kernel Maximum Mean Discrepancy.” In: Bioinformatics, 22(14).