MAUVE Score Metric

A MAUVE Score Metric is a text similarity metric that embeds and quantizes human-written and machine-generated text samples, then measures the divergence between the two resulting distributions.

AKA: MAUVE (from the paper title "MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers").
Context:
- It can sample text from both human and machine-generated sources to analyze the quality and diversity of the generated content.
- It can embed these text samples into a high-dimensional space using a pre-trained language model, such as GPT-2.
- It can quantize the embedded samples into a finite set of clusters (for example, with k-means), so that each source is described by a discrete distribution and the comparison becomes computationally feasible.
- It can calculate the divergence between the quantized distributions of human and machine text, represented through a divergence curve.
- It can summarize this divergence with the MAUVE Score, the area under the divergence curve, which ranges from 0 to 1 (higher values indicate machine text closer to human text) and reflects both the quality (how realistic the text is) and the diversity (variety in the text) of the generated content relative to human text.
- It requires careful setup, including the selection of hyperparameters such as the number of samples, the number of clusters for quantization, and the scaling parameter that influences the absolute value of the MAUVE score.
- It involves a computational pipeline of embedding, quantization, and divergence calculation, which can be sped up with batch processing and GPU utilization.
- ...
Example(s):
- as proposed in (Pillutla et al., 2023).
- as implemented in https://github.com/krishnap25/mauve.
- ...
Counter-Example(s):
- BLEU Score Metric.
- ROUGE Score Metric.
See: Text Generation Model, Natural Language Processing, Generative Text Model, Divergence Measure.
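
As a rough illustration of the divergence-curve and area steps listed under Context, the following Python sketch computes a MAUVE-style score from two histograms that have already been quantized. It is a minimal sketch, not the reference implementation. It assumes the curve is traced by KL divergences from each distribution to their mixtures. The function names, the scaling constant `c=5.0`, the 25 mixture weights, and the toy histograms are all assumptions made for illustration.

```python
import math


def kl(a, b):
    """KL(a || b) for two discrete distributions over the same bins."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs, ys), xs ascending."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )


def mauve_from_histograms(p, q, c=5.0, num_points=25):
    """MAUVE-style score for histograms p (human) and q (machine).

    c is the scaling constant (assumed value) and num_points is the
    number of mixture weights used to trace the curve (assumed value).
    """
    # The curve starts at the limit point (1, 0) and ends at (0, 1).
    xs, ys = [1.0], [0.0]
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)  # mixture weight strictly inside (0, 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        xs.append(math.exp(-c * kl(q, r)))
        ys.append(math.exp(-c * kl(p, r)))
    xs.append(0.0)
    ys.append(1.0)
    # xs was built in decreasing order; reverse both lists before integrating.
    xs.reverse()
    ys.reverse()
    return trapezoid_area(xs, ys)


# Toy histograms over two clusters (hypothetical values).
identical = mauve_from_histograms([0.5, 0.5], [0.5, 0.5])
shifted = mauve_from_histograms([0.5, 0.5], [0.9, 0.1])
disjoint = mauve_from_histograms([1.0, 0.0], [0.0, 1.0])
print(identical, shifted, disjoint)
```

In this sketch, identical histograms score close to 1, histograms with disjoint support score close to 0, and a larger `c` lowers the score for any pair of histograms that differ.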

References

2024

GPT-4
- The MAUVE Score is computed by the MAUVE Algorithm, which measures the similarity between the distributions of machine-generated text and human text, capturing both the quality and diversity of the generated content. The name comes from the paper title "MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers," which highlights its purpose and methodological approach.
- The MAUVE Algorithm operates by:
  1. Sampling text from both human and machine-generated sources.
  2. Embedding these text samples into a high-dimensional space using a pre-trained language model (such as GPT-2).
  3. Quantizing the embedded samples into a finite set of clusters (for example, with k-means), so that each source is described by a discrete distribution and the comparison becomes computationally feasible.
  4. Calculating the divergence between the quantized distributions of human and machine text, represented through a divergence curve.
  5. Summarizing this divergence with the MAUVE Score, the area under the divergence curve, which measures both the quality (how realistic the text is) and the diversity (variety in the text) of the generated content relative to human text.
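
The quantization in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: it uses a plain k-means written from scratch, clusters the pooled human and machine samples together, and replaces real language-model embeddings with random 2-D points. All function names, the cluster count `k=8`, and the toy data are hypothetical.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a list of k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for pt in points:
            groups[nearest(pt, centers)].append(pt)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(col) / len(group) for col in zip(*group))
    return centers


def histogram(points, centers):
    """Fraction of points assigned to each cluster."""
    counts = [0] * len(centers)
    for pt in points:
        counts[nearest(pt, centers)] += 1
    return [count / len(points) for count in counts]


# Toy 2-D "embeddings" standing in for language-model features (hypothetical).
rng = random.Random(1)
human = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
machine = [(rng.gauss(0.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

# Cluster the pooled samples, then describe each source by its histogram.
centers = kmeans(human + machine, k=8)
p_hist = histogram(human, centers)
q_hist = histogram(machine, centers)
print(p_hist)
print(q_hist)
```

The two histograms share the same bins, so they can be compared directly in the divergence step. The number of clusters `k` is the quantization hyperparameter: too few clusters hide differences between the sources, while too many leave each bin with very few samples.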

2023

(Pillutla et al., 2023) ⇒ Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, and Zaid Harchaoui. (2023). “MAUVE Scores for Generative Models: Theory and Practice.” In: Journal of Machine Learning Research, 24(356).
- NOTE: This work presents the MAUVE Algorithm, a computational framework for evaluating the similarity between machine-generated and human-written text distributions. It emphasizes the algorithm's role in quantifying the quality and diversity of generated content and its application in comparing how well different generative models produce human-like text.
