2023 MAUVEScoresforGenerativeModelsT
- (Pillutla et al., 2023) ⇒ Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, and Zaid Harchaoui. (2023). “MAUVE Scores for Generative Models: Theory and Practice.” In: Journal of Machine Learning Research, 24(356).
Subject Headings: MAUVE Score.
Notes
- It is an extended journal version of the original conference paper:
- (Pillutla et al., 2021) ⇒ Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, and Zaid Harchaoui. (2021). “MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers.” Advances in Neural Information Processing Systems, 34.
- It introduces MAUVE Scores for evaluating Generative Models against target distributions in Text Generation and Image Generation.
- It explores three statistical estimation methods: Vector Quantization, Nearest Neighbor Search, and Classifier-Based Estimation.
- It provides statistical error bounds for Vector Quantization, addressing both statistical and Quantization Errors.
- It demonstrates MAUVE Scores' correlation with Human Judgments and their ability to quantify known properties of generated texts and images.
- It compares MAUVE Scores across various f-Divergences, showing their flexibility and effectiveness in Generative Model Evaluation.
- It investigates the impact of Neural Embeddings on the evaluation, finding that the choice of embedding significantly affects MAUVE Scores.
- It extends the application of MAUVE Scores beyond text to Image Generation, showing it can recover expected trends and correlate with established Evaluation Metrics.
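The divergence-frontier construction that the notes above refer to can be sketched directly on already-quantized histograms (the output of the Vector Quantization step). The snippet below is a minimal, unofficial illustration, not the paper's implementation: it traces the frontier of mixtures r = λp + (1-λ)q, maps each point through exp(-c·KL), and takes the area under the resulting curve; the scaling constant `c = 5.0` and the grid size are assumed defaults for illustration.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between discrete distributions, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence_curve(p, q, c=5.0, grid=100):
    """Frontier points (exp(-c*KL(q||r)), exp(-c*KL(p||r))) over mixtures
    r = lam*p + (1-lam)*q for lam on an interior grid in (0, 1)."""
    pts = [(1.0, 0.0), (0.0, 1.0)]  # extreme points of the frontier
    for i in range(1, grid):
        lam = i / grid
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        pts.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    # Sort by x; break ties with larger y first so the trapezoid rule
    # traces the upper boundary of the point set.
    pts.sort(key=lambda t: (t[0], -t[1]))
    return pts

def mauve(p, q, c=5.0):
    """Area under the divergence curve via the trapezoid rule."""
    pts = divergence_curve(p, q, c)
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

Identical histograms give a score of 1, diverging histograms push the score toward 0, and the measure is symmetric in p and q; in practice the histograms would come from jointly quantizing neural embeddings of human and model samples, which this sketch leaves out.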
Cited By
Quotes
Abstract
Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore three approaches to statistically estimate these scores: vector quantization, non-parametric estimation, and classifier-based estimation. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of f-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics. In conclusion, we present practical recommendations for using MAUVE effectively with language and image modalities.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023 MAUVEScoresforGenerativeModelsT | Yejin Choi, Sean Welleck, Swabha Swayamdipta, Krishna Pillutla, Lang Liu, John Thickstun, Rowan Zellers, Sewoong Oh, Zaid Harchaoui | | | MAUVE Scores for Generative Models: Theory and Practice | | | | | | 2023 |