MaskGAN Benchmark Task
A MaskGAN Benchmark Task is an Automatic Text Generation Task that uses a Generative Adversarial Network and Masked Sequences to generate text items.
- AKA: MaskGAN Task.
- Context:
- Task Input(s): word sequences.
- Task Output(s): automatically generated sentences.
- Task Requirement(s):
- Benchmark Datasets:
- Real Datasets:
- Penn Treebank (PTB) Datasets - a vocabulary of 10,000 unique words; the training set contains 930,000 words, the validation set 74,000 words, and the test set 82,000 words.
- IMDB Movie Datasets - 100,000 movie reviews: 25,000 labeled training instances, 25,000 labeled test instances, and 50,000 unlabeled training instances.
- Synthetic Datasets:
- Conditional Samples generated on the PTB datasets.
- Unconditional Samples generated on the PTB datasets.
- Conditional Samples generated on the IMDB datasets.
- Unconditional Samples generated on the IMDB datasets.
- Benchmark Performance Metrics:
- Validation Perplexity.
- Mode Collapse - percentage of unique n-grams (bi-, tri- and quad-grams) in a set of 10,000 generated IMDB movie reviews.
- Human Evaluation - an Amazon Mechanical Turk blind heads-up comparison between pairs of baseline models trained on IMDB reviews.
- Baseline Models:
- other text generation competing systems:
- It can be solved by a MaskGAN Training System that implements MaskGAN Algorithms.
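The task input can be pictured as a word sequence in which some tokens are hidden and must be filled in by the generator. The following is a minimal sketch of producing such a masked sequence, assuming a single contiguous masked span; the `MASK` placeholder string, the `mask_contiguous` helper, and the example sentence are hypothetical names chosen for illustration and are not taken from the source.

```python
import random

MASK = "<m>"  # hypothetical mask placeholder, chosen for illustration


def mask_contiguous(tokens, span_len, rng):
    """Replace one contiguous span of `span_len` tokens with MASK.

    Returns the masked sequence and the list of masked positions.
    """
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    # Pick a start index so the whole span fits inside the sequence.
    start = rng.randrange(len(tokens) - span_len + 1)
    positions = list(range(start, start + span_len))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    return masked, positions


tokens = "the movie was surprisingly good".split()
masked, positions = mask_contiguous(tokens, span_len=2, rng=random.Random(0))
print(masked, positions)
```

A generator would then be asked to propose tokens for the masked positions only, while the unmasked tokens serve as context.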
- Example(s):
- Validation perplexity (Tab.5 in Fedus et al., 2018):
Model | Perplexity of IMDB samples under a pretrained LM |
---|---|
MaskMLE | $273.1 \pm 3.5$ |
MaskGAN | $108.3 \pm 3.5$ |
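For orientation, perplexity of a generated sample under a language model is conventionally defined as the exponentiated average negative log-likelihood per token, so lower values mean the pretrained LM finds the samples more likely. The symbols $N$ (sample length), $w_i$ (the $i$-th token) and $p_{\mathrm{LM}}$ (the pretrained language model) are notation introduced here for illustration, not taken from the source.

$$\mathrm{PPL}(w_1,\dots,w_N) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_{\mathrm{LM}}(w_i \mid w_1,\dots,w_{i-1})\right)$$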
- Mode Collapse (Tab.6 in Fedus et al., 2018):
Model | % Unique bigrams | % Unique trigrams | % Unique quadgrams |
---|---|---|---|
LM | 40.6 | 75.2 | 91.9 |
MaskMLE | 43.6 | 77.4 | 92.6 |
MaskGAN | 38.2 | 70.7 | 88.2 |
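The mode-collapse figures above are percentages of unique n-grams over a set of generated samples. A minimal sketch of one plausible reading of that metric follows: the number of distinct n-grams divided by the total number of n-grams across all samples. The function names and the two toy samples are hypothetical, and the exact tokenization and counting used in the paper are not specified in this page.

```python
def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def percent_unique_ngrams(samples, n):
    """Percentage of distinct n-grams among all n-grams in `samples`.

    Assumption: n-grams are pooled across samples and never span
    sample boundaries.
    """
    pooled = []
    for tokens in samples:
        pooled.extend(ngrams(tokens, n))
    if not pooled:
        return 0.0
    return 100.0 * len(set(pooled)) / len(pooled)


# Toy data for illustration only (not from the benchmark).
samples = [
    "the film was good".split(),
    "the film was bad".split(),
]
for n in (2, 3, 4):
    print(n, percent_unique_ngrams(samples, n))
```

Under this reading, a lower percentage means the generated samples repeat the same n-grams more often, which is the symptom the metric is meant to detect.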
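- Human evaluation of pairwise comparisons on the IMDB datasets (Tab.7 in Fedus et al., 2018); each comparison spans two rows:
Comparison | Preferred Model | Grammaticality % | Topicality % | Overall % |
---|---|---|---|---|
LM vs. MaskGAN | LM | 15.3 | 19.7 | 15.7 |
LM vs. MaskGAN | MaskGAN | 59.7 | 58.3 | 58.0 |
LM vs. MaskMLE | LM | 20.0 | 28.3 | 21.7 |
LM vs. MaskMLE | MaskMLE | 42.7 | 43.7 | 40.3 |
MaskGAN vs. MaskMLE | MaskGAN | 49.7 | 43.7 | 44.3 |
MaskGAN vs. MaskMLE | MaskMLE | 18.7 | 20.3 | 18.3 |
Real samples vs. LM | Real samples | 78.3 | 72.0 | 73.3 |
Real samples vs. LM | LM | 6.7 | 7.0 | 6.3 |
Real samples vs. MaskGAN | Real samples | 65.7 | 59.3 | 62.3 |
Real samples vs. MaskGAN | MaskGAN | 18.0 | 20.0 | 16.7 |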
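- Human evaluation of pairwise comparisons on the PTB datasets (Tab.8 in Fedus et al., 2018); each comparison spans two rows:
Comparison | Preferred Model | Grammaticality % | Topicality % | Overall % |
---|---|---|---|---|
LM vs. MaskGAN | LM | 32.0 | 30.7 | 27.3 |
LM vs. MaskGAN | MaskGAN | 41.0 | 39.0 | 35.3 |
LM vs. MaskMLE | LM | 32.7 | 34.7 | 32.0 |
LM vs. MaskMLE | MaskMLE | 37.3 | 33.3 | 31.3 |
MaskGAN vs. MaskMLE | MaskGAN | 44.7 | 33.3 | 35.0 |
MaskGAN vs. MaskMLE | MaskMLE | 28.0 | 28.3 | 26.3 |
SeqGAN vs. MaskMLE | SeqGAN | 38.7 | 34.0 | 30.7 |
SeqGAN vs. MaskMLE | MaskMLE | 33.3 | 28.3 | 27.3 |
SeqGAN vs. MaskGAN | SeqGAN | 31.7 | 34.7 | 32.0 |
SeqGAN vs. MaskGAN | MaskGAN | 43.3 | 37.3 | 37.0 |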
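In the human-evaluation tables, the two preference percentages of a comparison do not sum to 100. The sketch below reads each pair of consecutive rows of the PTB table as one pairwise comparison, picks the preferred side on the Overall % column, and computes the leftover share. Treating that leftover as ratings that favored neither side is an assumption made here, not something stated in this page; the `summarize` helper is likewise hypothetical.

```python
# Overall % values copied from the PTB human-evaluation table; consecutive
# rows are read as one pairwise comparison (an assumption about the layout).
comparisons = [
    (("LM", 27.3), ("MaskGAN", 35.3)),
    (("LM", 32.0), ("MaskMLE", 31.3)),
    (("MaskGAN", 35.0), ("MaskMLE", 26.3)),
    (("SeqGAN", 30.7), ("MaskMLE", 27.3)),
    (("SeqGAN", 32.0), ("MaskGAN", 37.0)),
]


def summarize(pair):
    """Return (preferred side, leftover %) for one pairwise comparison."""
    (name_a, pct_a), (name_b, pct_b) = pair
    if pct_a == pct_b:
        winner = "tie"
    else:
        winner = name_a if pct_a > pct_b else name_b
    # Share of ratings not attributed to either side (interpretation assumed).
    leftover = round(100.0 - pct_a - pct_b, 1)
    return winner, leftover


for pair in comparisons:
    winner, leftover = summarize(pair)
    print(f"{pair[0][0]} vs. {pair[1][0]}: preferred={winner}, leftover={leftover}%")
```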
- Counter-Example(s):
- See: Neural Text Generation System, Seq2Seq Model, Neural Autoregressive Model, Professor Forcing Algorithm, Scheduled Sampling Algorithm.
References
2018
- (Fedus et al., 2018) ⇒ William Fedus, Ian Goodfellow, and Andrew M. Dai. (2018). “MaskGAN: Better Text Generation via Filling in the ________.” In: Proceedings of the Sixth International Conference on Learning Representations (ICLR-2018).
2015a
- (Kingma & Ba, 2015) ⇒ Diederik P. Kingma, and Jimmy Ba. (2015). “Adam: A Method for Stochastic Optimization.” In: Proceedings of the 3rd International Conference on Learning Representations (ICLR-2015).
2015b
- (Luong, Pham et al., 2015) ⇒ Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. (2015). “Effective Approaches to Attention-based Neural Machine Translation.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2015). DOI:10.18653/v1/D15-1166.
1999
- (Sutton et al., 1999) ⇒ Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. (1999). “Policy Gradient Methods for Reinforcement Learning with Function Approximation.” In: Advances in Neural Information Processing Systems 12 (NIPS Conference).