Texygen Platform
A Texygen Platform is a Natural Language Processing Benchmark Platform for evaluating open-domain Text Generation Systems.
- AKA: Texygen.
- Context:
- It can be solved by a Texygen System that produces textual data from real-world data.
- Benchmark Datasets:
- a Synthetic Data Training Set: 10,000 oracle-generated sentences (Vocabulary Size = 5,000 words; Sentence Length = 50); see the oracle sketch after this list.
- a Real Data Training Set and Test Set: 20,000 sentences from the Image COCO Captions, split evenly (10,000 each) between training and test.
- Baseline Models: Vanilla MLE, SeqGAN, MaliGAN, RankGAN, GSGAN, TextGAN, and LeakGAN.
- Performance Metrics: BLEU Score, EmbSim Metric, NLL-Oracle, and Self-BLEU; see the metric sketch after this list.
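The synthetic-data setting above can be made concrete with a small sketch: a randomly initialized LSTM plays the oracle (the "true" distribution), and NLL-Oracle is the oracle's average negative log-likelihood of a generator's samples. The following is an illustrative PyTorch sketch, not Texygen's implementation; the vocabulary size and sentence length echo the bullet above, while the embedding and hidden sizes are arbitrary.

```python
# Illustrative sketch (not Texygen's code) of the synthetic-oracle idea:
# a randomly initialized LSTM serves as the "true" distribution, and
# NLL-Oracle is the oracle's average negative log-likelihood of the
# generator's samples.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN = 5000, 50   # echoes the benchmark bullet above
EMB, HID = 32, 32           # arbitrary illustrative sizes

class OracleLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens):                  # tokens: (batch, seq)
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)                      # logits: (batch, seq, vocab)

torch.manual_seed(0)
oracle = OracleLSTM()                           # random weights = the "truth"

def nll_oracle(samples):
    """Average per-token NLL of generated samples under the oracle."""
    logits = oracle(samples[:, :-1])            # predict each next token
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), samples[:, 1:].reshape(-1))
    return loss.item()

fake_samples = torch.randint(0, VOCAB, (8, SEQ_LEN))   # stand-in generator output
print(nll_oracle(fake_samples))
```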
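Likewise, the BLEU-n scores reported below and the Self-BLEU diversity metric can be sketched with NLTK. This is a minimal illustration, not Texygen's own metric code, which may differ in smoothing and reference handling; Self-BLEU scores each generated sentence against all other generated sentences, so a higher Self-BLEU indicates lower diversity.

```python
# Minimal, illustrative sketch of BLEU-n and Self-BLEU (not Texygen's code).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoid zero scores on short sentences

def bleu_n(references, hypothesis, n):
    """BLEU-n with uniform weights over 1..n-grams (tokens are word lists)."""
    weights = tuple(1.0 / n for _ in range(n))
    return sentence_bleu(references, hypothesis, weights=weights,
                         smoothing_function=smooth)

def self_bleu(generated, n=4):
    """Average BLEU-n of each sentence against all other generated sentences."""
    scores = []
    for i, hyp in enumerate(generated):
        refs = generated[:i] + generated[i + 1:]  # every other sample is a reference
        scores.append(bleu_n(refs, hyp, n))
    return sum(scores) / len(scores)

samples = [s.split() for s in ["a man rides a horse",
                               "a man rides a bike",
                               "two dogs play in the park"]]
print(self_bleu(samples, n=2))
```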
- Example(s):
- Zhu et al. (2018) experimental results:
BLEU scores on the Image COCO Captions training set:

| Model | BLEU-2 | BLEU-3 | BLEU-4 | BLEU-5 |
| --- | --- | --- | --- | --- |
| SeqGAN | 0.917 | 0.747 | 0.530 | 0.348 |
| MaliGAN | 0.887 | 0.697 | 0.482 | 0.312 |
| RankGAN | 0.937 | 0.799 | 0.601 | 0.414 |
| LeakGAN | 0.926 | 0.816 | 0.660 | 0.470 |
| TextGAN | 0.650 | 0.645 | 0.569 | 0.523 |
| MLE | 0.921 | 0.768 | 0.570 | 0.392 |
BLEU scores on the Image COCO Captions test set:

| Model | BLEU-2 | BLEU-3 | BLEU-4 | BLEU-5 |
| --- | --- | --- | --- | --- |
| SeqGAN | 0.950 | 0.840 | 0.670 | 0.489 |
| MaliGAN | 0.918 | 0.781 | 0.606 | 0.437 |
| RankGAN | 0.959 | 0.882 | 0.762 | 0.618 |
| LeakGAN | 0.966 | 0.913 | 0.848 | 0.780 |
| TextGAN | 0.942 | 0.931 | 0.804 | 0.746 |
| MLE | 0.916 | 0.769 | 0.583 | 0.408 |
- Counter-Example(s):
- See: Text Generation System, Natural Language Generation System, Natural Language Understanding System, Hierarchical Reinforcement Learning System, Language Model.
References
2020
- (GitHub, 2020) ⇒ https://github.com/geek-ai/Texygen Retrieved:2020-04-22.
- QUOTE: Texygen is a benchmarking platform to support research on open-domain text generation models. Texygen has not only implemented a majority of text generation models, but also covered a set of metrics that evaluate the diversity, the quality and the consistency of the generated texts. The Texygen platform could help standardize the research on text generation and facilitate the sharing of fine-tuned open-source implementations among researchers for their work. As a consequence, this would help in improving the reproducibility and reliability of future research work in text generation.
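As a usage illustration: the repository exposes a main.py entry point for training a chosen baseline on either the synthetic oracle data or the real COCO-caption data. The exact flag names below (-g, -t, -d) and the data path are assumptions based on the repository README and should be checked against the current code.

```python
# Hypothetical invocation sketch; the flag names (-g, -t, -d) and file
# layout are assumptions based on the geek-ai/Texygen README, not verified.
import subprocess

subprocess.run(
    ["python3", "main.py",
     "-g", "seqgan",                 # baseline model to train/evaluate
     "-t", "real",                   # train on real data (vs. the synthetic oracle)
     "-d", "data/image_coco.txt"],   # training corpus location
    check=True,  # raise if the Texygen run fails
)
```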
2018
- (Zhu et al., 2018) ⇒ Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. (2018). “Texygen: A Benchmarking Platform for Text Generation Models.” In: Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018). DOI:10.1145/3209978.3210080.
- QUOTE: In this paper, we release Texygen[1], a fully open-sourced benchmarking platform for text generation models. Texygen not only includes a majority of the baseline models, but also maintains a variety of metrics that evaluate the diversity, quality and consistency of the generated texts. With these metrics, we can have a much more comprehensive study of different text generation models. We hope this platform could help the progress of standardizing the research on text generation, increase the reproducibility of research work in this field, and encourage higher-level applications. (...)
Texygen provides a standard top-to-down multi-dimensional evaluation system for text generation models. Currently, Texygen consists of two elements: well-trained baseline models and automatically computable evaluation metrics. Texygen also provides the open source repository of the platform, in which researchers can find the specification and manual of APIs for implementing their models for Texygen to evaluate.
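The API specification mentioned in the quote is not reproduced here. Purely as an illustration of the kind of contract such a platform evaluates against, here is a hypothetical minimal generator interface and evaluation loop; every name in it (TextGenerator, generate_samples, EchoBaseline, distinct_1) is invented for this sketch and is not Texygen's actual API.

```python
# Purely illustrative: a hypothetical minimal contract a text-generation
# model might expose so a benchmarking platform can sample from it and
# score the samples. None of these names are Texygen's actual API.
from abc import ABC, abstractmethod
from typing import Callable, List

class TextGenerator(ABC):
    @abstractmethod
    def generate_samples(self, n: int) -> List[List[str]]:
        """Return n generated sentences as lists of tokens."""

class EchoBaseline(TextGenerator):
    """Trivial stand-in model: cycles through a fixed corpus."""
    def __init__(self, corpus: List[str]):
        self.corpus = [s.split() for s in corpus]

    def generate_samples(self, n: int) -> List[List[str]]:
        return [self.corpus[i % len(self.corpus)] for i in range(n)]

def distinct_1(samples: List[List[str]]) -> float:
    """Diversity proxy: fraction of unigrams that are unique across samples."""
    tokens = [tok for s in samples for tok in s]
    return len(set(tokens)) / max(len(tokens), 1)

def evaluate(model: TextGenerator,
             metric: Callable[[List[List[str]]], float],
             n: int = 100) -> float:
    """Sample from the model and apply a corpus-level metric to the samples."""
    return metric(model.generate_samples(n))

model = EchoBaseline(["a man rides a horse", "two dogs play in the park"])
print(evaluate(model, distinct_1, n=10))
```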