CNN-Daily Mail Dataset

AKA: CNN-Daily Mail Corpus.
Context:
- It was developed by (Hermann et al., 2015).
- …
Example(s):
- DeepMind CNN-DailyMail Dataset,
- …
Counter-Example(s):
- a BookTest Dataset,
- a Children's Book Test (CBT) Dataset,
- an MS MARCO Dataset,
- an MC-Test Dataset,
- a NewsQA Dataset,
- a RACE Dataset,
- an Image Description Dataset,
- an MLSUM Corpus.
See: Question-Answering System, Question-Answer Dataset, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.

References

GBard
- The CNN/Daily Mail Corpus is a large text dataset for text summarization. It was originally constructed by Hermann et al. (2015) as a reading-comprehension dataset and was adapted for summarization by Nallapati et al. (2016). The summarization version contains 286,817 training pairs, 13,368 validation pairs, and 11,487 test pairs. Each pair consists of a news article and a corresponding multi-sentence summary of the article.
  The CNN/Daily Mail Corpus is one of the most popular datasets for text summarization research. It is used to train and evaluate a wide variety of text summarization models, including both extractive and abstractive models. Extractive models select sentences from the original article to create a summary, while abstractive models generate new sentences to create a summary.
  The CNN/Daily Mail Corpus is also used for other natural language processing tasks, such as machine reading comprehension and question answering.
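
  The extractive approach described above can be illustrated with a lead-k baseline, which copies the first k sentences of an article as its summary. This is only a minimal sketch: the helper names `split_sentences` and `lead_k_summary`, the naive regex sentence splitter, and the sample article are all assumptions made for illustration and are not part of the corpus or of any cited system.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def lead_k_summary(article, k=3):
    """Extractive baseline: return the first k sentences of the article unchanged."""
    return " ".join(split_sentences(article)[:k])


# Hypothetical article text, not taken from the corpus.
article = (
    "A storm hit the coast on Monday. Thousands lost power. "
    "Crews worked overnight. Schools will reopen on Wednesday."
)

summary = lead_k_summary(article, k=2)
print(summary)  # A storm hit the coast on Monday. Thousands lost power.
```

  Every sentence in the output appears verbatim in the input, which is what makes the method extractive. An abstractive model would instead generate new wording and typically requires a trained sequence-to-sequence model.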

Nallapati, Ramesh, Bowen Zhou, Caglar Gulcehre, and Bing Xiang. "Abstractive text summarization using sequence-to-sequence RNNs and beyond." arXiv preprint arXiv:1602.06023 (2016).