Reading Comprehension Dataset
A Reading Comprehension Dataset is a text dataset that can be used for benchmarking a Machine Reading Comprehension System.
- AKA: Text Comprehension Dataset.
- Context:
- It can range from being a question-answer dataset to being a machine translation dataset.
- Example(s):
- a BookTest Dataset (Bajgar et al., 2016),
- a Children's Book Test (CBT) Dataset (Hill et al., 2016),
- a CNN-Daily Mail Dataset (Hermann et al., 2015),
- a MS-MARCO Dataset (Nguyen et al., 2016),
- a MC-Test Dataset (Richardson et al., 2013),
- a RACE Dataset (Lai et al., 2017),
- a question-answer dataset such as:
- …
- Counter-Example(s):
- See: Question-Answering System, Question-Answer Dataset, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
References
2019
- (Reddy et al., 2019) ⇒ Siva Reddy, Danqi Chen, and Christopher D. Manning. (2019). “CoQA: A Conversational Question Answering Challenge.” In: Transactions of the Association for Computational Linguistics Journal, 7. DOI:10.1162/tacl_a_00266.
- QUOTE: We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. (...)
Dataset | Conversational | Answer Type | Domain |
---|---|---|---|
MCTest (Richardson et al., 2013) | ✗ | Multiple choice | Children’s stories |
CNN/Daily Mail (Hermann et al., 2015) | ✗ | Spans | News |
Children's book test (Hill et al., 2016) | ✗ | Multiple choice | Children’s stories |
SQuAD (Rajpurkar et al., 2016) | ✗ | Spans | Wikipedia |
MS MARCO (Nguyen et al., 2016) | ✗ | Free-form text, Unanswerable | Web Search |
NewsQA (Trischler et al., 2017) | ✗ | Spans | News |
SearchQA (Dunn et al., 2017) | ✗ | Spans | Jeopardy |
TriviaQA (Joshi et al., 2017) | ✗ | Spans | Trivia |
RACE (Lai et al., 2017) | ✗ | Multiple choice | Mid/High School Exams |
Narrative QA (Kocisky et al., 2018) | ✗ | Free-form text | Movie Scripts, Literature |
SQuAD 2.0 (Rajpurkar et al., 2018) | ✗ | Spans, Unanswerable | Wikipedia |
CoQA (this work) | ✔ | Free-form text, Unanswerable; each answer comes with a text span rationale | Children’s Stories, Literature, Mid/High School Exams, News, Wikipedia, Reddit, Science |
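A distinguishing feature of CoQA in the table above is that its questions form a conversation about one passage, and each free-form answer carries a rationale span highlighted in the passage. A minimal sketch of what such a record might look like (the passage, questions, and field names here are invented for illustration, not CoQA's actual file format):

```python
# A CoQA-style conversational record (illustrative, not the official schema):
# later questions rely on conversational coreference ("she"), and each
# free-form answer is paired with a rationale span from the passage.
record = {
    "story": "Jessica went to sit in her rocking chair. Today was her birthday.",
    "turns": [
        {"question": "Who had a birthday?",
         "answer": "Jessica",
         "rationale": "Jessica went to sit in her rocking chair."},
        {"question": "Where did she sit?",  # "she" resolved from the prior turn
         "answer": "in her rocking chair",
         "rationale": "sit in her rocking chair"},
    ],
}

# Unlike the free-form answers, each rationale must be a literal span
# of the passage, so it can be checked by substring containment.
for turn in record["turns"]:
    assert turn["rationale"] in record["story"]
```

This span-rationale constraint is what lets CoQA combine free-form answers with extractive evidence, unlike the purely span-based datasets in the table.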
2018a
- (Kocisky et al., 2018) ⇒ Tomas Kocisky, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gabor Melis, and Edward Grefenstette. (2018). “The NarrativeQA Reading Comprehension Challenge.” In: Trans. Assoc. Comput. Linguistics, 6.
- QUOTE: To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. (...)
Dataset | Documents | Questions | Answers |
---|---|---|---|
MCTest (Richardson et al., 2013) | 660 short stories, grade school level | 2640 human generated, based on the document | multiple choice |
CNN/Daily Mail (Hermann et al., 2015) | 93K+220K news articles | 387K+997K Cloze-form, based on highlights | entities |
Children’s Book Test (CBT) (Hill et al., 2016) | 687K of 20 sentence passages from 108 children’s books | Cloze-form, from the 21st sentence | multiple choice |
BookTest (Bajgar et al., 2016) | 14.2M, similar to CBT | Cloze-form, similar to CBT | multiple choice |
SQuAD (Rajpurkar et al., 2016) | 23K paragraphs from 536 Wikipedia articles | 108K human generated, based on the paragraphs | spans |
NewsQA (Trischler et al., 2016) | 13K news articles from the CNN dataset | 120K human generated, based on headline, highlights | spans |
MS MARCO (Nguyen et al., 2016) | 1M passages from 200K+ documents retrieved using the queries | 100K search queries | human generated, based on the passages |
SearchQA (Dunn et al., 2017) | 6.9m passages retrieved from a search engine using the queries | 140k human generated Jeopardy! questions | human generated Jeopardy! answers |
NarrativeQA (this paper) | 1,572 stories (books, movie scripts) & human generated summaries | 46,765 human generated, based on summaries | human generated, based on summaries |
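Several of the datasets in the table above (CNN/Daily Mail, CBT, BookTest) use Cloze-form questions: a word or entity is removed from a sentence and the system must fill it in. A minimal sketch of how such a question might be constructed (the sentence and the `@placeholder` token are illustrative; the actual datasets have their own preprocessing pipelines):

```python
def make_cloze_question(sentence, answer, placeholder="@placeholder"):
    """Replace the answer entity in a sentence with a placeholder token,
    producing a Cloze-style (fill-in-the-blank) question."""
    if answer not in sentence:
        raise ValueError("answer must appear in the sentence")
    # Replace only the first occurrence, keeping the rest of the sentence intact.
    return sentence.replace(answer, placeholder, 1)

# Illustrative example in the style of a CNN/Daily Mail highlight sentence:
question = make_cloze_question(
    "The treaty was signed in Paris after months of negotiation.", "Paris"
)
# question == "The treaty was signed in @placeholder after months of negotiation."
```

In CNN/Daily Mail the candidate answers are anonymized entity markers from the article, while CBT and BookTest instead offer a multiple-choice list of candidate words.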
2018b
- (Rajpurkar et al., 2018) ⇒ Pranav Rajpurkar, Robin Jia, and Percy Liang. (2018). “Know What You Don't Know: Unanswerable Questions for SQuAD". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Volume 2: Short Papers.
- QUOTE: In this work, we construct SQuADRUn, a new dataset that combines the existing questions in SQuAD with 53,775 new, unanswerable questions about the same paragraphs. Crowdworkers crafted these questions so that (1) they are relevant to the paragraph, and (2) the paragraph contains a plausible answer—something of the same type as what the question asks for.
2017a
- (Dunn et al., 2017) ⇒ Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, and Kyunghyun Cho. (2017). “SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine.” In: ePrint: abs/1704.05179.
2017b
- (Joshi et al., 2017) ⇒ Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. (2017). “TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension". In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) Volume 1: Long Papers.
2017c
- (Lai et al., 2017) ⇒ Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard H. Hovy. (2017). “RACE: Large-scale ReAding Comprehension Dataset From Examinations". In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
- QUOTE: We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for middle and high school Chinese students in the age range between 12 to 18, RACE consists of near 28,000 passages and near 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluating the student's ability in understanding and reasoning.
2016a
- (Bajgar et al., 2016) ⇒ Ondrej Bajgar, Rudolf Kadlec, and Jan Kleindienst. (2016). “Embracing Data Abundance: BookTest Dataset for Reading Comprehension.” In: ePrint: abs/1610.00956.
2016b
- (Hill et al., 2016) ⇒ Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. (2016). “The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations.” In: Proceedings of the 4th International Conference on Learning Representations (ICLR 2016) Conference Track.
2016c
- (Rajpurkar et al., 2016) ⇒ Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. (2016). “SQuAD: 100,000+ Questions for Machine Comprehension of Text". In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). DOI: 10.18653/v1/D16-1264.
- QUOTE: We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees.
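In span-based datasets such as SQuAD, each answer is a contiguous segment of the passage, typically stored as an answer string plus a character start offset. A minimal sketch of such a record and of recovering the span (the example passage and field names follow SQuAD's published JSON schema, but the content is invented for illustration):

```python
# A SQuAD-style record: the answer is a span of the passage,
# identified by its text and character start offset.
record = {
    "context": "Normandy is a region in France.",
    "question": "In what country is Normandy located?",
    "answers": [{"text": "France", "answer_start": 24}],
}

def span_answer(record):
    """Recover the answer string from the stored character offset."""
    ans = record["answers"][0]
    start = ans["answer_start"]
    return record["context"][start:start + len(ans["text"])]

assert span_answer(record) == "France"
```

Storing the offset alongside the text disambiguates answers whose string occurs more than once in the passage, and it is what makes span-based evaluation (exact match, token F1) well defined.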