Reading Comprehension Dataset
A Reading Comprehension Dataset is a text dataset that can be used for benchmarking a Machine Reading Comprehension System.
- AKA: Text Comprehension Dataset.
- Context:
- It can range from being a question-answer dataset to being a machine translation dataset.
- Example(s):
- a BookTest Dataset (Bajgar et al., 2016),
- a Children's Book Test (CBT) Dataset (Hill et al., 2016),
- a CNN-Daily Mail Dataset (Hermann et al., 2015),
- a MS-MARCO Dataset (Nguyen et al., 2016),
- a MC-Test Dataset (Richardson et al., 2013),
- a RACE Dataset (Lai et al., 2017),
- a question-answer dataset such as:
- …
- Counter-Example(s):
- See: Question-Answering System, Question-Answer Dataset, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
References
2019
- (Reddy et al., 2019) ⇒ Siva Reddy, Danqi Chen, and Christopher D. Manning. (2019). “CoQA: A Conversational Question Answering Challenge.” In: Transactions of the Association for Computational Linguistics Journal, 7. DOI:10.1162/tacl_a_00266.
- QUOTE: We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. (...)
Dataset | Conversational | Answer Type | Domain |
---|---|---|---|
MCTest (Richardson et al., 2013) | ✗ | Multiple choice | Children’s stories |
CNN/Daily Mail (Hermann et al., 2015) | ✗ | Spans | News |
Children's book test (Hill et al., 2016) | ✗ | Multiple choice | Children’s stories |
SQuAD (Rajpurkar et al., 2016) | ✗ | Spans | Wikipedia |
MS MARCO (Nguyen et al., 2016) | ✗ | Free-form text, Unanswerable | Web Search |
NewsQA (Trischler et al., 2017) | ✗ | Spans | News |
SearchQA (Dunn et al., 2017) | ✗ | Spans | Jeopardy |
TriviaQA (Joshi et al., 2017) | ✗ | Spans | Trivia |
RACE (Lai et al., 2017) | ✗ | Multiple choice | Mid/High School Exams |
Narrative QA (Kocisky et al., 2018) | ✗ | Free-form text | Movie Scripts, Literature |
SQuAD 2.0 (Rajpurkar et al., 2018) | ✗ | Spans, Unanswerable | Wikipedia |
CoQA (this work) | ✔ | Free-form text, Unanswerable; each answer comes with a text span rationale | Children’s Stories, Literature, Mid/High School Exams, News, Wikipedia, Reddit, Science |
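A distinguishing feature of CoQA in the table above is that its questions form a conversation about one passage, and each free-form answer carries a rationale span highlighted in the passage. A minimal sketch of what such a record might look like (the passage, questions, and field names here are invented for illustration, not CoQA's actual file format):

```python
# A CoQA-style conversational record (illustrative, not the official schema):
# later questions rely on conversational coreference ("she"), and each
# free-form answer is paired with a rationale span from the passage.
record = {
    "story": "Jessica went to sit in her rocking chair. Today was her birthday.",
    "turns": [
        {"question": "Who had a birthday?",
         "answer": "Jessica",
         "rationale": "Jessica went to sit in her rocking chair."},
        {"question": "Where did she sit?",  # "she" resolved from the prior turn
         "answer": "in her rocking chair",
         "rationale": "sit in her rocking chair"},
    ],
}

# Unlike the free-form answers, each rationale must be a literal span
# of the passage, so it can be checked by substring containment.
for turn in record["turns"]:
    assert turn["rationale"] in record["story"]
```

This span-rationale constraint is what lets CoQA combine free-form answers with extractive evidence, unlike the purely span-based datasets in the table.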
2018a
- (Kocisky et al., 2018) ⇒ Tomas Kocisky, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gabor Melis, and Edward Grefenstette. (2018). “The NarrativeQA Reading Comprehension Challenge.” In: Trans. Assoc. Comput. Linguistics, 6.
- QUOTE: To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. (...)
Dataset | Documents | Questions | Answers |
---|---|---|---|
MCTest (Richardson et al., 2013) | 660 short stories, grade school level | 2640 human generated, based on the document | multiple choice |
CNN/Daily Mail (Hermann et al., 2015) | 93K+220K news articles | 387K+997K Cloze-form, based on highlights | entities |
Children’s Book Test (CBT) (Hill et al., 2016) | 687K of 20 sentence passages from 108 children’s books | Cloze-form, from the 21st sentence | multiple choice |
BookTest (Bajgar et al., 2016) | 14.2M, similar to CBT | Cloze-form, similar to CBT | multiple choice |
SQuAD (Rajpurkar et al., 2016) | 23K paragraphs from 536 Wikipedia articles | 108K human generated, based on the paragraphs | spans |
NewsQA (Trischler et al., 2016) | 13K news articles from the CNN dataset | 120K human generated, based on headline, highlights | spans |
MS MARCO (Nguyen et al., 2016) | 1M passages from 200K+ documents retrieved using the queries | 100K search queries | human generated, based on the passages |
SearchQA (Dunn et al., 2017) | 6.9m passages retrieved from a search engine using the queries | 140k human generated Jeopardy! questions | human generated Jeopardy! answers |
NarrativeQA (this paper) | 1,572 stories (books, movie scripts) & human generated summaries | 46,765 human generated, based on summaries | human generated, based on summaries |
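Several of the datasets in the table above (CNN/Daily Mail, CBT, BookTest) use Cloze-form questions: a word or entity is removed from a sentence and the system must fill it in. A minimal sketch of how such a question might be constructed (the sentence and the `@placeholder` token are illustrative; the actual datasets have their own preprocessing pipelines):

```python
def make_cloze_question(sentence, answer, placeholder="@placeholder"):
    """Replace the answer entity in a sentence with a placeholder token,
    producing a Cloze-style (fill-in-the-blank) question."""
    if answer not in sentence:
        raise ValueError("answer must appear in the sentence")
    # Replace only the first occurrence, keeping the rest of the sentence intact.
    return sentence.replace(answer, placeholder, 1)

# Illustrative example in the style of a CNN/Daily Mail highlight sentence:
question = make_cloze_question(
    "The treaty was signed in Paris after months of negotiation.", "Paris"
)
# question == "The treaty was signed in @placeholder after months of negotiation."
```

In CNN/Daily Mail the candidate answers are anonymized entity markers from the article, while CBT and BookTest instead offer a multiple-choice list of candidate words.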
2018b
- (Rajpurkar et al., 2018) ⇒ Pranav Rajpurkar, Robin Jia, and Percy Liang. (2018). “Know What You Don't Know: Unanswerable Questions for SQuAD". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Volume 2: Short Papers.
- QUOTE: In this work, we construct SQuADRUn, a new dataset that combines the existing questions in SQuAD with 53,775 new, unanswerable questions about the same paragraphs. Crowdworkers crafted these questions so that (1) they are relevant to the paragraph, and (2) the paragraph contains a plausible answer—something of the same type as what the question asks for.
2017a
- (Dunn et al., 2017) ⇒ Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, and Kyunghyun Cho. (2017). “SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine.” In: ePrint: abs/1704.05179.
2017b
- (Joshi et al., 2017) ⇒ Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. (2017). “TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension". In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) Volume 1: Long Papers.
2017c
- (Lai et al., 2017) ⇒ Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard H. Hovy. (2017). “RACE: Large-scale ReAding Comprehension Dataset From Examinations". In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
- QUOTE: We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for middle and high school Chinese students in the age range between 12 to 18, RACE consists of near 28,000 passages and near 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluating the student's ability in understanding and reasoning.
2016a
- (Bajgar et al., 2016) ⇒ Ondrej Bajgar, Rudolf Kadlec, and Jan Kleindienst. (2016). “Embracing Data Abundance: BookTest Dataset for Reading Comprehension.” In: ePrint: abs/1610.00956.
2016b
- (Hill et al., 2016) ⇒ Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. (2016). “The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations.” In: Proceedings of the 4th International Conference on Learning Representations (ICLR 2016) Conference Track.
2016c
- (Rajpurkar et al., 2016) ⇒ Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. (2016). “SQuAD: 100,000+ Questions for Machine Comprehension of Text". In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). DOI: 10.18653/v1/D16-1264.
- QUOTE: We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees.
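In span-based datasets such as SQuAD, each answer is a contiguous segment of the passage, typically stored as an answer string plus a character start offset. A minimal sketch of such a record and of recovering the span (the example passage and field names follow SQuAD's published JSON schema, but the content is invented for illustration):

```python
# A SQuAD-style record: the answer is a span of the passage,
# identified by its text and character start offset.
record = {
    "context": "Normandy is a region in France.",
    "question": "In what country is Normandy located?",
    "answers": [{"text": "France", "answer_start": 24}],
}

def span_answer(record):
    """Recover the answer string from the stored character offset."""
    ans = record["answers"][0]
    start = ans["answer_start"]
    return record["context"][start:start + len(ans["text"])]

assert span_answer(record) == "France"
```

Storing the offset alongside the text disambiguates answers whose string occurs more than once in the passage, and it is what makes span-based evaluation (exact match, token F1) well defined.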