BookTest Dataset
A BookTest Dataset is a reading comprehension dataset that is similar to the Children's Book Test (CBT) dataset but about 60 times larger.
- Context:
- Datasets available at: https://ibm.biz/booktest-v1
- Example(s):
- …
- Counter-Example(s):
- See: Reading Comprehension Task, Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
References
2016
- (Bajgar et al., 2016) ⇒ Ondrej Bajgar, Rudolf Kadlec, and Jan Kleindienst. (2016). “Embracing Data Abundance: BookTest Dataset for Reading Comprehension.” In: ePrint: abs/1610.00956.
- QUOTE: Similarly to the CBT, our BookTest dataset[1] is derived from books available through project Gutenberg. We used 3,555 copyright-free books to extract CN examples and 10,507 books for NE examples, for comparison the CBT dataset was extracted from just 108 books.
Statistic | CNN | Daily Mail | CBT CN | CBT NE | BookTest |
---|---|---|---|---|---|
# queries | 380,298 | 879,450 | 120,769 | 108,719 | 14,140,825 |
Max # options | 527 | 371 | 10 | 10 | 10 |
Avg # options | 26.4 | 26.5 | 10 | 10 | 10 |
Avg # tokens | 762 | 813 | 470 | 433 | 522 |
Vocab. size | 118,497 | 208,045 | 53,185 | 53,063 | 1,860,394 |
- ↑ BookTest dataset can be downloaded from https://ibm.biz/booktest-v1.
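The "60 times larger" figure can be checked against the "# queries" row of the table above. The following is a minimal sketch, assuming that the comparison is between BookTest and the two CBT subsets (CN and NE) taken together; the variable and function names are illustrative only and do not come from the paper.

```python
# Values copied from the "# queries" row of the comparison table.
queries = {
    "CBT CN": 120_769,
    "CBT NE": 108_719,
    "BookTest": 14_140_825,
}


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


# Assumption: CBT's size is the sum of its CN and NE query counts.
cbt_total = queries["CBT CN"] + queries["CBT NE"]
ratio = size_ratio(queries["BookTest"], cbt_total)

print(f"CBT total queries: {cbt_total:,}")
print(f"BookTest / CBT: {ratio:.1f}x")
```

Under this assumption the ratio comes out slightly above 60, which is consistent with the size claim in the definition.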