BookTest Dataset


A BookTest Dataset is a reading comprehension dataset that is similar to the Children's Book Test (CBT) dataset but roughly 60 times larger.



References

2016

              | CNN     | Daily Mail | CBT CN  | CBT NE  | BookTest
# queries     | 380,298 | 879,450    | 120,769 | 108,719 | 14,140,825
Max # options | 527     | 371        | 10      | 10      | 10
Avg # options | 26.4    | 26.5       | 10      | 10      | 10
Avg # tokens  | 762     | 813        | 470     | 433     | 522
Vocab. size   | 118,497 | 208,045    | 53,185  | 53,063  | 1,860,394
Table 1: Statistics on the four standard text comprehension datasets and our new BookTest dataset introduced in this paper. CBT CN stands for CBT Common Nouns and CBT NE stands for CBT Named Entities. Statistics were taken from (Hermann et al., 2015) and the statistics provided with the CBT dataset.
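
The "60 times larger" figure can be checked against the query counts in Table 1. The sketch below assumes the comparison is between BookTest and the two CBT subsets combined (CBT CN plus CBT NE); that reading is an assumption, not something the table states. The helper name `size_ratio` is hypothetical and used only for illustration.

```python
# Query counts copied from Table 1.
CBT_CN_QUERIES = 120_769
CBT_NE_QUERIES = 108_719
BOOKTEST_QUERIES = 14_140_825


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


# Assumption: "CBT" here means the CN and NE subsets taken together.
cbt_total = CBT_CN_QUERIES + CBT_NE_QUERIES
ratio = size_ratio(BOOKTEST_QUERIES, cbt_total)

print(f"CBT CN + CBT NE queries: {cbt_total:,}")
print(f"BookTest / CBT ratio:    {ratio:.1f}")
```

Under that assumption the ratio comes out slightly above 60, which agrees with the size claim in the definition.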