ReAding Comprehension from Examinations (RACE) Dataset

A ReAding Comprehension from Examinations (RACE) Dataset is a reading comprehension dataset collected from English examinations for Chinese middle school and high school students.

Context:
- It is available online at: http://www.cs.cmu.edu/~glai1/data/race/.
- …
Example(s):
- the RACE dataset of (Lai et al., 2017), which consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts.
- …
Counter-Example(s):
- a CoQA Dataset,
- an ImageNet Dataset,
- an MS COCO Dataset,
- a SQuAD Dataset.
See: Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.

References

2018

(Lai, 2018) ⇒ http://www.cs.cmu.edu/~glai1/data/race/
- QUOTE: Each passage is a JSON file. The JSON file contains the following fields:

   article: A string, which is the passage.
   questions: A string list. Each string is a query. We have two types of questions. First one is an interrogative sentence. Another one has a placeholder, which is represented by _.
   options: A list of the options list. Each options list contains 4 strings, which are the candidate option.
   answers: A list contains the golden label of each query.
   id: Each passage has a unique id in this dataset.
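
The field layout quoted above can be read with a few lines of Python. The following is a minimal sketch, not the dataset authors' loader: the sample record is hypothetical, the helper names `iter_examples` and `fill_placeholder` are introduced here for illustration, and it assumes that each golden label is a letter "A" to "D" indexing into the four options of its question.

```python
import json

# Hypothetical passage record following the quoted field list;
# the text, the id, and the letter-style labels are made up for illustration.
SAMPLE = json.dumps({
    "id": "example0.txt",
    "article": "Tom walks to school every day.",
    "questions": ["How does Tom get to school?", "Tom goes to school _ ."],
    "options": [
        ["By bus", "On foot", "By bike", "By car"],
        ["on foot", "by train", "by boat", "by plane"],
    ],
    "answers": ["B", "A"],
})


def iter_examples(passage_json):
    """Yield one (id, question, options, gold_index) tuple per question."""
    record = json.loads(passage_json)
    for question, options, answer in zip(
        record["questions"], record["options"], record["answers"]
    ):
        # Assumption: gold labels are letters "A".."D" indexing the options.
        gold_index = ord(answer) - ord("A")
        yield record["id"], question, options, gold_index


def fill_placeholder(question, option):
    """For placeholder-style questions, substitute the option for the first "_"."""
    return question.replace("_", option, 1) if "_" in question else question


examples = list(iter_examples(SAMPLE))
```

Pairing the three parallel lists with `zip` keeps each question aligned with its own options list and label; `fill_placeholder` covers the second question type, where the candidate option is read in place of the `_` placeholder.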

2017

(Lai et al., 2017) ⇒ Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard H. Hovy. (2017). “RACE: Large-scale ReAding Comprehension Dataset From Examinations”. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
- QUOTE: We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for middle and high school Chinese students in the age range between 12 to 18, RACE consists of near 28,000 passages and near 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluating the student's ability in understanding and reasoning.