Stanford Question Answering (SQuAD) Dataset
Jump to navigation
Jump to search
A Stanford Question Answering (SQuAD) Dataset is a QA dataset that consists of questions posed by crowdworkers on the Wikipedia articles set.
- AKA: SQuAD Dataset.
- Context:
- Datasets available online at : https://rajpurkar.github.io/SQuAD-explorer/
- Example(s):
- Counter-Example(s):
- a CoQA Dataset,
- a MS COCO Dataset,
- a NarrativeQA Dataset,
- a NewsQA Dataset,
- a RACE Dataset,
- a SearchQA Dataset.
- See: Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
References
2018
- (Rajpurkar et al., 2018) ⇒ Pranav Rajpurkar, Robin Jia, and Percy Liang. (2018). “Know What You Don't Know: Unanswerable Questions for SQuAD". In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Volume 2: Short Papers.
- QUOTE: In this work, we construct SQuADRUn[1], a new dataset that combines the existing questions in SQuAD with 53,775 new, unanswerable questions about the same paragraphs. Crowdworkers crafted these questions so that (1) they are relevant to the paragraph, and (2) the paragraph contains a plausible answer—something of the same type as what the question asks for. Two such examples are shown in Figure 1.
- QUOTE: In this work, we construct SQuADRUn[1], a new dataset that combines the existing questions in SQuAD with 53,775 new, unanswerable questions about the same paragraphs. Crowdworkers crafted these questions so that (1) they are relevant to the paragraph, and (2) the paragraph contains a plausible answer—something of the same type as what the question asks for. Two such examples are shown in Figure 1.
Article: Endangered Species Act Paragraph: “ . . . Other legislation followed, including the Migratory Bird Conservation Act of 1929, a 1937 treaty prohibiting the hunting of right and gray whales, and the Bald Eagle Protection Act of 1940. These later laws had a low cost to society—the species were relatively rare—and little opposition was raised.” |
Question 1: “Which laws faced significant opposition?” Plausible Answer: later laws |
Question 2: “What was the name of the 1937 treaty?” Plausible Answer: Bald Eagle Protection Act |
2017
- (Seo et al., 2017) ⇒ Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. (2017). “Bidirectional Attention Flow for Machine Comprehension.” In: Proceedings of ICLR 2017.
- QUOTE: ... SQuAD is a machine comprehension dataset on a large set of Wikipedia articles, with more than 100,000 questions. The answer to each question is always a span in the context. The model is given a credit if its answer matches one of the human written answers. Two metrics are used to evaluate models: Exact Match (EM) and a softer metric, F1 score, which measures the weighted average of the precision and recall rate at character level. The dataset consists of 90k/10k train/dev question-context tuples with a large hidden test set. It is one of the largest available MC datasets with human-written questions and serves as a great test bed for our model. ...
2016
- https://rajpurkar.github.io/SQuAD-explorer/
- QUOTE: Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
2016
- (Rajpurkar et al., 2016) ⇒ Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. (2016). “SQuAD: 100,000+ Questions for Machine Comprehension of Text.” In: arXiv preprint arXiv:1606.05250.
- QUOTE: We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000 + questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at this https URL. http://stanford-qa.com