Natural Questions Dataset
A Natural Questions Dataset is a large-scale QA dataset that provides end-to-end training and evaluation data for QA systems.
- AKA: Natural Questions Corpus, Kwiatkowski's Natural Questions.
- Context:
- It contains questions that consist of "real anonymized, aggregated queries issued to the Google search engine".
- It contains "307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and 7,842 5-way annotated items sequestered as test data".
- Its online repository and datasets are available at: https://ai.google.com/research/NaturalQuestions
- Benchmark Tasks: Natural Questions Benchmark, Open-Domain Question Answering Benchmark, GPT-2 Benchmark Task.
- Example(s):
- Counter-Example(s):
- a CoQA Dataset,
- a HotpotQA Dataset,
- a MS COCO Dataset,
- a NarrativeQA Dataset,
- a NewsQA Dataset,
- a QuAC Dataset,
- a RACE Dataset,
- a SearchQA Dataset,
- a SQuAD Dataset,
- a TriviaQA Dataset,
- a WikiQA Dataset.
- See: Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
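The split sizes listed under Context can be tallied with a short sketch. The counts are copied from the Kwiatkowski et al. (2019) quote; the dictionary name `NQ_SPLITS` and the helper functions are hypothetical, introduced only for illustration. Because the development and test items carry 5-way annotations, the number of annotations is larger than the number of examples.

```python
# Split sizes as reported by Kwiatkowski et al. (2019) for the public NQ release.
# NQ_SPLITS is a hypothetical name; each entry is (number of examples, annotations per example).
NQ_SPLITS = {
    "train": (307_373, 1),  # single annotation per example
    "dev": (7_830, 5),      # 5-way annotated development data
    "test": (7_842, 5),     # 5-way annotated, sequestered as test data
}


def total_examples(splits):
    """Sum the number of examples across all splits."""
    return sum(n for n, _ in splits.values())


def total_annotations(splits):
    """Sum examples weighted by how many annotations each one carries."""
    return sum(n * k for n, k in splits.values())


def split_fractions(splits):
    """Return each split's share of all examples."""
    total = total_examples(splits)
    return {name: n / total for name, (n, _) in splits.items()}


if __name__ == "__main__":
    print(total_examples(NQ_SPLITS))     # 323045
    print(total_annotations(NQ_SPLITS))  # 385733
    for name, frac in split_fractions(NQ_SPLITS).items():
        print(f"{name}: {frac:.2%}")
```

The training split accounts for roughly 95% of all examples, while the two 5-way annotated splits together contribute 78,360 of the 385,733 annotations.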
References
2021
- (Google AI) ⇒ https://ai.google.com/research/NaturalQuestions Retrieved: 2021-01-03.
- QUOTE: To help spur development in open-domain question answering, we have created the Natural Questions (NQ) corpus, along with a challenge website based on this data. The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets.
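Since the article paired with each question may or may not contain the answer, an NQ annotation can be a long answer, a short answer, a yes/no answer, or null. The sketch below classifies one annotation and extracts an answer span. It assumes the simplified JSON-lines release layout: the field names (`question_text`, `document_text`, `long_answer`, `short_answers`, `yes_no_answer`), the `-1` sentinel for a null long answer, space-separated tokens, and exclusive `end_token` offsets are all assumptions to verify against the official data. The sample record and the helper names are hypothetical.

```python
import json

# Hypothetical record in the ASSUMED simplified NQ JSON-lines layout.
record = json.loads(
    '{"question_text": "who wrote the hobbit",'
    ' "document_text": "<P> The Hobbit is a novel by J. R. R. Tolkien . </P>",'
    ' "annotations": [{"yes_no_answer": "NONE",'
    ' "long_answer": {"start_token": 0, "end_token": 13, "candidate_index": 0},'
    ' "short_answers": [{"start_token": 7, "end_token": 11}]}]}'
)


def answer_type(annotation):
    """Classify one annotation (field names and -1 sentinel are assumptions)."""
    if annotation["yes_no_answer"] in ("YES", "NO"):
        return annotation["yes_no_answer"].lower()
    if annotation["short_answers"]:
        return "short"
    if annotation["long_answer"]["candidate_index"] != -1:
        return "long_only"
    return "null"  # the article does not contain an answer


def span_text(document_text, span):
    """Join the tokens in [start_token, end_token), assuming space tokenization."""
    tokens = document_text.split(" ")
    return " ".join(tokens[span["start_token"]:span["end_token"]])


if __name__ == "__main__":
    ann = record["annotations"][0]
    print(answer_type(ann))  # short
    print(span_text(record["document_text"], ann["short_answers"][0]))  # J. R. R. Tolkien
```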
2019
- (Kwiatkowski et al., 2019) ⇒ Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur P. Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. (2019). “Natural Questions: A Benchmark for Question Answering Research.” In: Transactions of the Association for Computational Linguistics, 7.
- QUOTE: The questions consist of real anonymized, aggregated queries issued to the Google search engine. Simple heuristics are used to filter questions from the query stream. Thus the questions are “natural” in that they represent real queries from people seeking information.
(...)
The public release contains 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and 7,842 5-way annotated items sequestered as test data.