Natural Questions Dataset
A Natural Questions Dataset is a QA dataset that provides large-scale, end-to-end training and evaluation data for QA systems.
- AKA: Natural Questions Corpus, Kwiatkowski's Natural Questions.
- Context:
- It contains questions that consist of "real anonymized, aggregated queries issued to the Google search engine".
- It contains "307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and 7,842 5-way annotated items sequestered as test data".
- Online repository and datasets available at: https://ai.google.com/research/NaturalQuestions
- Benchmark Tasks: Natural Questions Benchmark, Open-domain question answering Benchmark, GPT-2 Benchmark Task.
- Example(s):
- Counter-Example(s):
- a CoQA Dataset,
- a HotpotQA Dataset,
- an MS COCO Dataset,
- a NarrativeQA Dataset,
- a NewsQA Dataset,
- a QuAC Dataset,
- a RACE Dataset,
- a SearchQA Dataset,
- a SQuAD Dataset,
- a TriviaQA Dataset,
- a WikiQA Dataset.
- See: Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
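The split sizes quoted above can be inspected programmatically. Below is a minimal sketch using the Hugging Face `datasets` library; the "natural_questions" dataset identifier and the field names are assumptions based on the Hub's mirror of the corpus, not part of Google's official release, which is distributed separately via the repository linked above.

```python
# Minimal sketch: load the NQ corpus from the Hugging Face Hub.
# ASSUMPTION: the "natural_questions" identifier and the field names
# below follow the Hub mirror, not Google's official distribution.
from datasets import load_dataset

nq = load_dataset("natural_questions")

# Expected public split sizes: 307,373 training examples (single
# annotation) and 7,830 development examples (5-way annotated);
# the 7,842-item test set is sequestered behind the challenge site.
for split_name, split in nq.items():
    print(split_name, len(split))

# Each example pairs a real, anonymized search query with an entire
# Wikipedia page that may or may not contain the answer.
example = nq["train"][0]
print(example["question"]["text"])   # the natural-language query
print(example["document"]["title"])  # title of the paired Wikipedia page
```

Note that the corpus includes the full Wikipedia page for every example, so the initial download is large; for quick experiments, streaming the data (e.g., `load_dataset("natural_questions", streaming=True)`) may be preferable.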
References
2021
- (Google AI) ⇒ https://ai.google.com/research/NaturalQuestions Retrieved:2021-01-03.
- QUOTE: To help spur development in open-domain question answering, we have created the Natural Questions (NQ) corpus, along with a challenge website based on this data. The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets.
2019
- (Kwiatkowski et al., 2019) ⇒ Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur P. Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. (2019). “Natural Questions: A Benchmark for Question Answering Research.” In: Transactions of the Association for Computational Linguistics, 7.
- QUOTE: The questions consist of real anonymized, aggregated queries issued to the Google search engine. Simple heuristics are used to filter questions from the query stream. Thus the questions are “natural” in that they represent real queries from people seeking information.
(...)
The public release contains 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and 7,842 5-way annotated items sequestered as test data.