NewsQA Dataset

Context:
- It contains 119,633 natural language questions posed by crowdworkers on 12,744 news articles from CNN.
- Online repository: https://github.com/Maluuba/newsqa
- Datasets available at: https://www.microsoft.com/en-us/research/project/newsqa-dataset/
- Benchmark Task: NewsQA Machine Comprehension Challenge.
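Each NewsQA question is answered by a span of its CNN article. The Python sketch below shows how such a span could be read from a story record. It assumes the field layout described in the Maluuba/newsqa repository README: a top-level "data" list of stories with "storyId", "text" and "questions"; each question has "q" and a "consensus" object holding character offsets "s" (inclusive) and "e" (exclusive), or a flag such as {"noAnswer": true}. Check those names against the README before relying on them. The record is a made-up toy, not real NewsQA content, and the helper consensus_answers is hypothetical.

```python
import json

# Made-up toy record (NOT real NewsQA data). Field names are assumed from the
# Maluuba/newsqa README description of the combined JSON file.
TOY_JSON = """
{
  "version": "1",
  "data": [
    {
      "storyId": "toy-story-0001",
      "type": "train",
      "text": "The mayor opened the new bridge on Monday.",
      "questions": [
        {"q": "Who opened the bridge?", "consensus": {"s": 0, "e": 9}},
        {"q": "What did the governor say?", "consensus": {"noAnswer": true}}
      ]
    }
  ]
}
"""


def consensus_answers(dataset):
    """Hypothetical helper: yield (question, answer_text_or_None) pairs.

    Assumes 'consensus' holds character offsets 's' (inclusive) and 'e'
    (exclusive) into the story text, or a flag such as {"noAnswer": true}
    or {"badQuestion": true} when no answer span applies.
    """
    for story in dataset["data"]:
        text = story["text"]
        for question in story["questions"]:
            span = question.get("consensus", {})
            if "s" in span and "e" in span:
                # Slice the answer span out of the article text.
                yield question["q"], text[span["s"]:span["e"]]
            else:
                # No usable span: unanswerable or flagged question.
                yield question["q"], None


dataset = json.loads(TOY_JSON)
for q, a in consensus_answers(dataset):
    print(q, "->", a)
```

For the real data, the same helper could be applied to the output of json.load on the combined file produced by the repository's packaging scripts, assuming it follows the layout above.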
Example(s):
- a DeepMind Q&A Dataset,
- …
Counter-Example(s):
- a CoQA Dataset,
- a FigureQA Dataset,
- a Frames Dataset,
- an MS COCO Dataset,
- a NarrativeQA Dataset,
- a RACE Dataset,
- a SearchQA Dataset,
- a SQuAD Dataset,
- a TriviaQA Dataset.
See: Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.

References

(MS Research Montreal, 2020) ⇒ https://www.microsoft.com/en-us/research/project/newsqa-dataset/ Retrieved: 2020-12-27.
- QUOTE: With massive volumes of written text being produced every second, how do we make sure that we have the most recent and relevant information available to us? Microsoft research Montreal is tackling this problem by building AI systems that can read and comprehend large volumes of complex text in real-time.
  The purpose of the NewsQA dataset is to help the research community build algorithms that are capable of answering questions requiring human-level comprehension and reasoning skills.
  Leveraging CNN articles from the DeepMind Q&A Dataset, we prepared a crowd-sourced machine reading comprehension dataset of 120K Q&A pairs.