CoQA Challenge
A CoQA Challenge is a Machine Learning Benchmark Task that evaluates the performance of Question-Answering Systems on CoQA Datasets.
- Context:
- Resource(s): Official website, dataset downloads, and leaderboard are available at https://stanfordnlp.github.io/coqa/
- Task Input(s): CoQA Datasets (each example pairs a text passage with a sequence of conversational questions; see the data-format sketch after this list).
- Task Output(s): Performance Metrics.
- Task Requirement(s):
- Benchmark Performance Metrics: macro-average word-overlap F1 Score (see Reddy et al., 2019).
- Baseline Models:
- RoBERTa + AT + KD (ensemble);
- TR-MT (ensemble);
- RoBERTa + AT + KD (single model);
- Google SQuAD 2.0 + MMFT (ensemble and single model);
- XLNet + Augmentation (single model);
- ConvBERT (ensemble);
- BERT + MMFT + ADA (ensemble);
- XLNet + MMFT + ADA (single model);
- BERT + AttentionFusionNet (single model);
- BERT + Answer Verification (single model);
- BERT with History Augmented Query (single model);
- BERT Large Fine-tuned Baseline (single model);
- BERT Large Augmented (single model);
- D-AoA + BERT (single model);
- BERT Augmented + AoA (single model);
- CNet (single model);
- SDNet (ensemble);
- CQANet (single model);
- …
- Counter-Example(s):
- a SQuAD Benchmark Task (single-turn, extractive question answering rather than conversational, free-form question answering).
- See: CoQA Dataset, Question-Answering System, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Generation Task.
References
2020
- (CoQA, 2020) ⇒ https://stanfordnlp.github.io/coqa/ Retrieved:2020-06-03.
- QUOTE: CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. (...)
CoQA contains 127,000+ questions with answers collected from 8000+ conversations. Each conversation is collected by pairing two crowdworkers to chat about a passage in the form of questions and answers. The unique features of CoQA include 1) the questions are conversational; 2) the answers can be free-form text; 3) each answer also comes with an evidence subsequence highlighted in the passage; and 4) the passages are collected from seven diverse domains. CoQA has a lot of challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning.
2019
- (Reddy et al., 2019) ⇒ Siva Reddy, Danqi Chen, and Christopher D. Manning. (2019). “CoQA: A Conversational Question Answering Challenge.” In: Transactions of the Association for Computational Linguistics, 7. DOI:10.1162/tacl_a_00266.
- QUOTE: ... we introduce CoQA, a Conversational Question Answering dataset for measuring the ability of machines to participate in a question-answering style conversation. In CoQA, a machine has to understand a text passage and answer a series of questions that appear in a conversation. We develop CoQA with three main goals in mind.
The first concerns the nature of questions in a human conversation (...)
The second goal of CoQA is to ensure the naturalness of answers in a conversation (...)
The third goal of CoQA is to enable building QA systems that perform robustly across domains (...)
(...) Following SQuAD, we use macro-average F1 score of word overlap as our main evaluation metric[1].
- ↑ SQuAD also uses exact-match metric, however, we think F1 is more appropriate for our dataset because of the free-form answers.