Neural Generative Question Answering (GENQA) Task
A Neural Generative Question Answering (GENQA) Task is a QA Task that is formalized as a supervised sequence-to-sequence learning task and can be solved by a GENQA System.
- Context:
- Input: Question (a sequence of words).
- Output:
- Generative Answer (a sequence of words),
- (optional) GENQA KB-triple,
- (optional) a GENQA performance metric.
- Task Requirements:
- a Knowledge Base of fact triples: (subject, predicate, object).
- Training Data and Test Data consisting of QA-pairs (Question, Answer) grounded in the KB's triples.
- a Grounding Algorithm, such as the Aho-Corasick String Searching Algorithm, to retrieve a list of candidate GENQA KB-triples for each QA-pair.
- a Performance Metric to estimate the accuracy of the GENQA KB-triple and QA-pair matching and grounding algorithms.
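The input/output contract above can be sketched as a data structure for a single GENQA training instance. This is an illustrative sketch only; the field names and example values are assumptions, not taken from the Yin et al. (2016) dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One KB fact: (subject, predicate, object).
Triple = Tuple[str, str, str]

@dataclass
class GenQAInstance:
    """One supervised instance: the question and answer are word sequences;
    the grounded KB-triple is optional because grounding can fail."""
    question: List[str]              # sequence of words (encoder input)
    answer: List[str]                # sequence of words (decoder target)
    triple: Optional[Triple] = None  # grounded fact, if any

inst = GenQAInstance(
    question="how tall is yao ming".split(),
    answer="he is 2.29 m tall".split(),
    triple=("yao ming", "height", "2.29 m"),
)
# The basic relevance rule: the answer must contain the triple's object.
assert inst.triple[2] in " ".join(inst.answer)
```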
- Example(s):
- Counter-Example(s):
- See: Artificial Neural Network, Deep Learning Neural Network, Natural Language Processing Task, Attention Mechanism, Long Short-Term Memory (LSTM) RNN Model.
References
2016
- (Yin et al., 2016) ⇒ Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, and Xiaoming Li. (2016). “Neural Generative Question Answering.” In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16). ISBN:978-1-57735-770-4. arXiv preprint arXiv:1512.01337.
- QUOTE: We formalize generative question answering as a supervised learning task or more specifically a sequence-to-sequence learning task. A generative QA system takes a sequence of words as input question and generates another sequence of words as output answer. In order to provide right answers, the system is connected with a knowledge-base (KB), which contains facts(...)
To facilitate research on the task of generative QA, we create a new dataset by collecting data from the web. We first build a knowledge-base by mining from three Chinese encyclopedia web sites. Specifically we extract entities and associated triples (subject, predicate, object) from the structured parts (e.g. HTML tables) of the web pages at the web sites (...)
We automatically and heuristically construct training and test data for generative QA by “grounding” the QA pairs with the triples in the knowledge-base. Specifically, for each QA pair, a list of candidate triples with the subject fields appearing in the question, is retrieved by using the Aho-Corasick string searching algorithm. The triples in the candidate list are then judged by a series of rules for relevance to the QA pair. The basic requirement for relevance is that the answer contains the object of the triple, which specifies the KB-word in the answer. Besides, we use additional scoring and filtering rules, attempting to find out the triple that truly matches the QA pair, if there is any. As the result of processing, 720K instances (tuples of question, answer, triple) are finally obtained with an estimated accuracy of 80%, i.e., 80% of instances have truly correct grounding (...)
The data is further randomly partitioned into training dataset and test dataset by using triple as the partition key. In this way, all the questions in the test data are regarding to the unseen facts (triples) in the training data.
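The grounding and partitioning steps quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: plain substring search stands in for the Aho-Corasick automaton, only the basic relevance rule (the answer must contain the triple's object) is applied, and the deterministic triple-keyed split is an assumed mechanism for keeping every question about a given fact on one side of the partition.

```python
import hashlib

# Toy KB of (subject, predicate, object) triples (illustrative data).
kb = [
    ("yao ming", "height", "2.29 m"),
    ("yao ming", "birthplace", "shanghai"),
    ("great wall", "length", "21196 km"),
]

def ground(question, answer, triples):
    """Return candidate triples whose subject appears in the question,
    filtered by the basic relevance rule that the answer contains the
    object. The paper retrieves subject mentions with the Aho-Corasick
    string searching algorithm; naive substring search stands in here."""
    candidates = [t for t in triples if t[0] in question]
    return [t for t in candidates if t[2] in answer]

def split_side(triple, test_percent=10):
    """Deterministically assign a triple to train or test by hashing it,
    so all QA-pairs grounded to the same fact land on the same side."""
    h = int(hashlib.md5("|".join(triple).encode("utf-8")).hexdigest(), 16)
    return "test" if h % 100 < test_percent else "train"

matches = ground("how tall is yao ming", "he is 2.29 m tall", kb)
# -> [("yao ming", "height", "2.29 m")]
```

Because the partition key is the triple (not the QA-pair), every question in the test set asks about a fact never seen during training, which is what makes the evaluation test generalization to unseen facts.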