QA-BiLSTM System
A QA-BiLSTM System is a deep question answering (QA) system that is based on a BiLSTM training system.
- Context:
- It can solve a QA-LSTM Task by implementing a QA-LSTM Algorithm.
- Example(s):
- Counter-Examples:
- See: Deep Learning, Attention-based Mechanism, QA Dataset, Natural Language Processing, Question Answering System, Artificial Neural Network, Convolutional Neural Network, Recurrent Neural Network, Long Short-Term Memory.
References
2016
- (Tan et al., 2016) ⇒ Ming Tan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. (2016). “LSTM-based Deep Learning Models for Non-factoid Answer Selection.” In: Proceedings of ICLR 2016 Workshop. eprint arXiv:1511.04108
- QUOTE: QA-LSTM: The basic model in this work is shown in Figure 1. BiLSTM generates distributed representations for both the question and answer independently, and then utilize cosine similarity to measure their distance. Following the same ranking loss in (Feng et al., 2015; Weston et al., 2014; Hu et al., 2014), we define the training objective as a hinge loss.
[math]\displaystyle{ L = \max\{0,\, M - \operatorname{cosine}(q, a_+) + \operatorname{cosine}(q, a_-)\} \quad \quad(7) }[/math]
where [math]\displaystyle{ a_+ }[/math] is a ground truth answer, [math]\displaystyle{ a_− }[/math] is an incorrect answer randomly chosen from the entire answer space, and [math]\displaystyle{ M }[/math] is constant margin. We treat any question with more than one ground truth as multiple training examples, each for one ground truth. There are three simple ways to generate representations for questions and answers based on the word-level biLSTM outputs: (1) Average pooling; (2) max pooling; (3) the concatenation of the last vectors on both directions. The three strategies are compared with the experimental performance in Section 5. Dropout operation is performed on the QA representations before cosine similarity matching.
Finally, from preliminary experiments, we observe that the architectures, in which both question and answer sides share the same network parameters, is significantly better than the one that the question and answer sides own their own parameters separately, and converges much faster. As discussed in (Feng et al., 2015), this is reasonable, because for a shared layer network, the corresponding elements in question and answer vectors represent the same biLSTM outputs. While for the network with separate question and answer parameters, there is no such constraint and the model has double-sized parameters, making it difficult to learn for the optimizer.
Figure 1: Basic Model: QA-LSTM.
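The following is a minimal PyTorch sketch of the QA-BiLSTM scorer described in the quote above: a single BiLSTM with shared parameters encodes the question and each answer, max pooling (one of the three pooling options mentioned) produces fixed-size representations, and training minimizes the hinge loss of Equation (7). All hyperparameter values (vocab_size, embed_dim, hidden_dim, dropout, margin) are illustrative assumptions, not the settings reported by Tan et al. (2016).
<pre>
# Illustrative QA-BiLSTM sketch (assumed hyperparameters, not the paper's settings).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QABiLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One BiLSTM shared by question and answer sides (shared network parameters).
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)

    def encode(self, token_ids):
        # Word-level BiLSTM outputs: (batch, seq_len, 2 * hidden_dim)
        outputs, _ = self.bilstm(self.embed(token_ids))
        # Max pooling over the time dimension (one of the three pooling strategies).
        rep, _ = outputs.max(dim=1)
        # Dropout on the QA representation before cosine similarity matching.
        return self.dropout(rep)

    def forward(self, question, pos_answer, neg_answer, margin=0.1):
        q = self.encode(question)
        a_pos = self.encode(pos_answer)
        a_neg = self.encode(neg_answer)
        # Hinge loss: L = max{0, M - cosine(q, a+) + cosine(q, a-)}
        loss = torch.clamp(margin
                           - F.cosine_similarity(q, a_pos)
                           + F.cosine_similarity(q, a_neg), min=0.0)
        return loss.mean()

# Toy usage with random token ids (4 questions, one positive and one sampled negative answer each).
model = QABiLSTM()
q  = torch.randint(0, 10000, (4, 20))
ap = torch.randint(0, 10000, (4, 30))
an = torch.randint(0, 10000, (4, 30))
loss = model(q, ap, an)
loss.backward()
</pre>
In this sketch the negative answer a_- would be sampled at random from the answer space, and a question with several ground-truth answers would contribute one training triple per ground truth, as the quoted passage describes.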