2015 TeachingMachinestoReadandComprehend
- (Hermann et al., 2015) ⇒ Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. (2015). “Teaching Machines to Read and Comprehend.” In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15). ePrint: arXiv:1506.03340v3
Subject Headings: Attention-Based QA-LSTM; Attention-Based QA-LSTM-CNN; Attention Mechanism; Deep LSTM Reader; Machine Reading System; Neural Natural Language Processing System; CNN/Daily Mail Dataset.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222015%22+Teaching+Machines+to+Read+and+Comprehend
- http://dl.acm.org/citation.cfm?id=2969239.2969428&preflayout=flat#citedby
- http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend
Quotes
Abstract
Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work, we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
1. Introduction
Progress on the path from shallow bag-of-words information retrieval algorithms to machines capable of reading and understanding documents has been slow. Traditional approaches to machine reading and comprehension have been based on either hand engineered grammars (Riloff & Thelen, 2000), or information extraction methods of detecting predicate argument triples that can later be queried as a relational database (Poon et al., 2010). Supervised machine learning approaches have largely been absent from this space due to both the lack of large scale training datasets, and the difficulty in structuring statistical models flexible enough to learn to exploit document structure.
While obtaining supervised natural language reading comprehension data has proved difficult, some researchers have explored generating synthetic narratives and queries (Weston et al., 2015; Sukhbaatar et al., 2015). Such approaches allow the generation of almost unlimited amounts of supervised data and enable researchers to isolate the performance of their algorithms on individual simulated phenomena. Work on such data has shown that neural network based models hold promise for modelling reading comprehension, something that we will build upon here. Historically, however, many similar approaches in Computational Linguistics have failed to manage the transition from synthetic data to real environments, as such closed worlds inevitably fail to capture the complexity, richness, and noise of natural language (Winograd, 1972).
In this work we seek to directly address the lack of real natural language training data by introducing a novel approach to building a supervised reading comprehension data set. We observe that summary and paraphrase sentences, with their associated documents, can be readily converted to context–query–answer triples using simple entity detection and anonymisation algorithms. Using this approach we have collected two new corpora of roughly a million news stories with associated queries from the CNN and Daily Mail websites.
We demonstrate the efficacy of our new corpora by building novel deep learning models for reading comprehension. These models draw on recent developments for incorporating attention mechanisms into recurrent neural network architectures (Bahdanau et al., 2015; Mnih et al., 2014; Gregor et al., 2015; Sukhbaatar et al., 2015). This allows a model to focus on the aspects of a document that it believes will help it answer a question, and also allows us to visualise its inference process. We compare these neural models to a range of baselines and heuristic benchmarks based upon a traditional frame semantic analysis provided by a state-of-the-art natural language processing (NLP) pipeline. Our results indicate that the neural models achieve a higher accuracy, and do so without any specific encoding of the document or query structure.
2. Supervised Training Data for Reading Comprehension
3. Models
4. Empirical Evaluation
5. Conclusion
References
2015a
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” In: Proceedings of the Third International Conference on Learning Representations, (ICLR-2015).
2015b
- (Gregor et al., 2015) ⇒ Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra (2015). "DRAW: A Recurrent Neural Network For Image Generation". In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015).
2015c
- (Sukhbaatar et al., 2015) ⇒ Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus (2015). "End-To-End Memory Networks". In: Proceedings of Advances in Neural Information Processing Systems 28 (NIPS 2015).
2015d
- (Weston et al., 2015) ⇒ Jason Weston, Sumit Chopra, and Antoine Bordes (2015). "Memory Networks". In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015) Conference Track.
2014a
- (Das et al., 2014) ⇒ Dipanjan Das, Desai Chen, André F. T. Martins, Nathan Schneider, and Noah A. Smith (2014). "Frame-semantic Parsing". In: Computational Linguistics, v.40 n.1.
2014b
- (Hermann et al., 2014) ⇒ Karl Moritz Hermann, Dipanjan Das, Jason Weston, and Kuzman Ganchev (2014). "Semantic Frame Identification with Distributed Word Representations". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
2014c
- (Kalchbrenner et al., 2014) ⇒ Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom (2014). "A Convolutional Neural Network for Modelling Sentences". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). DOI:10.3115/v1/P14-1062.
2014d
- (Mnih et al., 2014) ⇒ Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu (2014). "Recurrent Models of Visual Attention". In: Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014).
2014e
- (Sutskever et al., 2014) ⇒ Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. (2014). “Sequence to Sequence Learning with Neural Networks.” In: Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems (NIPS 2014).
2013
- (Richardson et al., 2013) ⇒ Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw. (2013). “MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text.” In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013). A meeting of SIGDAT, a Special Interest Group of the ACL.
2012a
- (Graves, 2012) ⇒ Alex Graves. (2012). “Supervised Sequence Labelling with Recurrent Neural Networks.” Springer Berlin Heidelberg. ISBN:9783642247965.
2012b
- (Tieleman & Hinton, 2012) ⇒ Tijmen Tieleman, and Geoffrey Hinton (2012). Lecture 6.5—RmsProp: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning.
2011
- (Collobert et al., 2011b) ⇒ Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. (2011). “Natural Language Processing (Almost) from Scratch.” In: The Journal of Machine Learning Research, 12.
2010a
- (Poon et al., 2010) ⇒ Hoifung Poon, Janara Christensen, Pedro Domingos, Oren Etzioni, Raphael Hoffmann, Chloe Kiddon, Thomas Lin, Xiao Ling, Mausam, Alan Ritter, Stefan Schoenmackers, Stephen Soderland, Dan Weld, Fei Wu, and Congle Zhang (2010). "Machine Reading at the University of Washington". In: Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading.
2010b
- (Woodsend & Lapata, 2010) ⇒ Kristian Woodsend, and Mirella Lapata (2010). "Automatic Generation of Story Highlights". In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010).
2007
- (Svore et al., 2007) ⇒ Krysta Svore, Lucy Vanderwende, and Christopher Burges (2007). "Enhancing Single-document Summarization by Combining RankNet and Third-party Sources". In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
2000
- (Riloff & Thelen, 2000) ⇒ Ellen Riloff and Michael Thelen (2000). "A Rule-based Question Answering System for Reading Comprehension Tests". In: Proceedings of the ANLP-NAACL 2000 Workshop: Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems.
1997
- (Hochreiter & Schmidhuber, 1997) ⇒ Sepp Hochreiter, and Jürgen Schmidhuber (1997). "Long Short-Term Memory". In: Neural Computation, 9(8). DOI:10.1162/neco.1997.9.8.1735.
1972
- (Winograd, 1972) ⇒ Terry Winograd (1972). "Understanding Natural Language". Academic Press, Inc., Orlando, FL, USA.
1953
- (Taylor, 1953) ⇒ Wilson L. Taylor (1953). "'Cloze Procedure': A New Tool for Measuring Readability". In: Journalism Quarterly, 30:415-433.
BibTeX
@inproceedings{2015_TeachingMachinestoReadandCompre,
  author    = {Karl Moritz Hermann and Tomas Kocisky and Edward Grefenstette and Lasse Espeholt and Will Kay and Mustafa Suleyman and Phil Blunsom},
  editor    = {Corinna Cortes and Neil D. Lawrence and Daniel D. Lee and Masashi Sugiyama and Roman Garnett},
  title     = {Teaching Machines to Read and Comprehend},
  booktitle = {Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015},
  month     = {December},
  address   = {Montreal, Quebec, Canada},
  pages     = {1693--1701},
  year      = {2015},
  url       = {http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend}
}