Paraphrase Detection Task
(Redirected from paraphrase identification)
A Paraphrase Detection Task is a semantic detection task that determines whether two linguistic sentences are in a paraphrase relation.
- Context:
- It can be solved by a Paraphrase Detection System (that implements a paraphrase detection algorithm).
- Example(s):
- IsParaphrase(“Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.”; “Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.”) ⇒ true
- based on Microsoft Research Paraphrase Corpus.
- based on DIRT Paraphrase Collection.
- based on Sekine's Paraphrase Database.
- …
- Counter-Example(s):
- See: Semantic Identity, Paraphrase.
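The IsParaphrase example above can be sketched as a function over a sentence pair. The following is only a minimal illustration, assuming a token-overlap (Jaccard) baseline; the function names and the 0.5 threshold are hypothetical choices made for this sketch and do not come from any of the systems referenced on this page.

```python
import re


def tokens(sentence):
    """Lowercase the sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(sentence_a, sentence_b):
    """Jaccard similarity between the token sets of two sentences."""
    tokens_a, tokens_b = tokens(sentence_a), tokens(sentence_b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_paraphrase(sentence_a, sentence_b, threshold=0.5):
    """Hypothetical baseline: predict 'paraphrase' when overlap is high enough.

    The threshold is an assumed value, not a tuned one.
    """
    return jaccard(sentence_a, sentence_b) >= threshold


s1 = ('Amrozi accused his brother, whom he called "the witness", '
      'of deliberately distorting his evidence.')
s2 = ('Referring to him as only "the witness", Amrozi accused his '
      'brother of deliberately distorting his evidence.')

print(is_paraphrase(s1, s2))
```

A surface-overlap baseline of this kind cannot separate true paraphrases from sentence pairs that merely share vocabulary, which is one reason the referenced approaches learn sentence representations instead.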
References
2018
- https://aclweb.org/aclwiki/Paraphrase_Identification_(State_of_the_art)
- source: Microsoft Research Paraphrase Corpus (MSRP)
- task: given a pair of sentences, classify them as paraphrases or not paraphrases
- see: Dolan et al. (2004).
- train: 4,076 sentence pairs (2,753 positive: 67.5%)
- test: 1,725 sentence pairs (1,147 positive: 66.5%)
- see also: Similarity (State of the art)
- Sample data
- Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
- Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
- Class: 1 (true paraphrase)
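Because roughly two thirds of the MSRP pairs are positive, a trivial classifier that labels every pair as a paraphrase already scores well above 50%. The sketch below works through that arithmetic from the counts listed above; the function name is hypothetical, and the only inputs are the train and test statistics quoted in this entry.

```python
def always_positive_scores(n_pairs, n_positive):
    """Accuracy and F1 of a classifier that predicts 'paraphrase' for every pair."""
    true_positives = n_positive
    false_positives = n_pairs - n_positive
    false_negatives = 0  # nothing is ever predicted negative

    accuracy = true_positives / n_pairs
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Counts taken from the corpus statistics listed above.
train_accuracy, train_f1 = always_positive_scores(4076, 2753)
test_accuracy, test_f1 = always_positive_scores(1725, 1147)

print(f"train: accuracy={train_accuracy:.3f}, F1={train_f1:.3f}")
print(f"test:  accuracy={test_accuracy:.3f}, F1={test_f1:.3f}")
```

On the test split this gives an accuracy of about 66.5% and an F1 of about 79.9%, so reported results on this corpus are best read against that floor rather than against chance.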
2015a
- (Yin & Schütze, 2015) ⇒ Wenpeng Yin, and Hinrich Schütze. (2015). “Convolutional Neural Network for Paraphrase Identification.” In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 901-911.
2015b
- (Kiros et al., 2015) ⇒ Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. (2015). “Skip-thought Vectors.” In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS-2015).
- QUOTE: We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets.
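The quoted setup, fixed sentence vectors evaluated with linear models, can be illustrated for paraphrase detection as follows. This is a sketch under stated assumptions: the toy vectors stand in for the output of a trained sentence encoder, the pairing of the two vectors by component-wise product and absolute difference is a commonly used choice assumed here rather than confirmed by the quote, and the weights are placeholders instead of learned parameters.

```python
import math


def pair_features(u, v):
    """Combine two sentence vectors into one order-independent feature vector.

    Assumed pairing: component-wise product followed by absolute difference.
    """
    product = [a * b for a, b in zip(u, v)]
    abs_difference = [abs(a - b) for a, b in zip(u, v)]
    return product + abs_difference


def paraphrase_probability(features, weights, bias):
    """Linear model (logistic regression form) over the pair features."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Toy stand-ins for encoder outputs; a real system would use learned vectors.
u = [1.0, 0.0, 2.0]
v = [1.0, 1.0, 0.5]

features = pair_features(u, v)

# Placeholder weights: zero weights give an uninformative probability of 0.5.
weights = [0.0] * len(features)
print(features, paraphrase_probability(features, weights, bias=0.0))
```

Both feature blocks are symmetric in the two sentences, so the prediction does not depend on which sentence is listed first.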
2011
- (Socher et al., 2011) ⇒ Richard Socher, Eric H. Huang, Jeffrey Pennington, Christopher D. Manning, and Andrew Y. Ng. (2011). “Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection.” In: Advances in Neural Information Processing Systems, pp. 801-809.
2009
- (Das & Smith, 2009) ⇒ Dipanjan Das, and Noah A. Smith. (2009). “Paraphrase Identification As Probabilistic Quasi-synchronous Recognition.” In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL/IJCNLP-2009).
2004
- (Dolan et al., 2004) ⇒ Bill Dolan, Chris Quirk, and Chris Brockett. (2004). “Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources.” In: Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004).