2004 LexPageRank

Subject Headings: Multi-Document Extractive Summarization Algorithm.

Notes

Multi-document extractive summarization relies on the concept of sentence centrality to identify the most important sentences in a set of documents. Centrality is typically defined either by the presence of particular important words or by similarity to a centroid pseudo-sentence. We consider an approach for computing sentence importance based on the concept of eigenvector centrality (prestige) that we call LexPageRank. In this model, a sentence connectivity matrix is constructed from cosine similarity: if the cosine similarity between two sentences exceeds a predefined threshold, a corresponding edge is added to the connectivity matrix. We evaluate the method on DUC 2004 data. The results show that our approach outperforms centroid-based summarization and performs well compared with other summarization systems.
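The graph construction and prestige computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are represented as raw term-count vectors (the abstract does not specify the term weighting, and tf-idf would be a natural alternative), the threshold of 0.1 is a hypothetical value, and prestige is computed by PageRank-style power iteration with an assumed damping factor of 0.85 and uniform redistribution of score from sentences that have no edges. The function names `tokenize`, `cosine`, `connectivity_matrix`, and `lex_pagerank` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Simple lower-cased alphanumeric tokens (hypothetical stand-in tokenizer).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def connectivity_matrix(sentences, threshold=0.1):
    # Binary, symmetric adjacency: an edge joins two distinct sentences
    # whose cosine similarity exceeds the (assumed) threshold.
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(vectors)
    return [
        [1 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def lex_pagerank(sentences, threshold=0.1, damping=0.85, max_iter=100, tol=1e-10):
    # PageRank-style power iteration over the thresholded sentence graph.
    adj = connectivity_matrix(sentences, threshold)
    n = len(adj)
    if n == 0:
        return []
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Sentences with no edges spread their score uniformly (assumption).
        dangling = sum(scores[i] for i in range(n) if degree[i] == 0)
        new_scores = []
        for j in range(n):
            # Each neighbour i passes an equal share of its score to j.
            incoming = sum(scores[i] / degree[i] for i in range(n) if adj[i][j])
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Toy input: three related sentences and one unrelated sentence.
docs = [
    "The storm hit the coast on Monday.",
    "A powerful storm hit the coast, officials said.",
    "Officials said the storm caused flooding on the coast.",
    "Fans celebrated their football victory.",
]
scores = lex_pagerank(docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
for i in ranked:
    print(f"{scores[i]:.3f}  {docs[i]}")
```

In the toy input, the three storm sentences share enough vocabulary to form a connected triangle, while the football sentence has no edges and ends up with the lowest prestige score. An extractive summarizer would then select the top-ranked sentences; the damping term keeps the power iteration convergent even when the thresholded graph is disconnected.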

References

Ron Brandow, Karl Mitze, and Lisa F. Rau. (1995). Automatic condensation of electronic publications by sentence selection. Information Processing and Management, 31(5):675–685.
Jaime Carbonell and Jade Goldstein. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Research and Development in Information Retrieval, pages 335–336.
Chin-Yew Lin and E.H. Hovy. (2003). Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the 2003 Human Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May 27–June 1.
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. (1998). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, Stanford, CA.
Dragomir Radev, Hongyan Jing, and Malgorzata Budzikowska. (2000). Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In ANLP/NAACL Workshop on Summarization, Seattle, WA, April.
Dragomir Radev, Sasha Blair-Goldensohn, and Zhu Zhang. (2001). Experiments in single and multidocument summarization using MEAD. In First Document Understanding Conference, New Orleans, LA, September.
Dragomir Radev. (2000). A common theory of information fusion from multiple text sources, step one: Cross-document structure. In Proceedings of the 1st ACL SIGDIAL Workshop on Discourse and Dialogue, Hong Kong, October.


2004 LexPageRank
Author: Dragomir Radev, Günes Erkan
Title: LexPageRank: Prestige in Multi-Document Text Summarization
Venue: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004)
URL: http://aclweb.org/anthology-new/W/W04/W04-3247.pdf
Year: 2004