2004 LookingforaFewGoodMetricsAutoma


Subject Headings: Recall-Oriented Understudy for Gisting Evaluation (ROUGE) Metrics; ROUGE-N; ROUGE-L; ROUGE-W; ROUGE-S; ROUGE-SU; ROUGE Summarization Evaluation Software Package; Document Understanding Conference (DUC); Text Summarization Task.

Notes

Cited By

Quotes

Author Keywords

Abstract

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units, such as n-grams, word sequences, and word pairs, between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper discusses the validity of the evaluation method used in the Document Understanding Conference (DUC) and evaluates five different ROUGE metrics included in the ROUGE summarization evaluation package (ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU) using data provided by DUC. A comprehensive study of the effects of using single or multiple references and various sample sizes on the stability of the results is also presented.
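
The n-gram overlap counting described in the abstract can be illustrated with a minimal sketch of an n-gram recall score in the spirit of ROUGE-N. This is not the ROUGE package itself: the function names `ngrams` and `rouge_n_recall` are hypothetical, tokenization is assumed to be lowercased whitespace splitting, and multiple references are assumed to be pooled by summing matched and total n-gram counts across them.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Recall-oriented n-gram overlap between one candidate and its references.

    Assumption: tokens are lowercased, whitespace-separated words, and the
    references are pooled by summing counts over all of them.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Counter intersection keeps the minimum count of each n-gram,
        # so a candidate n-gram cannot be credited more often than it occurs.
        matched += sum((cand_counts & ref_counts).values())
        total += sum(ref_counts.values())
    # Recall: matched n-grams divided by n-grams present in the references.
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat was found under the bed"
    references = ["the cat was under the bed"]
    print(rouge_n_recall(candidate, references, n=1))
    print(rouge_n_recall(candidate, references, n=2))
```

Because the denominator counts n-grams in the human references rather than in the candidate, the score rewards coverage of the reference content, which is what makes the measure recall-oriented.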

References

BibTeX

@inproceedings{2004_LookingforaFewGoodMetricsAutom,
  author    = {Chin-Yew Lin},
  editor    = {Noriko Kando and
               Haruko Ishikawa},
  title     = {Looking for a Few Good Metrics: Automatic Summarization Evaluation
               - How Many Samples Are Enough?},
  booktitle = {Proceedings of the Fourth NTCIR Workshop on Research in Information
               Access Technologies Information Retrieval, Question Answering and
               Summarization (NTCIR 2004) National Center of Sciences, Tokyo, Japan,
               June 2-4, 2004},
  publisher = {National Institute of Informatics (NII)},
  year      = {2004},
  url       = {http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings4/OPEN/NTCIR4-OPEN-LinCY.pdf},
}


|  | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
| 2004 LookingforaFewGoodMetricsAutoma | Chin-Yew Lin |  |  | Looking for a Few Good Metrics: Automatic Summarization Evaluation - How Many Samples Are Enough? |  |  |  |  |  | 2004 |