RDASS Text Summarization Measure

From GM-RKB
Jump to navigation Jump to search

An RDASS Text Summarization Measure is a text summarization performance measure based on the similarity between the original document vector and the reference summary vector, and between the original document and the generated summary.



References

2023

  • (Yun et al., 2023) ⇒ Jiseon Yun, Jae Eui Sohn, and Sunghyon Kyeong. (2023). “Fine-Tuning Pretrained Language Models to Enhance Dialogue Summarization in Customer Service Centers.” In: Proceedings of the Fourth ACM International Conference on AI in Finance. doi:10.1145/3604237.3626838
    • QUOTE: ... The results demonstrated that the fine-tuned model based on KakaoBank’s internal datasets outperformed the reference model, showing a 199% and 12% improvement in ROUGE-L and RDASS, respectively. ...
    • QUOTE: ... RDASS is a comprehensive evaluation metric that considers the relationships among the original document, reference summary, and model-generated summary. Compared to ROUGE, RDASS performed better in terms of relevance, consistency, and fluency of sentences in Korean. Therefore, we employed both ROUGE and RDASS as evaluation metrics, considering their respective strengths and weaknesses of each metric. ...
    • QUOTE: ... RDASS measures the similarity between the vectors of the original document and reference summary. Moreover, it measures the similarity between the vectors of the original document and generated summary. Finally, RDASS can be obtained by computing their average. ...

2020

  • (Lee, Shin et al., 2020) ⇒ Dongyub Lee, Myeongcheol Shin, Taesun Whang, Seungwoo Cho, Byeongil Ko, Daniel Lee, Eunggyun Kim, and Jaechoon Jo. (2020). “Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization.” In: arXiv preprint arXiv:2005.03510. DOI:10.48550/arXiv.2005.03510
    • ABSTRACT: Text summarization refers to the process that generates a shorter form of text from the source document preserving salient information. Many existing works for text summarization are generally evaluated by using recall-oriented understudy for gisting evaluation (ROUGE) scores. However, as ROUGE scores are computed based on n-gram overlap, they do not reflect semantic meaning correspondences between generated and reference summaries. Because Korean is an agglutinative language that combines various morphemes into a word that express several meanings, ROUGE is not suitable for Korean summarization. In this paper, we propose evaluation metrics that reflect semantic meanings of a reference summary and the original document, Reference and Document Aware Semantic Score (RDASS). We then propose a method for improving the correlation of the metrics with human judgment. Evaluation results show that the correlation with human judgment is significantly higher for our evaluation metrics than for ROUGE scores.