Text Summary Evaluation Task
A Text Summary Evaluation Task is an NLG Evaluation Task that involves assessing the quality of automatically or manually generated text summaries.
- Context:
- input: a Text Summary and a Text Summary Evaluation Measure.
- output: a Text Summary Evaluation Score.
- It can (typically) employ Summary Evaluation Metrics such as ROUGE, BLEU, and Flesch-Kincaid to measure quality aspects like conciseness, relevance, and readability (see the metric sketch after this list).
- It can range from being a Simple Text Summary Evaluation Task to being a Complex Text Summary Evaluation Task.
- It can range from being a Human-Performed Text Summary Evaluation Task to being an Automated Text Summary Evaluation Task.
- It can incorporate User Feedback to adjust and improve the evaluation criteria and the summarization algorithms themselves.
- ...
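The following is a minimal sketch of the automated metric computation mentioned above, assuming the third-party Python packages rouge-score, nltk, and textstat are installed; the example texts are illustrative only.

```python
# Minimal sketch of an automated summary evaluation step.
# Assumes the third-party packages `rouge-score`, `nltk`, and `textstat`
# are installed; their APIs are used as commonly documented.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import textstat

def evaluate_summary(reference: str, candidate: str) -> dict:
    """Return a dictionary of evaluation scores for one candidate summary."""
    # ROUGE: n-gram and longest-common-subsequence overlap with the reference.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, candidate)

    # BLEU: modified n-gram precision, smoothed for short texts.
    bleu = sentence_bleu(
        [reference.split()], candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )

    return {
        "rouge1_f": rouge["rouge1"].fmeasure,
        "rouge2_f": rouge["rouge2"].fmeasure,
        "rougeL_f": rouge["rougeL"].fmeasure,
        "bleu": bleu,
        # Flesch-Kincaid grade level as a readability proxy (lower = easier to read).
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(candidate),
    }

if __name__ == "__main__":
    reference = "The study found that regular exercise improves memory in older adults."
    candidate = "Regular exercise was shown to improve memory among older adults."
    print(evaluate_summary(reference, candidate))
```

ROUGE and BLEU here capture overlap with the reference (relevance), while the Flesch-Kincaid grade serves as a readability proxy; a concrete task would choose and weight metrics to match its own quality criteria.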
- Example(s):
- A system that uses ROUGE to evaluate summaries of scientific articles to aid researchers in literature review.
- An online news portal that assesses summaries of daily news using both BLEU and user feedback to optimize content delivery to its audience (a weighting sketch follows this list).
- a Contract Summary Evaluation Task, such as a redlined contract summary evaluation task.
- ...
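As a purely illustrative sketch of the news-portal example, the snippet below blends an automatic metric with averaged user ratings; the combine_scores helper, the 1-5 rating scale, and the 0.6/0.4 weights are hypothetical choices, not a standard recipe.

```python
# Hypothetical sketch: blending an automatic summary score with user feedback.
# The helper name, the 1-5 rating scale, and the 0.6/0.4 weights are
# illustrative assumptions, not part of any standard evaluation recipe.
def combine_scores(automatic_score: float, user_ratings: list[float],
                   auto_weight: float = 0.6) -> float:
    """Blend an automatic metric in [0, 1] with user ratings on a 1-5 scale."""
    if not user_ratings:
        return automatic_score  # fall back to the automatic metric alone
    # Normalize 1-5 ratings into [0, 1] before averaging.
    normalized = [(r - 1) / 4 for r in user_ratings]
    user_score = sum(normalized) / len(normalized)
    return auto_weight * automatic_score + (1 - auto_weight) * user_score

# Example: a BLEU-like score of 0.42 combined with three user ratings.
print(combine_scores(0.42, [4, 5, 3]))  # -> 0.552
```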
- Counter-Example(s):
- Text Generation Tasks, which focus on creating text rather than evaluating it.
- Document Classification Tasks, where the primary goal is to categorize documents into predefined topics.
- See: Natural Language Processing, Automated Text Summarization, Evaluation Metric
References
2024
- (Shakil et al., 2024) ⇒ Hassan Shakil, Atqiya Munawara Mahi, Phuoc Nguyen, Zeydy Ortiz, and Mamoun T. Mardini. (2024). “Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT.” In: arXiv preprint arXiv:2405.04053. doi:10.48550/arXiv.2405.04053
- NOTES: The paper employs both traditional summary evaluation metrics (ROUGE, Latent Semantic Analysis, Flesch-Kincaid) and GPT-based evaluation to provide a comprehensive assessment of summary quality.