2024 EvaluatingTextSummariesGenerate

From GM-RKB

Subject Headings: Text Summary Evaluation, LLM-Supported Evaluation, Content Assessment.

Notes

Cited By

Quotes

Abstract

This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries based on essential properties of a high-quality summary - conciseness, relevance, coherence, and readability - using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to independently assess summary quality without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis for comparative analysis of transformer-based models in natural language processing tasks.
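The abstract names ROUGE as one of the traditional metrics used alongside the GPT-based evaluation. The paper's exact ROUGE configuration is not given here, but the core idea - scoring a candidate summary by its unigram overlap with a reference - can be sketched in plain Python (a minimal illustrative ROUGE-1, not the authors' implementation):

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Compute unigram-overlap ROUGE-1 precision, recall, and F1.

    A simplified sketch: whitespace tokenization, lowercasing only.
    Production implementations add stemming and sentence splitting.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each shared unigram counts up to its frequency in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical reference/candidate pair for illustration.
scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
```

Metrics of this kind reward lexical overlap with the reference, which is precisely the limitation that motivates using GPT as an independent evaluator of qualities such as coherence and readability.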

References

Hassan Shakil, Atqiya Munawara Mahi, Phuoc Nguyen, Zeydy Ortiz, and Mamoun T Mardini. (2024). "Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT." doi:10.48550/arXiv.2405.04053