Automated Content Generation Task

An [[Automated Content Generation Task]] is a [[software benchmarking task]] that evaluates [[automated content generation system]]s that can automatically output various types of content.
* <B>Counter-Example(s):</B>
** [[Information Retrieval Tasks]], which measure retrieval relevance but do not assess content generation.
** [[Classification Benchmarks]], which test label prediction accuracy but not generation quality.
* <B>See:</B> [[Automated Content Generation System]], [[Performance Metric]], [[Natural Language Generation]], [[Terminology Correctness Measure]], [[Technical Accuracy (Performance Measure)]].
----
== References ==


=== 2024a ===
* ([[Saad-Falcon et al., 2024]]) ⇒ Saad-Falcon, J., Khattab, O., Potts, C., & Zaharia, M. (2024). [https://arxiv.org/abs/2311.09476 "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems"]. In: arXiv Preprint arXiv:2311.09476v2.
** QUOTE: [[Automated RAG evaluation system]]s assess [[context relevance]], [[answer faithfulness]], and [[answer relevance]] through [[lightweight LM judge]]s trained on [[synthetic training data]].<P> [[Prediction-powered inference]] (PPI) combines [[automated scoring]] with [[human-annotation validation]] for reliable [[content generation benchmarking]] across [[knowledge-intensive task]]s.
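The PPI step quoted above has a compact closed form: average the LM judge's scores over the full query set, then add a rectifier equal to the judge's mean error on the small human-annotated subset. Below is a minimal NumPy sketch of this rectified mean estimate with a normal-approximation confidence interval; the function and variable names are illustrative assumptions, not the ARES codebase's actual API.
<syntaxhighlight lang="python">
import numpy as np

def ppi_mean_estimate(judge_unlabeled, judge_labeled, human_labels, z=1.96):
    """Rectified PPI estimate of mean answer quality, with a normal-approximation CI.

    judge_unlabeled : LM-judge scores on the large unlabeled query set
    judge_labeled   : LM-judge scores on the small human-annotated subset
    human_labels    : human annotations (e.g., 0/1 correctness) for that same subset
    """
    judge_unlabeled = np.asarray(judge_unlabeled, dtype=float)
    judge_labeled = np.asarray(judge_labeled, dtype=float)
    human_labels = np.asarray(human_labels, dtype=float)

    N, n = len(judge_unlabeled), len(human_labels)
    rectifier = human_labels - judge_labeled            # measured bias of the LM judge
    theta = judge_unlabeled.mean() + rectifier.mean()   # bias-corrected point estimate

    # Variance of two independent sample means, normal approximation.
    var = judge_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n
    half_width = z * np.sqrt(var)
    return theta, (theta - half_width, theta + half_width)

# Toy usage: 1,000 judge-scored answers, 50 of which are also human-annotated.
rng = np.random.default_rng(0)
judge_all = rng.binomial(1, 0.80, size=1000)            # judge calls ~80% correct
human_sub = rng.binomial(1, 0.75, size=50)              # humans find ~75% correct
flip = rng.random(50) < 0.10                            # judge disagrees on ~10% of items
judge_sub = np.where(flip, 1 - human_sub, human_sub)
print(ppi_mean_estimate(judge_all, judge_sub, human_sub))
</syntaxhighlight>
The rectifier term corrects the judge's systematic bias against the human labels, while the large judge-scored set keeps the confidence interval narrow; this is the general PPI recipe applied to each of the three RAG scores named in the quote.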
