2024 ScreensRedliningEvaluation


Subject Headings: TermScout.

Notes

Cited By

Quotes

Abstract

Evaluating the accuracy of large language models (LLMs) on contract review tasks is critical to understanding their reliability in the field. At Screens, we focus on application-specific ways to evaluate the performance of the various components of our LLM stack. We previously released an evaluation report measuring an LLM system’s ability to classify a contract as meeting or not meeting a set of substantive, well-defined standards.

Now, we turn our attention to the system’s ability to correct failed standards with suggested redlines. We find that the Screens product, which employs this system, achieves a 97.6% success rate at correcting failed standards with redlines.
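The report does not publish its evaluation code; the following is a minimal sketch of how such a redline-correction success rate could be computed, assuming a hypothetical classify_standard checker and apply_redline helper (these names and the overall loop are illustrative assumptions, not the authors' implementation):

<pre>
from dataclasses import dataclass

@dataclass
class RedlineResult:
    standard_id: str
    corrected: bool

def classify_standard(contract_text: str, standard_id: str) -> bool:
    """Hypothetical checker: True if the contract meets the standard (stands in for the LLM classifier)."""
    raise NotImplementedError

def apply_redline(contract_text: str, redline: str) -> str:
    """Hypothetical helper: returns the contract text with the suggested redline applied."""
    raise NotImplementedError

def redline_success_rate(contract_text, failed_standards, suggested_redlines):
    """For each failed standard, apply its suggested redline and re-check the standard.

    Success rate = corrected standards / total failed standards.
    """
    results = []
    for standard_id, redline in zip(failed_standards, suggested_redlines):
        revised = apply_redline(contract_text, redline)
        results.append(RedlineResult(standard_id, classify_standard(revised, standard_id)))
    corrected = sum(r.corrected for r in results)
    return corrected / len(results) if results else 0.0
</pre>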

References


Evan Harris (2024). "Screens Redlining Evaluation."