2024 ScreensAccuracyEvaluationReport

From GM-RKB

Subject Headings: TermScout.

Notes


Cited By

Quotes

Abstract

Evaluating the accuracy of large language models (LLMs) on contract review tasks is critical to understanding their reliability in the field. However, objectivity is a challenge when evaluating long-form, free-text responses to prompts. We present an evaluation methodology that measures an LLM system's ability to classify a contract as meeting or not meeting sets of substantive, well-defined standards. This approach serves as a foundational step for various use cases, including playbook execution, workflow routing, negotiation, redlining, summarization, due diligence, and more. We find that the Screens product, which employs this system, achieves a 97.5% accuracy rate. Additionally, we explore how different LLMs and methods impact AI accuracy.
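The report does not publish its evaluation harness, but a minimal sketch of the scoring the abstract implies (all names below are hypothetical, not the authors' code) is: each contract-standard pair receives a binary meets/does-not-meet prediction from the LLM system, that prediction is compared against an expert gold label, and accuracy is the fraction of matching pairs.

```python
from dataclasses import dataclass

@dataclass
class StandardResult:
    """Outcome of checking one contract against one well-defined standard."""
    standard_id: str
    predicted_meets: bool   # the LLM system's binary classification
    expected_meets: bool    # gold label from expert review

def accuracy(results: list[StandardResult]) -> float:
    """Fraction of contract-standard classifications matching the gold label."""
    if not results:
        raise ValueError("no results to score")
    correct = sum(r.predicted_meets == r.expected_meets for r in results)
    return correct / len(results)

# Toy usage: three checks, two correct -> accuracy = 0.667.
results = [
    StandardResult("auto-renewal-disclosed", True, True),
    StandardResult("liability-cap-mutual", False, True),
    StandardResult("governing-law-stated", True, True),
]
print(f"accuracy = {accuracy(results):.3f}")
```

Framing the task as binary classification against well-defined standards is what makes an objective accuracy figure like 97.5% possible, since each judgment can be checked against a single gold label rather than graded as free text.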

References


Evan Harris. (2024). "Screens Accuracy Evaluation Report."