LegalBench Benchmark
A LegalBench Benchmark is a legal text benchmark that evaluates the legal reasoning capabilities of large language models (LLMs) on a collaboratively constructed suite of 162 tasks covering six types of legal reasoning.
- Context:
- It can (typically) include a variety of tasks that simulate real-world legal scenarios, requiring the application of legal principles and reasoning.
- It can (typically) be used to assess the ability of LLMs to perform tasks traditionally associated with lawyers, such as identifying legal issues, applying legal rules, and drawing conclusions based on legal analysis.
- It can (often) use the IRAC (Issue, Rule, Application, Conclusion) method, a standard framework in legal analysis, to organize and evaluate legal tasks (see the prompt sketch below the See list).
- It can (often) include tasks beyond the scope of the IRAC Framework, such as client counseling, contract analysis, and negotiation.
- It can be an ongoing project, continually evolving with contributions from legal professionals and AI researchers to reflect the dynamic nature of the legal field.
- It can serve as a resource for legal professionals and researchers to understand the capabilities and potential applications of LLMs in the legal domain.
- ...
- Example(s):
- ...
- Counter-Example(s):
- See: Legal Reasoning, Large Language Models, Benchmarking, IRAC Method.
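The IRAC framing mentioned in the Context section can be made concrete with a small sketch. The rule text, fact pattern, and format_irac_prompt helper below are illustrative assumptions rather than part of the official LegalBench release; they only show how a rule-application question might be posed to an LLM as a zero-shot Yes/No prompt.

```python
# Illustrative sketch: posing a hypothetical rule-application task in IRAC
# terms (Issue, Rule, Application, Conclusion). The rule, facts, and helper
# name are invented for demonstration and are not taken from LegalBench.

def format_irac_prompt(rule: str, facts: str) -> str:
    """Build a zero-shot prompt asking an LLM to apply a legal rule to facts."""
    return (
        "Rule: " + rule + "\n"
        "Facts: " + facts + "\n"
        "Question: Applying the rule to the facts, is the rule satisfied? "
        "Answer Yes or No.\n"
        "Answer:"
    )

if __name__ == "__main__":
    rule = ("A statement is hearsay if it was made out of court and is "
            "offered to prove the truth of the matter asserted.")
    facts = ("At trial, a witness testifies that her neighbor told her "
             "the defendant's car ran the red light.")
    print(format_irac_prompt(rule, facts))
```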
References
2023
- (Guha et al., 2023) ⇒ Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, and Daniel N. Rockmore. (2023). “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models.” In: arXiv preprint arXiv:2308.11462. doi:10.48550/arXiv.2308.11462.
- ABSTRACT: The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
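As a rough illustration of the kind of evaluation described in the abstract, the sketch below scores a model on a miniature binary (Yes/No) task. The two example prompts and the query_model stub are invented placeholders; a real evaluation would load an actual LegalBench task and query a deployed LLM, but the accuracy-scoring loop would look much the same.

```python
# Minimal evaluation sketch for a binary Yes/No legal-reasoning task.
# The miniature dataset and query_model stub are placeholders; a real run
# would substitute an actual LegalBench task and an actual LLM endpoint.

from typing import Callable, List, Tuple

# (prompt, gold_label) pairs; invented for illustration only.
TASK_EXAMPLES: List[Tuple[str, str]] = [
    ("Does the clause below grant audit rights? ... Answer:", "Yes"),
    ("Is the statement below offered to prove the truth of the matter asserted? ... Answer:", "No"),
]

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; always answers 'Yes' in this sketch."""
    return "Yes"

def evaluate(model: Callable[[str], str], examples: List[Tuple[str, str]]) -> float:
    """Return simple accuracy of the model's first-word answer against gold labels."""
    correct = 0
    for prompt, gold in examples:
        prediction = model(prompt).strip().split()[0].rstrip(".,")
        correct += int(prediction.lower() == gold.lower())
    return correct / len(examples)

if __name__ == "__main__":
    print(f"Accuracy: {evaluate(query_model, TASK_EXAMPLES):.2f}")
```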