2023 LargeLanguageModelsAreLegalButT

Subject Headings: Legal Text NLP, mic-F1, mac-F1, LexGLUE, LEDGAR Dataset.

Notes

  • Objective: The study aims to quantify the performance of general LLMs in comparison to legal-domain models, focusing on zero-shot performance in contract provision classification tasks.
  • Methodology: The paper compares the performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark. The authors examine how these models, which were not explicitly trained on legal data, handle legal text classification.
  • Findings:
    • The general LLMs classified the theme correctly in most cases but were less effective than smaller models fine-tuned on legal data.
    • The general LLMs' mic-F1 / mac-F1 scores are up to 19.2% / 26.8% lower than those of smaller models fine-tuned on the legal domain.
    • The study highlights the need for more powerful domain-specific LLMs in the legal field.
  • Technical Insights:
    • The authors discuss the limitations of BERT-based models in handling long legal documents, due to their maximum token input size.
    • They highlight the benefits of transformer-based architectures that can handle longer texts, especially when combined with sparse-attention and hierarchical networks.
  • Contributions:
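
The mic-F1 and mac-F1 figures in the findings above can diverge sharply on imbalanced label sets, which may be why the reported gap is wider for mac-F1. The following is a minimal sketch of the two averages for single-label classification, written in plain Python. The function name `micro_macro_f1` and the toy labels `common` and `rare` are hypothetical and are not taken from the paper or from LEDGAR.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-label true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: a classifier that always predicts the frequent label.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"mic-F1={micro:.3f} mac-F1={macro:.3f}")  # mic-F1=0.800 mac-F1=0.444
```

In this toy case the rare label is never predicted, so its per-label F1 is 0 and drags mac-F1 down, while mic-F1 is dominated by the frequent label.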

Cited By

Quotes

Abstract

Realizing the recent advances in Natural Language Processing (NLP) to the legal sector poses challenging problems such as extremely long sequence lengths, specialized vocabulary that is usually only understood by legal professionals, and high amounts of data imbalance. The recent surge of Large Language Models (LLMs) has begun to provide new opportunities to apply NLP in the legal domain due to their ability to handle lengthy, complex sequences. Moreover, the emergence of domain-specific LLMs has displayed extremely promising results on various tasks. In this study, we aim to quantify how general LLMs perform in comparison to legal-domain models (be it an LLM or otherwise). Specifically, we compare the zero-shot performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark for contract provision classification. Although the LLMs were not explicitly trained on legal data, we observe that they are still able to classify the theme correctly in most cases. However, we find that their mic-F1 / mac-F1 performance is up to 19.2 / 26.8% lesser than smaller models fine-tuned on the legal domain, thus underscoring the need for more powerful legal-domain LLMs.

References

2023 LargeLanguageModelsAreLegalButT
  • Author: Thanmay Jayakumar, Fauzan Farooqui, Luqman Farooqui
  • Title: Large Language Models Are Legal But They Are Not: Making the Case for a Powerful LegalLLM
  • DOI: 10.48550/arXiv.2311.08890
  • Year: 2023