2024 AddressingAnnotatedDataScarcity

From GM-RKB

Subject Headings: Legal-Domain NER.

Notes

Cited By

Quotes

Abstract

Named Entity Recognition (NER) models face unique challenges in the field of legal text analysis, primarily due to the scarcity of annotated legal data. The creation of a diverse and representative legal text corpus is hindered by the labor-intensive, time-consuming, and expensive nature of manual annotation, leading to suboptimal model performance when trained on insufficient or biased data. This study explores the effectiveness of Generative Pre-trained Transformers (GPT) in overcoming these challenges. Leveraging the generative capabilities of GPT models, we use them as tools for creating human-like annotated data. Through experiments, our research reveals that the pre-trained BERT model, when fine-tuned on GPT-3 generated data, surpasses its counterpart fine-tuned on human-created data in the legal NER task. The demonstrated success of this methodology underscores the potential of large language models (LLMs) in advancing the development of more reliable and contextually aware Legal NER systems for intricate legal texts. This work contributes to the broader goal of enhancing the accuracy and efficiency of information extraction in the legal domain, showcasing the transformative impact of advanced language models on addressing data scarcity issues.
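The paper's pipeline itself is not reproduced on this page; as a minimal illustrative sketch of one plausible step, the snippet below converts LLM-generated (sentence, entity-span) pairs into BIO-tagged token sequences of the kind used to fine-tune a token-classification model such as BERT. All entity labels, examples, and function names are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: turn generated (sentence, entities) pairs into
# BIO-tagged tokens for NER fine-tuning. Labels here are illustrative.

def to_bio(sentence, entities):
    """Map a whitespace-tokenized sentence and a list of
    (surface_text, label) pairs to (token, BIO-tag) pairs."""
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    for surface, label in entities:
        ent_tokens = surface.split()
        n = len(ent_tokens)
        # Tag the first exact token-level match of the entity span.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == ent_tokens:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                break
    return list(zip(tokens, tags))

example = to_bio(
    "The Supreme Court cited Smith v. Jones in 2021 .",
    [("Supreme Court", "COURT"), ("Smith v. Jones", "CASE")],
)
```

Pairs produced this way could then be fed to any standard token-classification fine-tuning loop; the paper's actual annotation format and label set may differ.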

References


May Myo Zin, Ha Thanh Nguyen, Ken Satoh, Fumihito Nishino (2024). "Addressing Annotated Data Scarcity in Legal Information Extraction." doi:10.1007/978-981-97-3076-6_6