Legal Contract Natural Language Inference (NLI) Task
A Legal Contract Natural Language Inference (NLI) Task is a legal NLI task that is also a legal contract NLP task (i.e., an NLI task whose premise texts are legal contracts).
- Context:
- It can be based on a Legal Contract NLI Benchmark Task.
- ...
- Example(s):
- the ContractNLI task (Koreeda & Manning, 2021), in which each of a fixed set of hypotheses is classified as entailed by, contradicted by, or not mentioned in a given contract, together with identification of evidence spans.
- Counter-Example(s):
- a Legal Statutes NLI Task, such as the statutory entailment task of Holzenberger et al. (2020).
- a Legal Case Law NLI Task, such as the entailment subtask of the COLIEE-2020 shared task (Rabelo et al., 2020).
- See: Legal Annotated Corpus.
References
2021
- (Koreeda & Manning, 2021) ⇒ Yuta Koreeda, and Christopher D. Manning. (2021). “ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts.” In: Findings of the Association for Computational Linguistics: EMNLP 2021.
- ABSTRACT: Reviewing contracts is a time-consuming procedure that incurs large expenses to companies and social inequality to those who cannot afford it. In this work, we propose "document-level natural language inference (NLI) for contracts", a novel, real-world application of NLI that addresses such problems. In this task, a system is given a set of hypotheses (such as "Some obligations of Agreement may survive termination.") and a contract, and it is asked to classify whether each hypothesis is "entailed by", "contradicting to" or "not mentioned by" (neutral to) the contract as well as identifying "evidence" for the decision as spans in the contract. We annotated and release the largest corpus to date consisting of 607 annotated contracts. We then show that existing models fail badly on our task and introduce a strong baseline, which (1) models evidence identification as multi-label classification over spans instead of trying to predict start and end tokens, and (2) employs more sophisticated context segmentation for dealing with long documents. We also show that linguistic characteristics of contracts, such as negations by exceptions, are contributing to the difficulty of this task and that there is much room for improvement.
- QUOTE: Helped by their accessibility, there exist multiple prior works on legal NLI. One of the subtasks in COLIEE-2020 shared task (Rabelo et al., 2020) was, given a court decision Q and relevant cases, to extract relevant paragraphs from the cases and to classify whether those paragraphs entail “Q” or “not Q”. Holzenberger et al. (2020) introduced a dataset for predicting an entailment relationship between a statement and a statute excerpt. While they are both “legal” and “NLI”, statutes and contracts exhibit different characteristics including the fact that statutes/cases tend to be written in consistent vocabulary and styles. Moreover, there only exists a single right answer for a hypothesis in case/statute law NLI, whereas a hypothesis can be entailed by or contradicting to each contract in our task; i.e., hypotheses and documents have one-to-one relationships in case/statute law NLI, but they have many-to-many relationships in our task.
As discussed in Section 1, our task has practical and scientific significance compared to information extraction for contracts (Leivaditi et al., 2020; Hendrycks et al., 2021). We showed in our experiments that the NLI part of our task is much more challenging than the evidence identification task. Furthermore, we gave observations to linguistic characteristics of our dataset that are lacking in these prior works.
Lippi et al. (2019) presented a dataset where certain types of contract clauses are identified and annotated with “clearly fair”, “potentially unfair” or “clearly unfair”. While the format of the task input and output is quite similar, our task requires reasoning over a much diverse set of hypotheses than just fair or unfair. Similarly, fact extraction and claim verification tasks (Thorne et al., 2018; Jiang et al., 2020), where the task is to extract facts from Wikipedia articles and to classify whether the claim is entailed by the facts, have similar input and output formats. ...
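A minimal Python sketch of the document-level contract NLI setup described in the abstract above. All names and fields below are illustrative and are not taken from the ContractNLI release: each example pairs a span-segmented contract with a hypothesis, a three-way label, and evidence represented as a multi-label (binary) target over the contract's spans, mirroring the baseline's framing of evidence identification as multi-label classification over spans rather than start/end token prediction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NLILabel(Enum):
    """Three-way label used in document-level contract NLI."""
    ENTAILMENT = "entailed by"
    CONTRADICTION = "contradicting to"
    NOT_MENTIONED = "not mentioned by"  # i.e., neutral


@dataclass
class ContractNLIExample:
    """One (contract, hypothesis) pair with its label and evidence spans."""
    contract_spans: List[str]     # contract text, pre-segmented into candidate evidence spans
    hypothesis: str               # e.g., "Some obligations of Agreement may survive termination."
    label: NLILabel
    evidence_span_ids: List[int]  # indices into contract_spans that support the label


def evidence_targets(example: ContractNLIExample) -> List[int]:
    """Binary multi-label target vector over spans (1 = evidence, 0 = not)."""
    evidence = set(example.evidence_span_ids)
    return [1 if i in evidence else 0 for i in range(len(example.contract_spans))]


if __name__ == "__main__":
    ex = ContractNLIExample(
        contract_spans=[
            "1. Confidential Information shall be kept secret by the Receiving Party.",
            "2. Sections 1 and 3 shall survive termination of this Agreement.",
        ],
        hypothesis="Some obligations of Agreement may survive termination.",
        label=NLILabel.ENTAILMENT,
        evidence_span_ids=[1],
    )
    print(ex.label, evidence_targets(ex))  # NLILabel.ENTAILMENT [0, 1]
```

Because hypotheses and contracts stand in a many-to-many relationship in this task, the same fixed hypothesis set is paired with every contract in the corpus, and each pairing carries its own label and evidence.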
- (Hendrycks et al., 2021) ⇒ Dan Hendrycks, Collin Burns, Anya Chen, and Spencer Ball. (2021). “CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.” In: arXiv preprint arXiv:2103.06268.
- ABSTRACT: Many specialized domains remain untouched by deep learning, as large labeled datasets require expensive expert annotators. We address this bottleneck within the legal domain by introducing the Contract Understanding Atticus Dataset (CUAD), a new dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review. We find that Transformer models have nascent performance, but that this performance is strongly influenced by model design and training dataset size. Despite these promising results, there is still substantial room for improvement. As one of the only large, specialized NLP benchmarks annotated by experts, CUAD can serve as a challenging research benchmark for the broader NLP community.
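A minimal sketch of the contract-review formulation the CUAD abstract describes: given a contract and a clause category of interest, return the passages an expert highlighted as salient for human review. The field and function names below are illustrative assumptions, not CUAD's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContractReviewAnnotation:
    """Expert-highlighted spans for one clause category in one contract."""
    contract_text: str
    clause_category: str                  # e.g., "Governing Law"
    salient_spans: List[Tuple[int, int]]  # (start, end) character offsets to review


def extract_highlights(ann: ContractReviewAnnotation) -> List[str]:
    """Return the highlighted contract passages for human review."""
    return [ann.contract_text[start:end] for start, end in ann.salient_spans]


if __name__ == "__main__":
    ann = ContractReviewAnnotation(
        contract_text="This Agreement shall be governed by the laws of the State of Delaware.",
        clause_category="Governing Law",
        salient_spans=[(40, 69)],
    )
    print(extract_highlights(ann))  # ['laws of the State of Delaware']
```

Represented this way, the task differs from contract NLI above in that the output is a set of salient spans per clause category rather than a three-way entailment label, which is why CUAD is typically treated as an extraction-style benchmark rather than an NLI one.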