ContractNLI Dataset
A ContractNLI Dataset is a document-level legal NLP benchmark dataset that can be used to evaluate legal contract review tasks.
- Context:
- Example(s):
- Contradiction:
- Premise: This Agreement may be terminated by either party upon 30 days written notice.
- Hypothesis: The Agreement cannot be terminated by the parties.
- Entailment:
- Premise: This Agreement shall be binding upon and inure to the benefit of the parties hereto and their respective successors and assigns.
- Hypothesis: The Agreement is binding on the parties.
- Neutral:
- Premise: This Agreement shall commence on January 1, 2022.
- Hypothesis: The Agreement commences in the summer.
- …
- Counter-Example(s):
- LEXTREME: A comprehensive multi-lingual and multi-task benchmark for the legal domain.
- LegalBench.
- Contract Understanding Atticus Dataset (CUAD).
- See: Legal Contract, Contract Review System, Contract-Focused AI System, Natural Language Processing System.
References
2023
- (Ghosh et al., 2023) ⇒ Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S. Ramaneswaran, S. Sakshi, Utkarsh Tyagi, and Dinesh Manocha. (2023). “DALE: Generative Data Augmentation for Low-Resource Legal NLP.” doi:10.48550/arXiv.2310.15799
- NOTES:
- It contains annotated premise-hypothesis pairs extracted from contracts to classify their logical relationship as entailment, contradiction or neutral.
- The premises are extracts from contract sentences while hypotheses are manually written by law students.
- It has 37k pairs covering diverse contracts like MSAs, NDAs, settlements etc. from various sources.
2021a
- (Koreeda & Manning, 2021a) ⇒ Yuta Koreeda, and Christopher D. Manning. (2021). “ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts.” doi:10.48550/arXiv.2110.01799
- QUOTE: ... Our contributions are as follows:
1) We annotated and release [1] a dataset consisting of 607 contracts. This is the first dataset to utilize NLI for contracts and is also the largest corpus of annotated contracts. ...
2021b
- (GitHub, 2021) ⇒ https://stanfordnlp.github.io/contract-nli/
- QUOTE: ContractNLI is a dataset for document-level natural language inference (NLI) on contracts whose goal is to automate/support a time-consuming procedure of contract review. In this task, a system is given a set of hypotheses (such as “Some obligations of Agreement may survive termination.”) and a contract, and it is asked to classify whether each hypothesis is entailed by, contradicting to or not mentioned by (neutral to) the contract as well as identifying evidence for the decision as spans in the contract.
Figure: An overview of document-level NLI for contracts.
ContractNLI is the first dataset to utilize NLI for contracts and is also the largest corpus of annotated contracts (as of September 2021). ContractNLI is an interesting challenge to work on from a machine learning perspective (the label distribution is imbalanced and it is naturally multi-task, all the while training data being scarce) and from a linguistic perspective (linguistic characteristics of contracts, particularly negations by exceptions, make the problem difficult).
Details of ContractNLI can be found in our paper that was published in “Findings of EMNLP 2021”. If you have a question regarding our dataset, you can contact us by emailing koreeda@stanford.edu or by creating an issue in this repository.
- Dataset specification
More formally, the task consists of:
- Natural language inference (NLI): Document-level three-class classification (one of Entailment, Contradiction or NotMentioned).
- Evidence identification: Multi-label binary classification over spans, where a span is a sentence or a list item within a sentence. This is only defined when NLI label is either Entailment or Contradiction. Evidence spans need not be contiguous but need to be comprehensively identified where they are redundant.
- We have 17 hypotheses annotated on 607 non-disclosure agreements (NDAs). The hypotheses are fixed throughout all the contracts including the test dataset.
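The following is an illustrative sketch of the task specification above, not part of the quoted source. It models one (contract, hypothesis) decision as a three-class label plus a set of evidence span indices, and derives the per-span binary targets. The names `NLILabel`, `Annotation`, `evidence_spans`, and `evidence_targets` are hypothetical and do not reflect the schema of the released dataset files.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class NLILabel(Enum):
    """The three document-level classes named in the specification."""
    ENTAILMENT = "Entailment"
    CONTRADICTION = "Contradiction"
    NOT_MENTIONED = "NotMentioned"


@dataclass
class Annotation:
    """One (contract, hypothesis) decision. Field names are hypothetical."""
    hypothesis: str
    label: NLILabel
    # Indices into the contract's list of spans; they need not be contiguous.
    evidence_spans: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Evidence is only defined for Entailment or Contradiction.
        if self.label is NLILabel.NOT_MENTIONED and self.evidence_spans:
            raise ValueError("NotMentioned annotations carry no evidence spans")


def evidence_targets(annotation: Annotation, num_spans: int) -> List[int]:
    """Multi-label binary targets: 1 if a span is marked as evidence, else 0."""
    chosen = set(annotation.evidence_spans)
    return [1 if i in chosen else 0 for i in range(num_spans)]


# Every contract is paired with the same fixed set of hypotheses.
NUM_HYPOTHESES = 17
NUM_CONTRACTS = 607
NUM_PAIRS = NUM_HYPOTHESES * NUM_CONTRACTS

example = Annotation(
    hypothesis="Some obligations of Agreement may survive termination.",
    label=NLILabel.ENTAILMENT,
    evidence_spans=[2, 5],  # made-up indices, for illustration only
)
print(evidence_targets(example, num_spans=7))  # [0, 0, 1, 0, 0, 1, 0]
print(NUM_PAIRS)  # 10319
```

Because the hypotheses are fixed across all contracts, the number of (contract, hypothesis) pairs under this reading is simply the product of the two counts quoted above.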