Contract-Related Summarization Task
A Contract-Related Summarization Task is a domain-specific document summarization task that produces a contract-related summary.
- Context:
- output: a Contract-Related Summary.
- It can range from being a Manual Contract-Related Summarization Task (the typical case) to being an Automated Contract-Related Summarization Task (supported by a contract summarization system).
- It can range from being a Single-Document Contract-Related Summarization Task to being a Multi-Document Contract-Related Summarization Task.
- …
- Example(s):
- Entire-Contract Summarization, such as summarizing a Lease Agreement.
- Contract Article Summarization, such as summarizing a Casualty Article in a lease agreement.
- Contract Revision Summarization, such as: “Summarize the difference between these two revisions of the contract.”
- as explored in (Sancheti et al., 2022).
- as explored in (Manor & Li, 2019).
- as explored in https://adamscontracts.legalsifter.com/blog/something-else-not-to-use-ai-for-summarizing-contracts.
- Plain English Contract Summarization Task.
- …
- Counter-Example(s):
- a Medical Summarization Task.
- a News Summarization Task.
- See: Contract-Related Query-Replying, Prompt Engineering, Natural Language Processing, Contract Management Task, Contract Analysis Task, Legal Contract Dataset.
References
2024
- "Something Else Not to Use AI for: Summarizing Contracts." https://adamscontracts.legalsifter.com/blog/something-else-not-to-use-ai-for-summarizing-contracts
- QUOTE: ... But what might make sense for nonfiction writing doesn’t make sense for contracts, for two reasons. First, in contracts, everything matters! It’s like software code—leave something out and bad things can happen.
And second, in the process of summarizing, usually you aren’t able to just prune some words, repeating the rest verbatim. Instead, you likely have to change some words. In the limited and stylized world of contract language, using different words can have significant implications. ...
2023
- (ChatGPT-OpenAI, 2023) ⇒ https://chat.openai.com/ Retrieved: 2023-09-17.
- Q: What is a Contract-Related Summarization Task?
- A: A Contract-Related Summarization Task refers to the process of generating concise and coherent summaries of legal contracts or agreements using natural language processing (NLP) techniques. This task involves taking a lengthy and often complex legal document and condensing it into a shorter, more understandable version while retaining the key information and legal nuances. Contract-related summarization tasks are particularly valuable in the legal industry and various business contexts where efficient contract review and comprehension are crucial.
In practice, contract-related summarization can serve several purposes. It can aid lawyers and legal professionals in quickly assessing the content and implications of a contract, making it easier to determine its relevance and potential risks. It can also assist non-legal stakeholders, such as business professionals or clients, in gaining a high-level understanding of a contract's terms and conditions without delving into its intricate details. Additionally, contract summarization can support the automation of contract management processes, enabling organizations to streamline contract analysis and decision-making.
To perform contract-related summarization, NLP models like GPT-3 can be used to extract and condense key clauses, obligations, rights, and provisions from contracts, effectively creating executive summaries or abstracts. These summaries aim to provide a clear overview of the contract's essential elements, helping users make informed decisions or take appropriate actions based on the contract's content.
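As an illustration of the prompting step sketched in the answer above, the snippet below assembles a clause-preserving summarization prompt. The `build_contract_summary_prompt` helper and its instruction wording are illustrative assumptions, not part of any cited system; the resulting string could be passed to any chat-completion API.

```python
# Illustrative sketch only: the helper name and prompt wording are
# assumptions, not drawn from any system cited on this page.

def build_contract_summary_prompt(contract_text: str,
                                  audience: str = "non-legal reader") -> str:
    """Assemble a prompt asking an LLM for a clause-preserving summary."""
    return (
        f"Summarize the following contract for a {audience}. "
        "Preserve every obligation, entitlement, and prohibition; "
        "flag any clause you had to paraphrase rather than quote.\n\n"
        f"Contract:\n{contract_text}"
    )

clause = "Tenant shall maintain the premises at Tenant's sole expense."
prompt = build_contract_summary_prompt(clause)
```

Because summarizing contracts risks dropping material terms (the concern raised in the 2024 blog quote above), the prompt explicitly instructs the model to preserve all deontic clauses and to flag paraphrases.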
2022
- (Sancheti et al., 2022) ⇒ Abhilasha Sancheti, Aparna Garimella, Balaji Vasan Srinivasan, and Rachel Rudinger (2022). “What to Read in a Contract? Party-Specific Summarization of Important Obligations, Entitlements, and Prohibitions in Legal Documents". In: arXiv:2212.09825.
- QUOTE: Therefore, we propose a system to generate party-specific summaries consisting of important obligations, entitlements, and prohibitions mentioned in a given contract. The motivation behind different categories of the summary comes from the software license summaries available at TL;DRLegal [1] which describe what users must, can, and cannot do under the license. We first identify all the sentences containing obligations, entitlements, and prohibitions in a given contract with respect to a party, using a content categorizer (§3.1). Then, the identified sentences are ranked based on their importance (e.g., any maintenance or repairs that a tenant is required to do at its expense is more important than delivering insurance certificate to the landlord on a specified date) using an importance ranker (§3.2) trained on a legal expert-annotated dataset that we collect[2] (§4) to quantify the notion of importance. We believe our two-staged approach is less expensive to train compared to training an end-to-end summarization system which would require summaries to be annotated for long contracts (spanning 10–100 pages).
This work makes the following contributions: (a) we propose an extractive summarization system (§3), CONTRASUM, to summarize the key obligations, entitlements, and prohibitions mentioned in a contract for each of the parties; (b) we introduce a dataset (§4) consisting of comparative importance annotations for sentences (that include obligations, entitlements, or prohibitions) in lease agreements, with respect to each of the parties; and (c) we perform automatic (§7) and human evaluation (§8) of our system against several unsupervised summarization methods to demonstrate the effectiveness and usefulness of the system. To the best of our knowledge, ours is the first work to collect pairwise importance comparison annotations for sentences in contracts and use it for obtaining summaries for legal contracts.
- ↑ https://www.tldrlegal.com
- ↑ We will publicly release this dataset.
2019
- (Manor & Li, 2019) ⇒ Laura Manor, and Junyi Jessy Li (2019). "Plain English Summarization of Contracts". In: arXiv:1906.00424.
- QUOTE: We propose the task of the automatic summarization of legal documents in plain English for a non-legal audience. We hope that such a technological advancement would enable a greater number of people to enter into everyday contracts with a better understanding of what they are agreeing to. (...)
Rather than attempt to summarize an entire document, these sources summarize each document at the section level. In this way, the reader can reference the more detailed text if need be. The summaries in this dataset are reviewed for quality by the first author, who has 3 years of professional contract drafting experience. The dataset we propose contains 446 sets of parallel text. We show the level of abstraction through the number of novel words in the reference summaries, which is significantly higher than the abstractive single-document summaries created for the shared tasks of the Document Understanding Conference (DUC) in 2002 [1], a standard dataset used for single document news summarization. Additionally, we utilize several common readability metrics to show that there is an average of a 6 year reading level difference between the original documents and the reference summaries in our legal dataset.
In initial experimentation using this dataset, we employ popular unsupervised extractive summarization models such as TextRank [2] and Greedy KL [3], as well as lead baselines. We show that such methods do not perform well on this dataset when compared to the same methods on DUC 2002. These results highlight the fact that this is a very challenging task. As there is not currently a dataset in this domain large enough for supervised methods, we suggest the use of methods developed for simplification and/or style transfer(...)
- ↑ (Over et al., 2007) ⇒ Paul Over, Hoa Dang, and Donna Harman (2007). “DUC in Context". In: Information Processing & Management, 43(6):1506–1520.
- ↑ (Mihalcea & Tarau, 2004) ⇒ Rada Mihalcea, and Paul Tarau (2004). “TextRank: Bringing Order into Text". In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
- ↑ (Haghighi & Vanderwende, 2009) ⇒ Aria Haghighi, and Lucy Vanderwende (2009). “Exploring Content Models for Multi-Document Summarization". In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 362–370.
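A minimal version of the unsupervised extractive baselines mentioned in the Manor & Li quote can be sketched as follows. The word-overlap similarity and plain power iteration below are a simplified stand-in for the TextRank algorithm of Mihalcea & Tarau (2004), not a faithful reimplementation.

```python
# Simplified TextRank-style extractive scorer: build a sentence
# similarity graph, then score sentences by power iteration.
# The overlap-based similarity is a stand-in for the published metric.

def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / (len(wa) + len(wb))

def textrank(sentences, damping=0.85, iters=50):
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / (sum(sim[j]) or 1.0) * scores[j]
                for j in range(n)
            )
            for i in range(n)
        ]
    return scores

sents = [
    "The tenant shall pay rent on the first of each month.",
    "Rent shall be paid by the tenant monthly.",
    "The landlord shall maintain the roof.",
]
scores = textrank(sents)
top = sents[max(range(len(sents)), key=scores.__getitem__)]
```

Manor & Li's finding is that such centrality-based extraction underperforms on plain-English contract summarization relative to its performance on DUC 2002 news data, which motivates their suggestion to borrow methods from simplification and style transfer.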