Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please download the full version of the dataset from Zenodo, here.
Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled by The Atticus Project to identify 41 categories of important clauses that lawyers look for when reviewing contracts.
We tested CUAD v1 against ten pretrained AI models and published the results on arXiv here.
Code for replicating the results, together with the model trained on CUAD, is published on GitHub here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Global Preferential Trade Agreements Database (GPTAD) provides information on preferential trade agreements (PTAs) around the world, including agreements that have not been notified to the World Trade Organization (WTO). It is designed to help trade policy makers, scholars, and business operators better understand and navigate the world of PTAs. The GPTAD contains the original text of PTAs that have been notified to the WTO as well as agreements that have not yet been notified. The database is updated on a regular basis and currently comprises more than 330 PTAs. Agreements in the database have been indexed using a classification consistent with the WTO criteria. The GPTAD is a unique online tool that allows users to search PTAs around the world by provisions or keywords and to compare provisions across multiple agreements.
The Terms of Service dataset is a legal dataset for the task of identifying whether contractual terms are potentially unfair. This is a binary classification task, where positive examples are potentially unfair contractual terms (clauses) from the terms of service in consumer contracts. Article 3 of Directive 93/13 on Unfair Terms in Consumer Contracts defines an unfair contractual term as follows. A contractual term is unfair if: (1) it has not been individually negotiated; and (2) contrary to the requirement of good faith, it causes a significant imbalance in the parties' rights and obligations, to the detriment of the consumer. The Terms of Service dataset consists of 9,414 examples.
Prepaid account agreement data (general terms and conditions, pricing, and fee information) that issuers submit to the Bureau under the terms of the Prepaid Rule. The data is refreshed nightly.
The VHA Data Sharing Agreement Repository serves as a centralized location to collect and report on agreements that share VHA data with entities outside of VA. It provides senior management with an overall view of existing data sharing agreements; fosters productive sharing of health information with VHA's external partners; and streamlines data acquisition to improve data management overall. Agreements that VHA has established with entities within the VA are not candidates for this Repository.
A new shared task of semantic retrieval from legal texts, called contract discovery, in which legal clauses are extracted from documents given a few examples of similar clauses from other legal acts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Atticus Open Contract Dataset (AOK) (beta) is a corpus of 5,000+ labels in 200 commercial legal contracts that have been manually labeled by legal experts to identify 40 types of clauses that are important during contract review in connection with corporate transactions, such as mergers and acquisitions, IPOs, and corporate financing. The AOK Dataset is curated and maintained by The Atticus Project, Inc., a non-profit organization, to support NLP research and development in legal contract review. If you download this dataset, we'd love to know more about you and your project! Please fill out this short form: https://forms.gle/h47GUENTTbBqH39m7
Check out our website at atticusprojectai.org.
Update: The expanded 1.0 version of the dataset is available here: https://zenodo.org/record/4595826
KL3M Data Project
Note: This page provides general information about the KL3M Data Project. Additional details specific to this dataset will be added in future updates. For complete information, please visit the GitHub repository or refer to the KL3M Data Project paper.
Description
This dataset is part of the ALEA Institute's KL3M Data Project, which provides copyright-clean training resources for large language models.
Dataset Details
Format: Parquet… See the full description on the dataset page: https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information about the contents of 100 Terms of Service (ToS) of online platforms. The documents were analyzed and evaluated from the point of view of the European Union consumer law. The main results have been presented in the table titled "Terms of Service Analysis and Evaluation_RESULTS." This table is accompanied by the instruction followed by the annotators, titled "Variables Definitions," allowing for the interpretation of the assigned values. In addition, we provide the raw data (analyzed ToS, in the folder "Clear ToS") and the annotated documents (in the folder "Annotated ToS," further subdivided).
SAMPLE: The sample contains 100 contracts of digital platforms operating in sixteen market sectors: Cloud storage, Communication, Dating, Finance, Food, Gaming, Health, Music, Shopping, Social, Sports, Transportation, Travel, Video, Work, and Various. The selected companies' main headquarters span four legal surroundings: the US, the EU, Poland specifically, and Other jurisdictions. The chosen platforms are both privately held and publicly listed and offer both fee-based and free services. Although the sample cannot be treated as representative of all online platforms, it nevertheless accounts for the most popular consumer services in the analyzed sectors and contains a diverse and heterogeneous set.
CONTENT: Each ToS has been assigned the following information: 1. Metadata: 1.1. the name of the service; 1.2. the URL; 1.3. the effective date; 1.4. the language of ToS; 1.5. the sector; 1.6. the number of words in ToS; 1.7–1.8. the jurisdiction of the main headquarters; 1.9. if the company is public or private; 1.10. if the service is paid or free. 2. Evaluative Variables: remedy clauses (2.1–2.5); dispute resolution clauses (2.6–2.10); unilateral alteration clauses (2.11–2.15); rights to police the behavior of users (2.16–2.17); regulatory requirements (2.18–2.20); and various (2.21–2.25). 3. Count Variables: the number of clauses seen as unclear (3.1) and the number of other documents referred to by the ToS (3.2). 4. Pull-out Text Variables: rights and obligations of the parties (4.1) and descriptions of the service (4.2).
ACKNOWLEDGEMENT: The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021, project no. 2020/37/K/HS5/02769, titled “Private Law of Data: Concepts, Practices, Principles & Politics.”
Public contracts with the City of Bloomington since 2018.
This dataset provides three resources on Bilateral Labor Agreements signed between 1945 and 2020. More information on this project is available at: https://www.law.uchicago.edu/bilateral-labor-agreements-dataset.
The County is a party to various credit agreements, including short-term notes, Direct Pay variable rate agreements, Direct Placement variable rate agreements, and an operating Line of Credit. Current credit agreements that the County is a party to are made available below.
Full edition for public use. This dataset contains information on which non-trade issues occur in preferential trade agreements signed between 1945 and 2020. These range from various environmental protection issues, such as clean air or biodiversity, through basic human rights, such as the right to vote, to economic and social rights, such as the prohibition of forced labor.
This dataset contains all parcels that are currently under an Open Space Use Agreement between the owners of the parcel and the County of Albemarle. These agreements limit construction and development activity on the property owner's land, and last from 4 to 10 years. For more information on any particular agreement, contact the Real Estate division of the County of Albemarle's Finance Department.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Bilateral Labor Agreements Dataset and additional replication files for "Immigration and International Law"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about news. It has 3 rows and is filtered to entries whose keywords include General Agreement on Tariffs and Trade (Organization)-History. It features 10 columns including source, publication date, section, and news link.
The Credit Card Agreements (CCA) database includes credit card agreements from more than 600 card issuers. These agreements include general terms and conditions, pricing, and fee information and are collected quarterly pursuant to requirements in the CARD Act.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Merger Agreement Understanding Dataset (MAUD) v1 is a corpus of 47,000+ labels in 152 merger agreements that have been manually labeled under the supervision of experienced lawyers to identify 92 questions in each agreement used by the 2021 American Bar Association (ABA) Public Target Deal Points Study.
MAUD is curated and maintained by The Atticus Project, Inc. to support NLP research and development in legal contract review.
ReadMe and Datasheet are published here. Code for replicating the results, together with the model trained on CUAD, is published on GitHub here.