Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please download the full version of the dataset from Zenodo, here.
Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled by The Atticus Project to identify 41 categories of important clauses that lawyers look for when reviewing contracts.
We tested CUAD v1 against ten pretrained AI models and published the results on arXiv here.
Code for replicating the results, together with the model trained on CUAD, is published on GitHub here.
Contract Understanding Atticus Dataset (CUAD) is a dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for CUAD
This is a modified version of original CUAD which trims the question to its label form.
Dataset Summary
Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. CUAD is curated and maintained by The Atticus Project, Inc.… See the full description on the dataset page: https://huggingface.co/datasets/chenghao/cuad_qa.
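The card above says this variant trims each question to its label form. A minimal sketch of what such trimming could look like follows. It assumes the original questions wrap the category name in double quotes after the words "related to"; that question shape, the regular expression, and the function name `trim_question_to_label` are illustrative assumptions, not the dataset author's actual preprocessing.

```python
import re

# Assumed shape of an original question: the category label appears in
# double quotes right after the words "related to".
_LABEL_PATTERN = re.compile(r'related to "([^"]+)"')


def trim_question_to_label(question: str) -> str:
    """Return the quoted category label if present, else the question unchanged."""
    match = _LABEL_PATTERN.search(question)
    return match.group(1) if match else question


# Hypothetical example question, written for illustration only.
example = (
    'Highlight the parts (if any) of this contract related to "Governing Law" '
    "that should be reviewed by a lawyer."
)
print(trim_question_to_label(example))
```

Falling back to the unchanged question keeps the helper safe on inputs that do not match the assumed pattern.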
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for filtered_cuad
Dataset Summary
Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. This dataset is a filtered version of CUAD. It excludes legal contracts with an Agreement date prior to 2002 and contracts which are not… See the full description on the dataset page: https://huggingface.co/datasets/alex-apostolo/filtered-cuad.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Contract Understanding Atticus Dataset (CUAD) PDF
This dataset contains the PDFs and the full text of 509 commercial legal contracts from the original CUAD dataset. One of the original 510 contracts was removed due to being a scanned copy. The extracted text was cleaned using clean-text. The PDFs were encoded in base64 and added as the pdf_bytes_base64 feature. You can easily and quickly load it: dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF")… See the full description on the dataset page: https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF.
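Since the PDFs are stored as base64 text in the `pdf_bytes_base64` feature, they need decoding before they can be written to disk or parsed. The sketch below shows one way to do that. The `load_dataset` call and the feature name come from the card; the split name `"train"`, the helper names, and the header check are assumptions made for illustration.

```python
import base64


def decode_pdf(pdf_bytes_base64: str) -> bytes:
    """Decode a base64 string into raw bytes and sanity-check the PDF header."""
    raw = base64.b64decode(pdf_bytes_base64)
    if not raw.startswith(b"%PDF"):
        raise ValueError("decoded bytes do not start with a PDF header")
    return raw


def load_first_contract_pdf(split: str = "train") -> bytes:
    """Fetch one record from the Hub and decode its PDF.

    Requires the `datasets` package and network access. The split name
    is an assumption; check the dataset page for the real split names.
    """
    from datasets import load_dataset  # lazy import: decode_pdf works without it

    dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF", split=split)
    return decode_pdf(dataset[0]["pdf_bytes_base64"])


# Offline demonstration with a stand-in payload rather than a real contract.
sample = base64.b64encode(b"%PDF-1.4 stand-in").decode("ascii")
print(decode_pdf(sample)[:8])
```

The decoded bytes can then be written to a `.pdf` file opened in binary mode.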
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUAD-DeepSeek: Enhanced Legal Contract Understanding Dataset
CUAD-DeepSeek is an enhanced version of the Contract Understanding Atticus Dataset (CUAD), enriched with expert rationales and reasoning traces provided by the DeepSeek language model. This dataset aims to improve legal contract analysis by providing not just classifications but detailed explanations for why specific clauses belong to particular legal categories.
Purpose and Scope
Legal contract review is… See the full description on the dataset page: https://huggingface.co/datasets/zenml/cuad-deepseek.
This dataset was created by lithurshan2000
LC-QuAD is a Large Question Answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 version.
This dataset was created by UOM_200343G
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Invented in 2008, CUAD biotechnology is among three cost-effective antisense technologies used for insect pest control (RNAi, CUAD, and CRISPR/Cas) based on the formation of duplexes of unmodified nucleic acids (RNAi: guide RNA-mRNA; CUAD: guide DNA-rRNA; CRISPR/Cas: guide RNA-genomic DNA) and the action of nucleic acid-guided nucleases (RNAi: Argonaute; CUAD: RNase H; CRISPR/Cas: CRISPR-associated protein). It is amazing that the simple CUAD-based ‘genetic zipper’ method is so efficient for IPM.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADGoverningLawLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause specifies which state/country’s law governs the contract.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import mteb… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification.
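The evaluation snippet on the card is truncated here, so the following is an independent sketch rather than a reconstruction of it. It assumes an mteb release that still exposes `mteb.get_tasks`, `mteb.get_model`, and the `mteb.MTEB(...).run(...)` runner; newer releases may prefer a different entry point. The helper names, the `output_folder` default, and the choice of model are hypothetical. The task-name helper only reflects the naming pattern visible in this listing.

```python
def cuad_task_name(clause: str) -> str:
    """Build an MTEB task name following the pattern seen in this listing."""
    return f"CUAD{clause}LegalBenchClassification"


def evaluate_on_task(task_name: str, model_name: str, output_folder: str = "results"):
    """Run a single MTEB task for one embedding model.

    Requires the `mteb` package, network access, and the model weights.
    Assumes the classic MTEB runner API; check the installed version.
    """
    import mteb  # lazy import: the name helper above has no dependencies

    tasks = mteb.get_tasks(tasks=[task_name])
    model = mteb.get_model(model_name)
    evaluation = mteb.MTEB(tasks=tasks)
    return evaluation.run(model, output_folder=output_folder)


print(cuad_task_name("GoverningLaw"))
```

A call such as `evaluate_on_task(cuad_task_name("GoverningLaw"), "<model name>")` would then download the task data and score the chosen model.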
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADAntiAssignmentLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause requires consent or notice of a party if the contract is assigned to a third party.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an embedding model on this dataset using the… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADLicenseGrantLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause contains a license granted by one party to its counterparty.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADUncappedLiabilityLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that a party's liability is uncapped upon the breach of its obligation in the contract. This also includes uncapped liability for a particular type of breach, such as IP infringement or breach of a confidentiality obligation.
Task category t2c
Domains Legal, Written
Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADUncappedLiabilityLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADThirdPartyBeneficiaryLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that there is a non-contracting party who is a beneficiary to some or all of the clauses in the contract and therefore can enforce its rights against a contracting party.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADInsuranceLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause creates a requirement for insurance that must be maintained by one party for the benefit of the counterparty.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an embedding model on this… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADRevenueProfitSharingLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause requires a party to share revenue or profit with the counterparty for any technology, goods, or services.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADLiquidatedDamagesLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause awards either party liquidated damages for breach or a fee upon the termination of a contract (termination fee).
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLiquidatedDamagesLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADMostFavoredNationLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that, if a third party gets better terms on the licensing or sale of technology/goods/services described in the contract, the buyer of such technology/goods/services under the contract shall be entitled to those better terms.
Task category t2c
Domains Legal, Written
Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CUADWarrantyDurationLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark
This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a duration of any warranty against defects or errors in technology, products, or services provided under the contract.
Task category t2c
Domains Legal, Written
Reference https://huggingface.co/datasets/nguha/legalbench
How to evaluate on this task
You can evaluate an… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification.