46 datasets found

Contract Understanding Atticus Dataset (CUAD)
kaggle.com
Updated Mar 12, 2021
Cite
The Atticus Project (2021). Contract Understanding Atticus Dataset (CUAD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/2015428
Unique identifier
https://doi.org/10.34740/kaggle/dsv/2015428
Dataset updated
Mar 12, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
The Atticus Project
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please download the full version of the dataset from Zenodo, here.

Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled by The Atticus Project to identify 41 categories of important clauses that lawyers look for when reviewing contracts.

We tested CUAD v1 against ten pretrained AI models and published the results on arXiv here.

Code for replicating the results, together with the model trained on CUAD, is published on GitHub here.
CUAD Dataset
paperswithcode.com
Updated Mar 9, 2021
+ more versions
Cite
Dan Hendrycks; Collin Burns; Anya Chen; Spencer Ball (2021). CUAD Dataset [Dataset]. https://paperswithcode.com/dataset/cuad
Dataset updated
Mar 9, 2021
Authors
Dan Hendrycks; Collin Burns; Anya Chen; Spencer Ball
Description
Contract Understanding Atticus Dataset (CUAD) is a dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review.
cuad_qa
huggingface.co
Updated Sep 15, 1999
Cite
Chenghao Mou (1999). cuad_qa [Dataset]. https://huggingface.co/datasets/chenghao/cuad_qa
Dataset updated
Sep 15, 1999
Authors
Chenghao Mou
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for CUAD

This is a modified version of the original CUAD that trims each question to its label form.

Dataset Summary

Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. CUAD is curated and maintained by The Atticus Project, Inc.… See the full description on the dataset page: https://huggingface.co/datasets/chenghao/cuad_qa.
filtered-cuad
huggingface.co
Updated Aug 1, 2011
Cite
Alex Apostolopoulos (2011). filtered-cuad [Dataset]. https://huggingface.co/datasets/alex-apostolo/filtered-cuad
Dataset updated
Aug 1, 2011
Authors
Alex Apostolopoulos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for filtered_cuad

Dataset Summary

Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. This dataset is a filtered version of CUAD. It excludes legal contracts with an Agreement date prior to 2002 and contracts which are not… See the full description on the dataset page: https://huggingface.co/datasets/alex-apostolo/filtered-cuad.
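
The date-based part of the filter described above can be sketched as follows. This is an illustration only: the record layout, the `agreement_date` field name, the ISO date format, and the handling of undated contracts are all assumptions rather than the dataset's actual schema, and the second exclusion criterion (truncated in the description) is not modeled.

```python
from datetime import date

# Contracts with an agreement date before 2002 are excluded.
CUTOFF = date(2002, 1, 1)


def keep_contract(record: dict) -> bool:
    """Return True if the contract's agreement date is in 2002 or later.

    `agreement_date` is a hypothetical field holding an ISO date string
    such as "2005-03-17".
    """
    raw = record.get("agreement_date")
    if raw is None:
        # Assumption: contracts without a date are dropped.
        return False
    return date.fromisoformat(raw) >= CUTOFF


def filter_contracts(records: list) -> list:
    """Keep only the records that pass the date filter."""
    return [r for r in records if keep_contract(r)]
```

Applied to a list of such records, `filter_contracts` returns only those dated 1 January 2002 or later.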
CUAD_v1_Contract_Understanding_PDF
huggingface.co
Updated Jan 20, 2025
+ more versions
Cite
Daniel Voigt Godoy (2025). CUAD_v1_Contract_Understanding_PDF [Dataset]. https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF
Dataset updated
Jan 20, 2025
Authors
Daniel Voigt Godoy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Contract Understanding Atticus Dataset (CUAD) PDF

This dataset contains the PDFs and the full text of 509 commercial legal contracts from the original CUAD dataset. One of the original 510 contracts was removed due to being a scanned copy. The extracted text was cleaned using clean-text. The PDFs were encoded in base64 and added as the pdf_bytes_base64 feature. You can easily and quickly load it: dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF")… See the full description on the dataset page: https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF.
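
The `pdf_bytes_base64` feature described above can be turned back into a PDF with the Python standard library. A minimal sketch, assuming each record is a dict-like row whose `pdf_bytes_base64` value is a base64 string; the helper names `decode_pdf` and `save_pdf` are hypothetical and not part of the dataset card.

```python
import base64
from pathlib import Path


def decode_pdf(record: dict, field: str = "pdf_bytes_base64") -> bytes:
    """Return the raw PDF bytes stored as base64 in `record[field]`."""
    encoded = record[field]
    # b64decode accepts both ASCII strings and bytes-like input.
    return base64.b64decode(encoded)


def save_pdf(record: dict, path: str) -> Path:
    """Hypothetical helper: write the decoded PDF to `path`."""
    out = Path(path)
    out.write_bytes(decode_pdf(record))
    return out
```

A record loaded from the dataset could then be passed to `save_pdf(record, "contract.pdf")`, where the output file name is arbitrary.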
cuad-deepseek
huggingface.co
Updated May 9, 2025
Cite
ZenML (2025). cuad-deepseek [Dataset]. https://huggingface.co/datasets/zenml/cuad-deepseek
Dataset updated
May 9, 2025
Dataset authored and provided by
ZenML
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUAD-DeepSeek: Enhanced Legal Contract Understanding Dataset

CUAD-DeepSeek is an enhanced version of the Contract Understanding Atticus Dataset (CUAD), enriched with expert rationales and reasoning traces provided by the DeepSeek language model. This dataset aims to improve legal contract analysis by providing not just classifications but detailed explanations for why specific clauses belong to particular legal categories.

Purpose and Scope

Legal contract review is… See the full description on the dataset page: https://huggingface.co/datasets/zenml/cuad-deepseek.
test-cuad
kaggle.com
Updated Sep 20, 2024
Cite
lithurshan2000 (2024). test-cuad [Dataset]. https://www.kaggle.com/datasets/lithurshan2000/test-cuad/code
Dataset updated
Sep 20, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
lithurshan2000
Description
Dataset

This dataset was created by lithurshan2000

Contents
LC-QuAD Dataset
paperswithcode.com
Updated Jun 12, 2023
Cite
(2023). LC-QuAD Dataset [Dataset]. https://paperswithcode.com/dataset/lc-quad-2-0
Dataset updated
Jun 12, 2023
Description
LC-QuAD is a Large Question Answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge base is Wikidata and DBpedia, specifically the 2018 version.
cuad_code
kaggle.com
Updated Jun 26, 2024
Cite
UOM_200343G (2024). cuad_code [Dataset]. https://www.kaggle.com/uom200343g/cuad-code/discussion
Dataset updated
Jun 26, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
UOM_200343G
Description
Dataset

This dataset was created by UOM_200343G

Contents
mor_ssDNA/dsDNA_gene_ssDNA/dsDNA_C.hesperidum..pzfx
figshare.com
xml
Updated May 6, 2025
Cite
Nikita Gal'chinsky (2025). mor_ssDNA/dsDNA_gene_ssDNA/dsDNA_C.hesperidum..pzfx [Dataset]. http://doi.org/10.6084/m9.figshare.28938248.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.28938248.v1
Dataset updated
May 6, 2025
Dataset provided by
figshare
Authors
Nikita Gal'chinsky
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Invented in 2008, CUAD biotechnology is one of three cost-effective antisense technologies used for insect pest control (RNAi, CUAD, and CRISPR/Cas). All three are based on the formation of duplexes of unmodified nucleic acids (RNAi: guide RNA-mRNA; CUAD: guide DNA-rRNA; CRISPR/Cas: guide RNA-genomic DNA) and the action of nucleic acid-guided nucleases (RNAi: Argonaute; CUAD: RNase H; CRISPR/Cas: CRISPR-associated protein). It is amazing that the simple CUAD-based ‘genetic zipper’ method is so efficient for IPM.
CUADGoverningLawLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADGoverningLawLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADGoverningLawLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies which state/country’s law governs the contract.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import mteb… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification.
CUADAntiAssignmentLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADAntiAssignmentLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADAntiAssignmentLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause requires consent or notice of a party if the contract is assigned to a third party.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification.
CUADLicenseGrantLegalBenchClassification
huggingface.co
Updated May 11, 2025
+ more versions
Cite
Massive Text Embedding Benchmark (2025). CUADLicenseGrantLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADLicenseGrantLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause contains a license granted by one party to its counterparty.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification.
CUADUncappedLiabilityLegalBenchClassification
huggingface.co
Updated May 11, 2025
+ more versions
Cite
Massive Text Embedding Benchmark (2025). CUADUncappedLiabilityLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADUncappedLiabilityLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADUncappedLiabilityLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that a party's liability is uncapped upon the breach of its obligation in the contract. This also includes uncapped liability for a particular type of breach such as IP infringement or breach of confidentiality obligation.

Task category t2c

Domains Legal, Written

Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADUncappedLiabilityLegalBenchClassification.
CUADThirdPartyBeneficiaryLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADThirdPartyBeneficiaryLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADThirdPartyBeneficiaryLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that there is a non-contracting party who is a beneficiary to some or all of the clauses in the contract and therefore can enforce its rights against a contracting party.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification.
CUADInsuranceLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADInsuranceLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADInsuranceLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause creates a requirement for insurance that must be maintained by one party for the benefit of the counterparty.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification.
CUADRevenueProfitSharingLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADRevenueProfitSharingLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADRevenueProfitSharingLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause requires a party to share revenue or profit with the counterparty for any technology, goods, or services.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification.
CUADLiquidatedDamagesLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADLiquidatedDamagesLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADLiquidatedDamagesLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADLiquidatedDamagesLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause awards either party liquidated damages for breach or a fee upon the termination of a contract (termination fee).

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLiquidatedDamagesLegalBenchClassification.
CUADMostFavoredNationLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADMostFavoredNationLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADMostFavoredNationLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause provides that, if a third party gets better terms on the licensing or sale of technology/goods/services described in the contract, the buyer of such technology/goods/services under the contract shall be entitled to those better terms.

Task category t2c

Domains Legal, Written

Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification.
CUADWarrantyDurationLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADWarrantyDurationLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADWarrantyDurationLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a duration of any warranty against defects or errors in technology, products, or services provided under the contract.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification.

Contract Understanding Atticus Dataset (CUAD)

A dataset of legal contracts with rich expert annotations.
