39 datasets found

Contract Understanding Atticus Dataset (CUAD)
A dataset of legal contracts with rich expert annotations.
kaggle.com
Updated Mar 12, 2021
Cite
The Atticus Project (2021). Contract Understanding Atticus Dataset (CUAD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/2015428
Unique identifier
https://doi.org/10.34740/kaggle/dsv/2015428
Dataset updated
Mar 12, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Atticus Project
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please download the full version of the dataset from Zenodo, here.

Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled by The Atticus Project to identify 41 categories of important clauses that lawyers look for when reviewing contracts.

We tested CUAD v1 against ten pretrained AI models and published the results on arXiv here.

Code for replicating the results, together with the model trained on CUAD, is published on Github here.
CUAD (Contract Understanding Atticus Dataset)
opendatalab.com
zip
Updated Sep 22, 2022
Cite
The Nueva School (2022). CUAD (Contract Understanding Atticus Dataset) [Dataset]. https://opendatalab.com/OpenDataLab/CUAD
Available download formats: zip (18,309,308 bytes)
Dataset updated
Sep 22, 2022
Dataset provided by
The Nueva School
University of California, Berkeley
Description
Contract Understanding Atticus Dataset (CUAD) is a dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review.
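Highlighting salient portions of a contract is commonly represented as extractive question answering, where each answer is a span of contract text plus a character offset. The sketch below assumes a SQuAD-style "text"/"answer_start" pair for each highlighted span; the Answer class and the answer_is_aligned and highlight helpers are hypothetical illustrations, not part of any CUAD release, and the example clause is invented.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str          # the highlighted clause text
    answer_start: int  # character offset of the span in the contract (assumed SQuAD-style)


def answer_is_aligned(context: str, answer: Answer) -> bool:
    """Return True if the span text really occurs at the stated offset."""
    end = answer.answer_start + len(answer.text)
    return context[answer.answer_start:end] == answer.text


def highlight(context: str, answers: list, marker: str = "**") -> str:
    """Wrap each highlighted span in `marker` so a human reviewer can spot it."""
    pieces, cursor = [], 0
    for ans in sorted(answers, key=lambda a: a.answer_start):
        if not answer_is_aligned(context, ans):
            raise ValueError(f"misaligned span: {ans.text!r}")
        if ans.answer_start < cursor:
            raise ValueError(f"overlapping span: {ans.text!r}")
        pieces.append(context[cursor:ans.answer_start])
        pieces.append(marker + ans.text + marker)
        cursor = ans.answer_start + len(ans.text)
    pieces.append(context[cursor:])
    return "".join(pieces)


# Usage with a made-up governing-law clause (not taken from CUAD).
contract = "This Agreement shall be governed by the laws of the State of New York."
span = "the laws of the State of New York"
print(highlight(contract, [Answer(span, contract.index(span))]))
```

Checking alignment before highlighting catches offset errors early, which matters when spans are stored as character positions rather than token positions.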
filtered-cuad
huggingface.co
Updated Aug 1, 2011
Cite
Alex Apostolopoulos (2011). filtered-cuad [Dataset]. https://huggingface.co/datasets/alex-apostolo/filtered-cuad
Dataset updated
Aug 1, 2011
Authors
Alex Apostolopoulos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for filtered_cuad

Dataset Summary

Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. This dataset is a filtered version of CUAD. It excludes legal contracts with an Agreement date prior to 2002 and contracts which are not… See the full description on the dataset page: https://huggingface.co/datasets/alex-apostolo/filtered-cuad.
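The date-based part of the filtering rule described above can be sketched as a simple predicate over contract records. The agreement_date field name, the ISO date format, and the choice to drop undated contracts are all assumptions for illustration; the second exclusion criterion is truncated in the listing and is not reproduced here.

```python
from datetime import date

# Contracts whose agreement date falls before this day are excluded.
CUTOFF = date(2002, 1, 1)


def keep_contract(record: dict) -> bool:
    """Return True if the contract's agreement date is on or after CUTOFF.

    Assumes a hypothetical 'agreement_date' field holding an ISO string
    such as '2005-03-31'. Undated contracts are dropped (also an assumption).
    """
    raw = record.get("agreement_date")
    if not raw:
        return False
    return date.fromisoformat(raw) >= CUTOFF


# Made-up records, for illustration only.
records = [
    {"title": "Distribution Agreement", "agreement_date": "1999-06-01"},
    {"title": "License Agreement", "agreement_date": "2002-01-01"},
    {"title": "Supply Agreement", "agreement_date": None},
]
kept = [r["title"] for r in records if keep_contract(r)]
print(kept)
```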
cuad_qa
huggingface.co
Updated Sep 15, 1999
Cite
Chenghao Mou (1999). cuad_qa [Dataset]. https://huggingface.co/datasets/chenghao/cuad_qa
Dataset updated
Sep 15, 1999
Authors
Chenghao Mou
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for CUAD

This is a modified version of the original CUAD that trims each question to its label form.

Dataset Summary

Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. CUAD is curated and maintained by The Atticus Project, Inc.… See the full description on the dataset page: https://huggingface.co/datasets/chenghao/cuad_qa.
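Trimming a question to its label form can be sketched as extracting the category name from the question text. This assumes the original CUAD questions quote the category name right after the phrase "related to", as in the example question below; that template and the question_to_label helper are assumptions for illustration, not the cuad_qa author's code.

```python
import re

# Assumed question template: the category name appears in double quotes
# immediately after "related to".
_LABEL_RE = re.compile(r'related to "([^"]+)"')


def question_to_label(question: str) -> str:
    """Return the quoted category name from a CUAD-style question."""
    match = _LABEL_RE.search(question)
    if match is None:
        raise ValueError("question does not match the assumed template")
    return match.group(1)


q = ('Highlight the parts (if any) of this contract related to "Governing Law" '
     "that should be reviewed by a lawyer. Details: Which state/country's law "
     "governs the interpretation of the contract?")
print(question_to_label(q))
```

Raising on a non-matching question, rather than returning the full text, makes template drift visible instead of silently producing long "labels".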
cuad-deepseek
huggingface.co
Updated May 9, 2025
Cite
ZenML (2025). cuad-deepseek [Dataset]. https://huggingface.co/datasets/zenml/cuad-deepseek
Dataset updated
May 9, 2025
Dataset authored and provided by
ZenML
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUAD-DeepSeek: Enhanced Legal Contract Understanding Dataset

CUAD-DeepSeek is an enhanced version of the Contract Understanding Atticus Dataset (CUAD), enriched with expert rationales and reasoning traces provided by the DeepSeek language model. This dataset aims to improve legal contract analysis by providing not just classifications but detailed explanations for why specific clauses belong to particular legal categories.

Purpose and Scope

Legal contract review is… See the full description on the dataset page: https://huggingface.co/datasets/zenml/cuad-deepseek.
CUAD_v1_Contract_Understanding_PDF
huggingface.co
Updated Jan 20, 2025
Cite
Daniel Voigt Godoy (2025). CUAD_v1_Contract_Understanding_PDF [Dataset]. https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF
Dataset updated
Jan 20, 2025
Authors
Daniel Voigt Godoy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Contract Understanding Atticus Dataset (CUAD) PDF

This dataset contains the PDFs and the full text of 509 commercial legal contracts from the original CUAD dataset. One of the original 510 contracts was removed due to being a scanned copy. The extracted text was cleaned using clean-text. The PDFs were encoded in base64 and added as the pdf_bytes_base64 feature. You can easily and quickly load it: dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF")… See the full description on the dataset page: https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF.
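Because the PDFs are stored as base64 text in the pdf_bytes_base64 feature, recovering a file is a decode step. A minimal sketch, assuming each record behaves like a dict with that field; the load_dataset call from the card is shown only as a comment (it needs the datasets package and network access), the "train" split name is an assumption, and the decode_pdf and save_pdf helpers and the header check are illustrative.

```python
import base64
from pathlib import Path


def decode_pdf(record: dict) -> bytes:
    """Decode the base64-encoded PDF stored in a record's 'pdf_bytes_base64' field."""
    pdf = base64.b64decode(record["pdf_bytes_base64"])
    # PDF files normally begin with the "%PDF" signature; reject anything else.
    if not pdf.startswith(b"%PDF"):
        raise ValueError("decoded bytes do not look like a PDF")
    return pdf


def save_pdf(record: dict, path: Path) -> Path:
    """Write the decoded PDF to `path` and return the path."""
    path.write_bytes(decode_pdf(record))
    return path


# With the real dataset (requires `pip install datasets` and network access):
#   from datasets import load_dataset
#   dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF")
#   record = dataset["train"][0]   # split name assumed
# Here, a fake record stands in for a real one.
fake = {"pdf_bytes_base64": base64.b64encode(b"%PDF-1.4\n% toy file\n").decode("ascii")}
print(len(decode_pdf(fake)))
```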
MAUD v1
zenodo.org
zip
Updated Jul 15, 2024
Cite
The Atticus Project (2024). MAUD v1 [Dataset]. http://doi.org/10.5281/zenodo.7500064
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7500064
Dataset updated
Jul 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
The Atticus Project
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Merger Agreement Understanding Dataset (MAUD) v1 is a corpus of 47,000+ labels in 152 merger agreements that have been manually labeled under the supervision of experienced lawyers to identify 92 questions in each agreement used by the 2021 American Bar Association (ABA) Public Target Deal Points Study.

MAUD is curated and maintained by The Atticus Project, Inc. to support NLP research and development in legal contract review.

ReadMe and Datasheet are published here. Code for replicating the results, together with the model trained on CUAD, is published on Github here.
CUADGoverningLawLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADGoverningLawLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADGoverningLawLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies which state/country’s law governs the contract.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import mteb… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification.
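The MTEB evaluation code is truncated in the listing, so the sketch below does not reproduce it or MTEB's own classification protocol. It only illustrates the general shape of a t2c (text-to-category) evaluation of an embedding model: embed labelled clauses, fit a simple classifier on the embeddings, and score it on held-out clauses. The bag-of-words embed function is a stand-in for a real embedding model, the nearest-centroid classifier and the "yes"/"no" labels are assumptions, and the clauses are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: list) -> Counter:
    """Average a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def evaluate(train: list, test: list) -> float:
    """Fit one centroid per label on `train`; return accuracy on `test`."""
    by_label = {}
    for text, label in train:
        by_label.setdefault(label, []).append(embed(text))
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    correct = 0
    for text, label in test:
        v = embed(text)
        pred = max(centroids, key=lambda lab: cosine(v, centroids[lab]))
        correct += pred == label
    return correct / len(test)


# "yes" = the clause specifies a governing law (invented examples).
train = [
    ("This Agreement shall be governed by the laws of the State of Delaware.", "yes"),
    ("This Agreement is governed by the laws of England.", "yes"),
    ("Either party may terminate this Agreement upon thirty days notice.", "no"),
    ("Licensee shall pay royalties quarterly.", "no"),
]
test = [
    ("This Agreement shall be governed by the laws of New York.", "yes"),
    ("Licensee shall pay royalties within thirty days.", "no"),
]
print(evaluate(train, test))
```

Swapping embed for a real embedding model and the nearest-centroid rule for a stronger classifier keeps the same train/embed/score structure.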
CUADInsuranceLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADInsuranceLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADInsuranceLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause creates a requirement for insurance that must be maintained by one party for the benefit of the counterparty.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification.
MAUD v1
data.niaid.nih.gov
Updated Jul 15, 2024
Cite
The Atticus Project (2024). MAUD v1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7500063
Dataset updated
Jul 15, 2024
Dataset authored and provided by
The Atticus Project
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Merger Agreement Understanding Dataset (MAUD) v1 is a corpus of 47,000+ labels in 152 merger agreements that have been manually labeled under the supervision of experienced lawyers to identify 92 questions in each agreement used by the 2021 American Bar Association (ABA) Public Target Deal Points Study.

MAUD is curated and maintained by The Atticus Project, Inc. to support NLP research and development in legal contract review.

ReadMe and Datasheet are published here. Code for replicating the results, together with the model trained on CUAD, is published on Github here.
Merger Agreement Understanding Dataset (MAUD) Dataset
paperswithcode.com
Updated Jan 1, 2023
Cite
Steven H. Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dimitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks (2023). Merger Agreement Understanding Dataset (MAUD) Dataset [Dataset]. https://paperswithcode.com/dataset/merger-agreement-understanding-dataset-maud
Dataset updated
Jan 1, 2023
Authors
Steven H. Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dimitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks
Description
MAUD is an expert-annotated merger agreement reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points study, where lawyers and law students answered 92 questions about 152 merger agreements.

With over 39,000 examples and 47,000 total annotations, it is the largest expert-annotated legal reading comprehension dataset in the English language, as well as the first expert-annotated merger agreement dataset.
CUADAntiAssignmentLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADAntiAssignmentLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADAntiAssignmentLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause requires consent or notice of a party if the contract is assigned to a third party.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification.
CUADWarrantyDurationLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADWarrantyDurationLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADWarrantyDurationLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a duration of any warranty against defects or errors in technology, products, or services provided under the contract.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification.
CUADLicenseGrantLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADLicenseGrantLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADLicenseGrantLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause contains a license granted by one party to its counterparty.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification.
CUADThirdPartyBeneficiaryLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADThirdPartyBeneficiaryLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADThirdPartyBeneficiaryLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that there is a non-contracting party who is a beneficiary to some or all of the clauses in the contract and therefore can enforce its rights against a contracting party.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification.
CUADMostFavoredNationLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADMostFavoredNationLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADMostFavoredNationLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause provides that, if a third party gets better terms on the licensing or sale of technology/goods/services described in the contract, the buyer of such technology/goods/services under the contract shall be entitled to those better terms.

Task category t2c

Domains Legal, Written

Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification.
CUADRevenueProfitSharingLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADRevenueProfitSharingLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADRevenueProfitSharingLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause requires a party to share revenue or profit with the counterparty for any technology, goods, or services.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification.
CUADExpirationDateLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADExpirationDateLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADExpirationDateLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADExpirationDateLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies the date upon which the initial term expires.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an embedding model on this dataset using the following code: import mteb… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADExpirationDateLegalBenchClassification.
CUADPriceRestrictionsLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADPriceRestrictionsLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADPriceRestrictionsLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADPriceRestrictionsLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause places a restriction on the ability of a party to raise or reduce prices of technology, goods, or services provided.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADPriceRestrictionsLegalBenchClassification.
CUADVolumeRestrictionLegalBenchClassification
huggingface.co
Updated May 11, 2025
Cite
Massive Text Embedding Benchmark (2025). CUADVolumeRestrictionLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADVolumeRestrictionLegalBenchClassification
Dataset updated
May 11, 2025
Dataset authored and provided by
Massive Text Embedding Benchmark
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CUADVolumeRestrictionLegalBenchClassification, an MTEB (Massive Text Embedding Benchmark) dataset

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a fee increase, consent requirement, or similar consequence if one party's use of the product/services exceeds a certain threshold.

Task category t2c

Domains Legal, Written

Reference https://huggingface.co/datasets/nguha/legalbench

How to evaluate on this task

You can evaluate an… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADVolumeRestrictionLegalBenchClassification.
