46 datasets found
  1. Contract Understanding Atticus Dataset (CUAD)

    • kaggle.com
    Updated Mar 12, 2021
    Cite
    The Atticus Project (2021). Contract Understanding Atticus Dataset (CUAD) [Dataset]. http://doi.org/10.34740/kaggle/dsv/2015428
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Atticus Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please download the full version of the dataset from Zenodo, here.

    Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled by The Atticus Project to identify 41 categories of important clauses that lawyers look for when reviewing contracts.

    We tested CUAD v1 against ten pretrained AI models and published the results on arXiv here.

    Code for replicating the results, together with the model trained on CUAD, is published on Github here.

  2. CUAD Dataset

    • paperswithcode.com
    Updated Mar 9, 2021
    + more versions
    Cite
    Dan Hendrycks; Collin Burns; Anya Chen; Spencer Ball (2021). CUAD Dataset [Dataset]. https://paperswithcode.com/dataset/cuad
    Dataset updated
    Mar 9, 2021
    Authors
    Dan Hendrycks; Collin Burns; Anya Chen; Spencer Ball
    Description

    Contract Understanding Atticus Dataset (CUAD) is a dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review.

  3. cuad_qa

    • huggingface.co
    Updated Sep 15, 1999
    Cite
    Chenghao Mou (1999). cuad_qa [Dataset]. https://huggingface.co/datasets/chenghao/cuad_qa
    Dataset updated
    Sep 15, 1999
    Authors
    Chenghao Mou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for CUAD

    This is a modified version of original CUAD which trims the question to its label form.

      Dataset Summary
    

    Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. CUAD is curated and maintained by The Atticus Project, Inc.… See the full description on the dataset page: https://huggingface.co/datasets/chenghao/cuad_qa.

  4. filtered-cuad

    • huggingface.co
    Updated Aug 1, 2011
    Cite
    Alex Apostolopoulos (2011). filtered-cuad [Dataset]. https://huggingface.co/datasets/alex-apostolo/filtered-cuad
    Dataset updated
    Aug 1, 2011
    Authors
    Alex Apostolopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for filtered_cuad

      Dataset Summary
    

    Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled to identify 41 categories of important clauses that lawyers look for when reviewing contracts in connection with corporate transactions. This dataset is a filtered version of CUAD. It excludes legal contracts with an Agreement date prior to 2002 and contracts which are not… See the full description on the dataset page: https://huggingface.co/datasets/alex-apostolo/filtered-cuad.

  5. CUAD_v1_Contract_Understanding_PDF

    • huggingface.co
    Updated Jan 20, 2025
    + more versions
    Cite
    Daniel Voigt Godoy (2025). CUAD_v1_Contract_Understanding_PDF [Dataset]. https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF
    Dataset updated
    Jan 20, 2025
    Authors
    Daniel Voigt Godoy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Contract Understanding Atticus Dataset (CUAD) PDF

    This dataset contains the PDFs and the full text of 509 commercial legal contracts from the original CUAD dataset. One of the original 510 contracts was removed due to being a scanned copy. The extracted text was cleaned using clean-text. The PDFs were encoded in base64 and added as the pdf_bytes_base64 feature. You can easily and quickly load it: dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF")… See the full description on the dataset page: https://huggingface.co/datasets/dvgodoy/CUAD_v1_Contract_Understanding_PDF.
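
The description above says each PDF is stored base64-encoded in the pdf_bytes_base64 feature. A minimal sketch of turning one record back into PDF bytes follows. The helper names, the "train" split, and the assumption that records behave like dictionaries are illustrative and not confirmed by the dataset card, so check the dataset page before relying on them.

```python
import base64


def decode_pdf_bytes(record: dict) -> bytes:
    """Decode the base64 payload held in a record's pdf_bytes_base64 field."""
    pdf_bytes = base64.b64decode(record["pdf_bytes_base64"])
    # A well-formed PDF starts with the "%PDF-" header; fail early otherwise.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded payload does not look like a PDF")
    return pdf_bytes


def load_first_pdf(split: str = "train") -> bytes:
    """Fetch one record from the Hub and decode it.

    Needs network access and the `datasets` package. The split name is an
    assumption, not something stated in the description above.
    """
    from datasets import load_dataset  # lazy import keeps decode_pdf_bytes dependency-free

    dataset = load_dataset("dvgodoy/CUAD_v1_Contract_Understanding_PDF", split=split)
    return decode_pdf_bytes(dataset[0])
```

Writing the returned bytes to a file opened in binary mode should then give a viewable PDF, provided the field holds what the description says it does.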

  6. cuad-deepseek

    • huggingface.co
    Updated May 9, 2025
    Cite
    ZenML (2025). cuad-deepseek [Dataset]. https://huggingface.co/datasets/zenml/cuad-deepseek
    Dataset updated
    May 9, 2025
    Dataset authored and provided by
    ZenML
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUAD-DeepSeek: Enhanced Legal Contract Understanding Dataset

    CUAD-DeepSeek is an enhanced version of the Contract Understanding Atticus Dataset (CUAD), enriched with expert rationales and reasoning traces provided by the DeepSeek language model. This dataset aims to improve legal contract analysis by providing not just classifications but detailed explanations for why specific clauses belong to particular legal categories.

      Purpose and Scope
    

    Legal contract review is… See the full description on the dataset page: https://huggingface.co/datasets/zenml/cuad-deepseek.

  7. test-cuad

    • kaggle.com
    Updated Sep 20, 2024
    Cite
    lithurshan2000 (2024). test-cuad [Dataset]. https://www.kaggle.com/datasets/lithurshan2000/test-cuad/code
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    lithurshan2000
    Description

    Dataset

    This dataset was created by lithurshan2000


  8. LC-QuAD Dataset

    • paperswithcode.com
    Updated Jun 12, 2023
    Cite
    (2023). LC-QuAD Dataset [Dataset]. https://paperswithcode.com/dataset/lc-quad-2-0
    Dataset updated
    Jun 12, 2023
    Description

    LC-QuAD is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 versions.

  9. cuad_code

    • kaggle.com
    Updated Jun 26, 2024
    Cite
    UOM_200343G (2024). cuad_code [Dataset]. https://www.kaggle.com/uom200343g/cuad-code/discussion
    Dataset updated
    Jun 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    UOM_200343G
    Description

    Dataset

    This dataset was created by UOM_200343G


  10. mor_ssDNA/dsDNA_gene_ssDNA/dsDNA_C.hesperidum..pzfx

    • figshare.com
    xml
    Updated May 6, 2025
    Cite
    Nikita Gal'chinsky (2025). mor_ssDNA/dsDNA_gene_ssDNA/dsDNA_C.hesperidum..pzfx [Dataset]. http://doi.org/10.6084/m9.figshare.28938248.v1
    Available download formats: xml
    Dataset updated
    May 6, 2025
    Dataset provided by
    figshare
    Authors
    Nikita Gal'chinsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Invented in 2008, CUAD biotechnology is one of three cost-effective antisense technologies used for insect pest control (RNAi, CUAD, and CRISPR/Cas). All three are based on the formation of duplexes of unmodified nucleic acids (RNAi: guide RNA-mRNA; CUAD: guide DNA-rRNA; CRISPR/Cas: guide RNA-genomic DNA) and the action of nucleic acid-guided nucleases (RNAi: Argonaute; CUAD: RNase H; CRISPR/Cas: CRISPR-associated protein). It is amazing that the simple CUAD-based ‘genetic zipper’ method is so efficient for IPM.

  11. CUADGoverningLawLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADGoverningLawLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADGoverningLawLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause specifies which state/country’s law governs the contract.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the following code: import mteb… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADGoverningLawLegalBenchClassification.

  12. CUADAntiAssignmentLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADAntiAssignmentLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADAntiAssignmentLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause requires consent or notice of a party if the contract is assigned to a third party.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADAntiAssignmentLegalBenchClassification.

  13. CUADLicenseGrantLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    + more versions
    Cite
    Massive Text Embedding Benchmark (2025). CUADLicenseGrantLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADLicenseGrantLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause contains a license granted by one party to its counterparty.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the following code: import… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLicenseGrantLegalBenchClassification.

  14. CUADUncappedLiabilityLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    + more versions
    Cite
    Massive Text Embedding Benchmark (2025). CUADUncappedLiabilityLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADUncappedLiabilityLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADUncappedLiabilityLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that a party's liability is uncapped upon the breach of its obligation in the contract. This also includes uncapped liability for a particular type of breach, such as IP infringement or breach of a confidentiality obligation.

    Task category t2c

    Domains Legal, Written

    Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADUncappedLiabilityLegalBenchClassification.

  15. CUADThirdPartyBeneficiaryLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADThirdPartyBeneficiaryLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADThirdPartyBeneficiaryLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that there is a non-contracting party who is a beneficiary to some or all of the clauses in the contract and therefore can enforce its rights against a contracting party.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADThirdPartyBeneficiaryLegalBenchClassification.

  16. CUADInsuranceLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADInsuranceLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADInsuranceLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause creates a requirement for insurance that must be maintained by one party for the benefit of the counterparty.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding model on this… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADInsuranceLegalBenchClassification.

  17. CUADRevenueProfitSharingLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADRevenueProfitSharingLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADRevenueProfitSharingLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause requires a party to share revenue or profit with the counterparty for any technology, goods, or services.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification.

  18. CUADLiquidatedDamagesLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADLiquidatedDamagesLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADLiquidatedDamagesLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADLiquidatedDamagesLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause awards either party liquidated damages for breach or a fee upon the termination of a contract (termination fee).

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADLiquidatedDamagesLegalBenchClassification.

  19. CUADMostFavoredNationLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADMostFavoredNationLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADMostFavoredNationLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause provides that, when a third party gets better terms on the licensing or sale of technology/goods/services described in the contract, the buyer of such technology/goods/services under the contract shall be entitled to those better terms.

    Task category t2c

    Domains Legal, Written

    Reference… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADMostFavoredNationLegalBenchClassification.

  20. CUADWarrantyDurationLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADWarrantyDurationLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADWarrantyDurationLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a duration of any warranty against defects or errors in technology, products, or services provided under the contract.

    Task category t2c

    Domains Legal, Written

    Reference https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADWarrantyDurationLegalBenchClassification.

Contract Understanding Atticus Dataset (CUAD)

A dataset of legal contracts with rich expert annotations.

