100+ datasets found
  1. Atticus Open Contract Dataset (AOK) (beta)

    • live.european-language-grid.eu
    • explore.openaire.eu
    • +1 more
    csv
    Updated Jun 22, 2023
    Cite
    (2023). Atticus Open Contract Dataset (AOK) (beta) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7648
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 22, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Atticus Open Contract Dataset (AOK) (beta) is a corpus of 5,000+ labels in 200 commercial legal contracts, manually labeled by legal experts to identify 40 types of clauses that are important during contract review in connection with corporate transactions such as mergers and acquisitions, IPOs, and corporate financing. The AOK Dataset is curated and maintained by The Atticus Project, Inc., a non-profit organization, to support NLP research and development in legal contract review. If you download this dataset, we'd love to know more about you and your project! Please fill out this short form: https://forms.gle/h47GUENTTbBqH39m7

    Check out our website at atticusprojectai.org.

    Update: The expanded 1.0 version of the dataset is available here https://zenodo.org/record/4595826

  2. ALeaseBert

    • uvaauas.figshare.com
    html
    Updated May 30, 2023
    Cite
    J. Rossi; E. Kanoulas; S. Leivaditi (2023). ALeaseBert [Dataset]. http://doi.org/10.21942/uva.19732993.v1
    Explore at:
    Available download formats: html
    Dataset updated
    May 30, 2023
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    J. Rossi; E. Kanoulas; S. Leivaditi
    License

    http://rdm.uva.nl/en/support/confidential-data.html

    Description

    DATA

    This is the data from the paper "A Benchmark for Lease Contract Review" (https://arxiv.org/abs/2010.10386).

    • The weights of our ALeaseBERT model (ALeaseBert.zip)
    • The dataset of lease contracts and its annotations (annotated_dataset.zip)
    • Samples: sample.html is a contract; sample.json has the corresponding annotations
    • Metadata: annotations-legend.json has the dictionary of annotated entities

    LICENSE

    This data is made available under the terms of CC BY-NC 4.0.

    See http://creativecommons.org/licenses/by-nc/4.0/deed.en
    See http://creativecommons.org/licenses/by-nc/4.0/legalcode

  3. Contract Discovery Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 16, 2022
    Cite
    Łukasz Borchmann; Dawid Wiśniewski; Andrzej Gretkowski; Izabela Kosmala; Dawid Jurkiewicz; Łukasz Szałkiewicz; Gabriela Pałka; Karol Kaczmarek; Agnieszka Kaliska; Filip Graliński (2022). Contract Discovery Dataset [Dataset]. https://paperswithcode.com/dataset/contract-discovery
    Explore at:
    Dataset updated
    Oct 16, 2022
    Authors
    Łukasz Borchmann; Dawid Wiśniewski; Andrzej Gretkowski; Izabela Kosmala; Dawid Jurkiewicz; Łukasz Szałkiewicz; Gabriela Pałka; Karol Kaczmarek; Agnieszka Kaliska; Filip Graliński
    Description

    A new shared task of semantic retrieval from legal texts. In so-called contract discovery, legal clauses are extracted from documents, given a few examples of similar clauses from other legal acts.

  4. Terms of Service Dataset

    • paperswithcode.com
    Updated Feb 21, 2024
    Cite
    Marco Lippi; Przemyslaw Palka; Giuseppe Contissa; Francesca Lagioia; Hans-Wolfgang Micklitz; Giovanni Sartor; Paolo Torroni (2024). Terms of Service Dataset [Dataset]. https://paperswithcode.com/dataset/terms-of-service
    Explore at:
    Dataset updated
    Feb 21, 2024
    Authors
    Marco Lippi; Przemyslaw Palka; Giuseppe Contissa; Francesca Lagioia; Hans-Wolfgang Micklitz; Giovanni Sartor; Paolo Torroni
    Description

    The Terms of Service dataset is a law dataset corresponding to the task of identifying whether contractual terms are potentially unfair. This is a binary classification task, where positive examples are potentially unfair contractual terms (clauses) from the terms of service in consumer contracts. Article 3 of Directive 93/13 on Unfair Terms in Consumer Contracts defines an unfair contractual term as follows. A contractual term is unfair if: (1) it has not been individually negotiated; and (2) contrary to the requirement of good faith, it causes a significant imbalance in the parties' rights and obligations, to the detriment of the consumer. The Terms of Service dataset consists of 9,414 examples.

  5. kl3m-data-edgar-agreements-sample

    • huggingface.co
    Updated Apr 11, 2025
    + more versions
    Cite
    ALEA Institute (2025). kl3m-data-edgar-agreements-sample [Dataset]. https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements-sample
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2025
    Authors
    ALEA Institute
    Description

    KL3M Data Project

    Note: This page provides general information about the KL3M Data Project. Additional details specific to this dataset will be added in future updates. For complete information, please visit the GitHub repository or refer to the KL3M Data Project paper.

    Description

    This dataset is part of the ALEA Institute's KL3M Data Project, which provides copyright-clean training resources for large language models.

    Dataset Details

    Format: Parquet… See the full description on the dataset page: https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements-sample.

  6. Data from: Purchase Orders and Contracts

    • catalog.data.gov
    • data.brla.gov
    • +1 more
    Updated Jun 21, 2025
    + more versions
    Cite
    data.brla.gov (2025). Purchase Orders and Contracts [Dataset]. https://catalog.data.gov/dataset/purchase-orders-and-contracts
    Explore at:
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    data.brla.gov
    Description

    Listing of all purchase orders and contracts issued to procure goods and/or services within the City-Parish. In the City-Parish, a PO/Contract is made up of two components: a header and one or many detail items that comprise the overarching PO/Contract. The header contains information that pertains to the entire PO/Contract. This includes, but is not limited to, the total amount of the PO/Contract, the department requesting the purchase and the vendor providing the goods or services. The detail item(s) contain information that is specific to the individual item ordered or service procured through the PO/Contract. The item/service description, item/service quantity and the cost of the item are located within the PO/Contract details. There may be one or many detail items on an individual PO/Contract. For example, a Purchase Order for computer equipment may include three items: the computer, the monitor and the base software package. Both header information and detail item information are included in this dataset in order to provide a comprehensive view of the PO/Contract data. The Record Type field indicates whether the record is a header record (H) or detail item record (D). In the computer purchase example above, the system would display 4 records: one header record and 3 detail item records. Note that header information is duplicated on all detail items, and no detail item information is displayed on the header record. In October 2017, the City-Parish switched to a new system used to track PO/Contracts. This data contains all PO/Contracts entered in or after October 2017. For prior year data, please see the Legacy Purchase Order dataset https://data.brla.gov/Government/Legacy-Purchase-Orders/54bn-2sqf
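
    The header/detail layout described above can be handled by grouping rows on a PO identifier and splitting them on the Record Type flag. The following is a minimal sketch only: the column names (`po_number`, `record_type`, `vendor`, `item`) and the sample rows are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical rows mirroring the computer-purchase example: one header (H)
# record and three detail (D) records. Column names are assumptions.
rows = [
    {"po_number": "PO-1", "record_type": "H", "vendor": "Acme", "item": None},
    {"po_number": "PO-1", "record_type": "D", "vendor": "Acme", "item": "computer"},
    {"po_number": "PO-1", "record_type": "D", "vendor": "Acme", "item": "monitor"},
    {"po_number": "PO-1", "record_type": "D", "vendor": "Acme", "item": "base software package"},
]


def group_purchase_orders(records):
    """Group records by PO number into one header and a list of detail items."""
    orders = defaultdict(lambda: {"header": None, "details": []})
    for rec in records:
        order = orders[rec["po_number"]]
        if rec["record_type"] == "H":
            order["header"] = rec
        elif rec["record_type"] == "D":
            order["details"].append(rec)
        else:
            raise ValueError(f"unexpected record type: {rec['record_type']!r}")
    return dict(orders)


orders = group_purchase_orders(rows)
```

    Because header information is repeated on every detail row, header-level values such as the PO total are probably best read from the H record alone rather than summed across all rows.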

  7. OCP Procurement Agreements

    • data.detroitmi.gov
    • detroitdata.org
    • +3 more
    Updated Dec 12, 2019
    Cite
    City of Detroit (2019). OCP Procurement Agreements [Dataset]. https://data.detroitmi.gov/datasets/ocp-procurement-agreements/explore
    Explore at:
    Dataset updated
    Dec 12, 2019
    Dataset authored and provided by
    City of Detroit
    Description

    The Procurement Agreements dataset provides details about contract agreements between the City of Detroit and suppliers who provide materials, equipment and services to the City. Initial and amended contracts and purchase orders associated with the contracts are included in the dataset. In some cases, purchase orders are generated to pay suppliers for work completed under a contract. If available, a link to the contract agreement document in PDF format is provided in the 'Contract Link' field of each record (row) in the dataset. This dataset is updated weekly with data from the Office of Contracting and Procurement (OCP).

  8. FOI 26605 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Nov 24, 2022
    Cite
    (2022). FOI 26605 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-26605
    Explore at:
    Dataset updated
    Nov 24, 2022
    License

    Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
    License information was derived automatically

    Description

    Further to the original Enterprise Application request, the contract below has expired. Please provide the current status. Finance Capita CRM Trustmarque Solutions Ltd

    I'd like to apologise for the length of this request, and how tedious it may be to handle. That being said, please make an effort to provide all of this information. The information I'm requesting is regarding the software contracts that the organisation uses, for the following fields.

    • Enterprise Resource Planning Software Solution (ERP)
    • Primary Customer Relationship Management Solution (CRM): For example, Salesforce, Lagan CRM, Microsoft Dynamics; software of this nature.
    • Primary Human Resources (HR) and Payroll Software Solution: For example, iTrent, ResourceLink, HealthRoster; software of this nature.
    • The organisation’s primary corporate Finance Software Solution: For example, Agresso, Integra, Sapphire Systems; software of this nature.

    • Name of Supplier: Can you please provide me with the software provider for each contract?
    • The brand of the software: Can you please provide me with the actual name of the software. Please do not provide me with the supplier name again; please provide me with the actual software name.
    • Description of the contract: Can you please provide me with detailed information about this contract and please state if upgrade, maintenance and support is included. Please also list the software modules included in these contracts.
    • Number of Users/Licenses: What is the total number of users/licenses for this contract?
    • Annual Spend: What is the annual average spend for each contract?
    • Contract Duration: What is the duration of the contract? Please include any available extensions within the contract.
    • Contract Start Date: What is the start date of this contract? Please include month and year of the contract. DD-MM-YY or MM-YY.
    • Contract Expiry: What is the expiry date of this contract? Please include month and year of the contract. DD-MM-YY or MM-YY.
    • Contract Review Date: What is the review date of this contract? Please include month and year of the contract. If this cannot be provided, please provide me with estimates of when the contract is likely to be reviewed. DD-MM-YY or MM-YY.
    • Contact Details: I require the full contact details of the person within the organisation responsible for this particular software contract (name, job title, email, contact number).

  9. National Inpatient Sample (NIS) - Restricted Access Files

    • catalog.data.gov
    • data.virginia.gov
    • +2 more
    Updated Feb 22, 2025
    Cite
    Agency for Healthcare Research and Quality, Department of Health & Human Services (2025). National Inpatient Sample (NIS) - Restricted Access Files [Dataset]. https://catalog.data.gov/dataset/hcup-national-nationwide-inpatient-sample-nis-restricted-access-file
    Explore at:
    Dataset updated
    Feb 22, 2025
    Description

    The Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS) is the largest publicly available all-payer inpatient care database in the United States. The NIS is designed to produce U.S. regional and national estimates of inpatient utilization, access, cost, quality, and outcomes. Unweighted, it contains data from more than 7 million hospital stays each year. Weighted, it estimates more than 35 million hospitalizations nationally. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ), HCUP data inform decision making at the national, State, and community levels. Starting with the 2012 data year, the NIS is a sample of discharges from all hospitals participating in HCUP, covering more than 97 percent of the U.S. population. For prior years, the NIS was a sample of hospitals. The NIS allows for weighted national estimates to identify, track, and analyze national trends in health care utilization, access, charges, quality, and outcomes. The NIS's large sample size enables analyses of rare conditions, such as congenital anomalies; uncommon treatments, such as organ transplantation; and special patient populations, such as the uninsured. NIS data are available since 1988, allowing analysis of trends over time. The NIS inpatient data include clinical and resource use information typically available from discharge abstracts with safeguards to protect the privacy of individual patients, physicians, and hospitals (as required by data sources). Data elements include but are not limited to: diagnoses, procedures, discharge status, patient demographics (e.g., sex, age), total charges, length of stay, and expected payment source, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as ‘no charge’. The NIS excludes data elements that could directly or indirectly identify individuals. 
    Restricted access data files are available with a data use agreement and brief online security training.
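
    The distinction between unweighted and weighted counts above can be illustrated by summing a per-record weight instead of counting rows. This is a sketch under stated assumptions: the records, the `weight` field, and the `diagnosis` labels are invented for illustration and are not actual NIS variables or values.

```python
# Made-up sample records; each sampled discharge carries a weight that says
# how many discharges it stands for. Field names are hypothetical.
records = [
    {"diagnosis": "A", "weight": 5.0},
    {"diagnosis": "B", "weight": 5.0},
    {"diagnosis": "A", "weight": 4.5},
]


def weighted_count(rows, predicate=lambda r: True):
    """Sum the weights of the rows that satisfy the predicate."""
    return sum(r["weight"] for r in rows if predicate(r))


unweighted = len(records)                      # number of sampled records
national_estimate = weighted_count(records)    # weighted estimate for all records
estimate_a = weighted_count(records, lambda r: r["diagnosis"] == "A")
```

    The same pattern extends to subgroup estimates by changing the predicate, as the `estimate_a` line shows.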

  10. Service Agreement on Storage and Dissemination of Research Data /...

    • explore.openaire.eu
    Updated May 24, 2024
    Cite
    Swedish National Data Service (2024). Service Agreement on Storage and Dissemination of Research Data / Uppdragsavtal avseende lagring och förmedling av forskningsdata [Dataset]. http://doi.org/10.5281/zenodo.11278410
    Explore at:
    Dataset updated
    May 24, 2024
    Authors
    Swedish National Data Service
    Description

    Service Agreement on Storage and Dissemination of Research Data

    A service agreement for the storage and dissemination of research data is required to use SND's repository (SND CARE) for research data. The agreement details the legal prerequisites for the use of SND CARE and the various commitments and responsibilities of SND and the research principal, respectively. Members of the SND Network are advised to sign a general service agreement for the use of SND CARE. For individual researchers who do not belong to a research principal that is a member of the SND Network, or if the research principal has not signed a general service agreement with SND, an agreement is signed digitally in SND’s system for describing and sharing data (DORIS) per submitted dataset. This document is the agreement template. The terms of the contract may vary slightly for individual parties. The agreement is translated into English, but for reading only. The Swedish version is the official agreement and the one which is to be signed.

  11. Merger Agreement Understanding Dataset (MAUD) Dataset

    • paperswithcode.com
    Updated Jan 1, 2023
    Cite
    Steven H. Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dimitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks (2023). Merger Agreement Understanding Dataset (MAUD) Dataset [Dataset]. https://paperswithcode.com/dataset/merger-agreement-understanding-dataset-maud
    Explore at:
    Dataset updated
    Jan 1, 2023
    Authors
    Steven H. Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dimitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks
    Description

    MAUD is an expert-annotated merger agreement reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points study, where lawyers and law students answered 92 questions about 152 merger agreements.

    With over 39,000 examples and 47,000 total annotations, it is the largest expert-annotated legal reading comprehension dataset in the English language, as well as the first expert-annotated merger agreement dataset.

  12. Data Collaborations Across Boundaries (Slides)

    • data.depositar.io
    pdf
    Updated Jun 27, 2025
    Cite
    depositar (2025). Data Collaborations Across Boundaries (Slides) [Dataset]. https://data.depositar.io/dataset/data-collaborations-across-boundaries
    Explore at:
    Available download formats: pdf(4440122), pdf(10713394), pdf(1792282), pdf(1296859), pdf(3112569)
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    depositar
    Description

    This dataset collects the slides that were presented at the Data Collaborations Across Boundaries session in SciDataCon 2022, part of the International Data Week.

    The following session proposal was prepared by Tyng-Ruey Chuang and submitted to SciDataCon 2022 organizers for consideration on 2022-02-28. The proposal was accepted on 2022-03-28. Six abstracts were submitted and accepted to this session. Five presentations were delivered online in a virtual session on 2022-06-21.

    Data Collaborations Across Boundaries

    There are many good stories about data collaborations across boundaries. We need more. We also need to share the lessons each of us has learned from collaborating with parties and communities not in our familiar circles.

    By boundaries, we mean not just the regulatory borders between nation states concerning data sharing but also the various barriers, readily conceivable or not, that hinder collaboration in aggregating, sharing, and reusing data for social good. These barriers to collaboration exist between the academic disciplines, between the economic players, and between the many user communities, just to name a few. There are also cross-domain barriers, for example those that lie among data practitioners, public administrators, and policy makers when they are articulating the why, what, and how of "open data" and debating its economic significance and fair distribution. This session aims to bring together experiences and thoughts on good data practices in facilitating collaborations across boundaries and domains.

    The success of Wikipedia proves that collaborative content production and service, by ways of copyleft licenses, can be sustainable when coordinated by a non-profit and funded by the general public. Collaborative code repositories like GitHub and GitLab demonstrate the enormous value and mass scale of systems-facilitated integration of user contributions that run across multiple programming languages and developer communities. Research data aggregators and repositories such as GBIF, GISAID, and Zenodo have served numerous researchers across academic disciplines. Citizen science projects and platforms, for instance eBird, Galaxy Zoo, and Taiwan Roadkill Observation Network (TaiRON), not only collect data from diverse communities but also manage and release datasets for research use and public benefit (e.g. TaiRON datasets being used to improve road design and reduce animal mortality). At the same time large scale data collaborations depend on standards, protocols, and tools for building registries (e.g. Archival Resource Key), ontologies (e.g. Wikidata and schema.org), repositories (e.g. CKAN and Omeka), and computing services (e.g. Jupyter Notebook). There are many types of data collaborations. The above lists only a few.

    This session proposal calls for contributions to bring forward lessons learned from collaborative data projects and platforms, especially about those that involve multiple communities and/or across organizational boundaries. Presentations focusing on the following (non-exclusive) topics are sought after:

    1. Support mechanisms and governance structures for data collaborations across organizations/communities.

    2. Data policies --- such as data sharing agreements, memoranda of understanding, terms of use, privacy policies, etc. --- for facilitating collaborations across organizations/communities.

    3. Traditional and non-traditional funding sources for data collaborations across multiple parties; sustainability of data collaboration projects, platforms, and communities.

    4. Data workflows --- collection, processing, aggregation, archiving, and publishing, etc. --- designed with considerations of (external) collaboration.

    5. Collaborative web platforms for data acquisition, curation, analysis, visualization, and education.

    6. Examples and insights from data trusts, data coops, as well as other formal and informal forms of data stewardship.

    7. Debates on the pros and cons of centralized, distributed, and/or federated data services.

    8. Practical lessons learned from data collaboration stories: failures, successes, incidents, unexpected turns of events, aftermath, etc. (no story is too small!).

  13. COVID-19 Case Surveillance Public Use Data

    • healthdata.gov
    • data.virginia.gov
    • +6 more
    application/rdfxml +5
    Updated Feb 25, 2021
    + more versions
    Cite
    data.cdc.gov (2021). COVID-19 Case Surveillance Public Use Data [Dataset]. https://healthdata.gov/w/knt4-7efa/default?cur=xbTVFQpGL_I
    Explore at:
    Available download formats: csv, json, application/rssxml, tsv, application/rdfxml, xml
    Dataset updated
    Feb 25, 2021
    Dataset provided by
    data.cdc.gov
    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

    CDC has three COVID-19 case surveillance datasets:

    The following apply to all three datasets:

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and aut

  14. A study into the effects of two Focus on Form interventions on the...

    • datacatalogue.cessda.eu
    • ssh.datastations.nl
    Updated Apr 11, 2023
    Cite
    E.M. Boers-Visker (2023). A study into the effects of two Focus on Form interventions on the acquisition of the agreement verb modification (dataset) [Dataset]. http://doi.org/10.17026/dans-24h-xsp8
    Explore at:
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Utrecht University of Applied Sciences / University of Amsterdam
    Authors
    E.M. Boers-Visker
    Description

    This dataset is the result of a study into the acquisition of spatial devices in two learners of Sign Language of the Netherlands (NGT). This study is one of the four studies carried out by Eveline Boers-Visker in the context of her doctoral research entitled ‘Learning to use space: a study into the SL2 acquisition process of adult learners of Sign Language of the Netherlands’ (2016-2020). For this particular study, four groups of learners took part in an intervention study. Two groups received an input flood and explicit instruction on the NGT agreement verb system (condition A), one group received an implicit input flood (condition B), and one group (C) served as control group. Four tests were conducted to measure the learners' knowledge of the agreement verb paradigm. This dataset contains (i) a document presenting the step-by-step coding process to arrive at a total score per response and (ii) four documents with the scores per participant.

  15. WageIndicator Collective Agreements Database Dataset with Full Texts and...

    • data.niaid.nih.gov
    • ssh.datastations.nl
    • +3 more
    Updated Jul 17, 2024
    Cite
    Gabriele Medas (2024). WageIndicator Collective Agreements Database Dataset with Full Texts and Selected Clauses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5651623
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Gabriele Medas
    Daniela Ceccon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Since 2012, the WageIndicator Foundation has maintained a Collective Agreements Database, where the texts of 1600 collective agreements (CBAs) from 61 countries and in 27 languages have been uploaded, coded and annotated. This database is a unique example at global level: collective agreements are documents containing conditions of employment that result from negotiations between independent unions and employers, and their content is often surrounded by an atmosphere of secrecy. Under the SSHOC project and with the support of the CLARIN Research Infrastructure, the agreements have been manually and automatically annotated on several levels: for each agreement, the team answers a series of questions and selects the appropriate piece of text (clause) for each.

    One of the results of the collective agreements' annotation process is the dataset which is available here and includes all the clauses selected for each variable (WageIndicator_CBADatabase_Selected_Clauses). The full collective agreements' texts are stored in another dataset, also available here (WageIndicator_CBADatabase_Full_Texts_211019). A codebook is also included (210125-wageindicator-cba-codebook.pdf).

  16. Content of Deep Trade Agreements

    • datasearch.gesis.org
    Updated Feb 25, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hofmann Claudia, Alberto Osnago and Michele Ruta, (2017). "Horizontal Depth: A New Database on the Content of Preferential Trade Agreements". Policy Research working paper; no. WPS 7981. Washington, D.C. : World Bank Group. (2020). Content of Deep Trade Agreements [Dataset]. https://datasearch.gesis.org/dataset/api_worldbank_org_v2_datacatalog-157
    Explore at:
    Dataset updated
    Feb 25, 2020
    Dataset provided by
    World Bank (http://worldbank.org/)
    Authors
    Hofmann Claudia, Alberto Osnago and Michele Ruta, (2017). "Horizontal Depth: A New Database on the Content of Preferential Trade Agreements". Policy Research working paper; no. WPS 7981. Washington, D.C. : World Bank Group.
    Description

    The dataset on the content of preferential trade agreements (PTAs) maps 52 provisions in 279 PTAs notified to the WTO and signed between 1958 and 2015. It also includes information about the legal enforceability of each provision. The “Trade Agreements” file lists all the agreements available (279) with the coding of 52 provisions. The names and descriptions of all variables are listed in the “read me” sheet, which also explains the coding of legal enforceability. The “Bilateral Observations” file is a bilateral version of the dataset. Each observation is a country pair-year-agreement. Note that some country pairs appear multiple times in certain years if they have more than one agreement in force in that year. For example, Angola and DRC in 2000 were in COMESA and SADC. The variables are the same as in the Excel files. Important notice: The Bilateral Observations file excludes Partial Scope Agreements (PSA).
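
    The country pair-year-agreement structure described above can be pictured as expanding each agreement's membership into one row per pair, year, and agreement. This is a minimal sketch, assuming a hypothetical input layout (`name`, `members`, `years`) and unordered country pairs; the member lists are cut down to the two countries named in the example and are not the full memberships.

```python
from itertools import combinations

# Illustrative input only: real agreements have many more members and years.
agreements = [
    {"name": "COMESA", "members": ["Angola", "DRC"], "years": [2000]},
    {"name": "SADC", "members": ["Angola", "DRC"], "years": [2000]},
]


def to_bilateral(agreement_list):
    """Expand agreements into (country_a, country_b, year, agreement) rows."""
    rows = []
    for agr in agreement_list:
        # Sorting the members makes each unordered pair appear in one fixed order.
        for a, b in combinations(sorted(agr["members"]), 2):
            for year in agr["years"]:
                rows.append((a, b, year, agr["name"]))
    return rows


bilateral = to_bilateral(agreements)
```

    In this sketch the Angola/DRC pair yields two rows for 2000, one per agreement, which matches the repeated country pairs the description mentions.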

  17. EULA for ConfLab: A Data Collection Concept, Dataset, and Benchmark for...

    • figshare.com
    • data.4tu.nl
    pdf
    Updated Oct 10, 2022
    Cite
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). EULA for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild [Dataset]. http://doi.org/10.4121/20016194.v2
    Available download formats: pdf
    Dataset updated
    Oct 10, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is the End-User License Agreement associated with the ConfLab dataset.

    The dataset contains pseudonymized information. To get access, users need to fill in the form at https://doi.org/10.4121/20016194 and submit it to SPCLabDatasets-insy@tudelft.nl.

  18. Z

    DES370K

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 12, 2021
    Cite
    Gregersen, Brent A (2021). DES370K [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5676265
    Dataset updated
    Nov 12, 2021
    Dataset provided by
    Siva, Karthik
    Palmo, Kim
    Decolvenaere, Elizabeth
    Gregersen, Brent A
    Li, Je-Leun
    Bergdorf, Michael
    Klepeis, John L
    Donchev, Alexander G
    Hargus, Cory
    Taube, Andrew G
    Law, Ka-Hei
    McGibbon, Robert T
    Shaw, David E
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    DESRES Data Sets (DES370K)

    Please see the original paper at https://doi.org/10.1038/s41597-021-00833-x for more information about this dataset.

    This package contains a dataset described by Donchev et al. [1]: DES370K. It is presented as a CSV (DES370K.csv) and .mol files (geometries//DES370K_.mol). Also included is a metadata file, DES370K_meta.csv, which contains a set of long-form column descriptions replicating those in [1], as well as data types and units (when applicable) for each column.

    Manifest

    • DES370K.csv : Full dataset, containing interaction energies calculated using CCSD(T), MP2, HF, and SAPT0, as well as dimer geometries.

    • DES370K_meta.csv : Long-form descriptions of the columns in DES370K, as well as datatypes and units (when applicable) for each column

    • LICENSE.txt : License for using and redistributing the datasets provided.

    • README.md : This file.

    Loading the Dataset

    The datasets are presented as CSVs as a compromise between human-readability, format uniformity, and parsing speed. While an almost uncountable number of packages exist to read CSV files, we recommend using the Python data analysis
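    As a dependency-free illustration of reading these files, here is a minimal sketch that uses only Python's standard `csv` module. It is not the loader recommended by the dataset authors. The helper names `column_names` and `load_rows` are our own, and nothing is assumed about the actual column names, which are described in DES370K_meta.csv.

```python
import csv


def column_names(path):
    """Return the header row of a CSV file without reading the data rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts.

    Every value is returned as a string; convert numeric columns
    explicitly once their meaning is known from the metadata file.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Example usage (assumes the files from the manifest are in the working directory):
#   print(column_names("DES370K.csv"))
#   meta = load_rows("DES370K_meta.csv")
```

    Reading the header first is a cheap way to check the columns against the metadata file before loading the full dataset into memory.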

    References

    [1] A. G. Donchev, A. G. Taube, E. Decolvenaere, C. Hargus, R. T. McGibbon, K.-H. Law, B. A. Gregersen, J.-L. Li, K. Palmo, K. Siva, M. Bergdorf, J. L. Klepeis, and D. E. Shaw. "Quantum chemical benchmark database of dimer interaction energies at a “gold standard” level of accuracy"

    [2] R. T. McGibbon, A. G. Taube, A. G. Donchev, K. Siva, F. Fernandez, C. Hargus, K.-H. Law, J.L. Klepeis, and D. E. Shaw. "Improving the accuracy of Moller-Plesset perturbation theory with neural networks"

    [3] M. K. Kesharwani, A. Karton, N. Sylvetsky, and J. M. L. Martin. "The S66 non-covalent interactions benchmark reconsidered using explicitly correlated methods near the basis set limit."

    License

            DESRES DATA SETS LICENSE AGREEMENT
    
    
    Copyright 2020, D. E. Shaw Research. All rights reserved.
    
    
    Redistribution and use of electronic structure data released in the DESRES
    Data Sets (DES370K, DES15K, DES5M, DESS66, and DESS66x8) with or without
    modification, is permitted provided that the following conditions are met:
    
    
      * Redistributions of the data must retain the above copyright notice,
      this list of conditions, and the following disclaimer.
    
    
      * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions, and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    
    
    Neither the name of D. E. Shaw Research nor the names of its contributors may
    be used to endorse or promote products derived from this software without
    specific prior written permission.
    
    
    THIS SOFTWARE AND DATA ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
    THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
    ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
    OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE AND/OR DATA, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    
  19. w

    Dataset of books called AJ contracts guide to: ASCA Form of Building...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called AJ contracts guide to: ASCA Form of Building Agreement 1982, second edition 1984, BPF/ACA Form of Building Agreement 1984 [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=AJ+contracts+guide+to%3A+ASCA+Form+of+Building+Agreement+1982%2C+second+edition+1984%2C+BPF%2FACA+Form+of+Building+Agreement+1984
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is AJ contracts guide to: ASCA Form of Building Agreement 1982, second edition 1984, BPF/ACA Form of Building Agreement 1984. It features 7 columns including author, publication date, language, and book publisher.

  20. POLIcy design ANNotAtions (POLIANNA): Towards understanding policy design...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 14, 2023
    Cite
    Sebastian Sewerin; Lynn H. Kaack; Joel Küttel; Fride Sigurdsson; Onerva Martikainen; Alisha Esshaki; Fabian Hafner (2023). POLIcy design ANNotAtions (POLIANNA): Towards understanding policy design through text-as-data approaches [Dataset]. http://doi.org/10.5281/zenodo.8284380
    Available download formats: zip
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Sewerin; Lynn H. Kaack; Joel Küttel; Fride Sigurdsson; Onerva Martikainen; Alisha Esshaki; Fabian Hafner
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The POLIANNA dataset is a collection of legislative texts from the European Union (EU) that have been annotated based on theoretical concepts of policy design. The dataset consists of 20,577 annotated spans in 412 articles, drawn from 18 EU climate change mitigation and renewable energy laws, and can be used to develop supervised machine learning approaches for scaling policy analysis. The dataset includes a novel coding scheme for annotating text spans. The paper accompanying this dataset provides a description of the annotated corpus, an analysis of inter-annotator agreement, and a discussion of potential applications. The objective of this dataset is to build tools that assist with manual coding of policy texts by automatically identifying relevant paragraphs.

    Detailed instructions and further guidance about the dataset as well as all the code used for this project can be found in the accompanying paper and on the GitHub project page. The repository also contains useful code to calculate various inter-annotator agreement measures and can be used to process text annotations generated by INCEpTION.

    Dataset Description

    We provide the dataset in 3 different formats:

    JSON: Each article corresponds to a folder, where the Tokens and Spans are stored in separate JSON files. Each article folder further contains the raw policy text in a text file and the metadata about the policy. This is the most human-readable format.

    JSONL: Same folder structure as the JSON format, but the Spans and Tokens are stored in a JSONL file, where each line is a valid JSON document.

    Pickle: We provide the dataset as a Python object. This is the recommended method when using our own Python framework that is provided on GitHub. For more information, check out the GitHub project page.
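    To illustrate the JSONL layout described above, where each line is a valid JSON document, the following minimal Python sketch parses such a file line by line. It is not part of the POLIANNA framework on GitHub. The function name `read_jsonl` is our own, and no assumption is made about the keys inside each Span or Token record.

```python
import json


def read_jsonl(path):
    """Parse a JSONL file: one JSON document per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                # Tolerate blank lines, e.g. a trailing newline at end of file.
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report which line is malformed instead of failing opaquely.
                raise ValueError(
                    f"{path}: invalid JSON on line {line_number}"
                ) from err
    return records
```

    Parsing line by line means a single malformed record can be located by its line number, which is harder with one large JSON file.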


    License

    The POLIANNA dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. If you use the POLIANNA dataset in your research in any form, please cite the dataset.

    Citation

    Sewerin, S., Kaack, L.H., Küttel, J. et al. Towards understanding policy design through text-as-data approaches: The policy design annotations (POLIANNA) dataset. Sci Data 10, 896 (2023). https://doi.org/10.1038/s41597-023-02801-z
