100+ datasets found

Atticus Open Contract Dataset (AOK) (beta)
live.european-language-grid.eu
explore.openaire.eu
csv
Updated Jun 22, 2023
Cite
(2023). Atticus Open Contract Dataset (AOK) (beta) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7648
Available download formats: csv
Dataset updated
Jun 22, 2023
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Atticus Open Contract Dataset (AOK) (beta) is a corpus of more than 5,000 labels in 200 commercial legal contracts, manually labeled by legal experts to identify 40 types of clauses that are important during contract review in connection with corporate transactions such as mergers and acquisitions, IPOs, and corporate financing. The AOK Dataset is curated and maintained by The Atticus Project, Inc., a non-profit organization, to support NLP research and development in legal contract review. If you download this dataset, we'd love to know more about you and your project! Please fill out this short form: https://forms.gle/h47GUENTTbBqH39m7
Check out our website at atticusprojectai.org.
Update: The expanded 1.0 version of the dataset is available at https://zenodo.org/record/4595826
ALeaseBert
uvaauas.figshare.com
html
Updated May 30, 2023
Cite
J. Rossi; E. Kanoulas; S. Leivaditi (2023). ALeaseBert [Dataset]. http://doi.org/10.21942/uva.19732993.v1
Available download formats: html
Unique identifier
https://doi.org/10.21942/uva.19732993.v1
Dataset updated
May 30, 2023
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
J. Rossi; E. Kanoulas; S. Leivaditi
License
http://rdm.uva.nl/en/support/confidential-data.html
Description
DATA
This is the data from the paper "A Benchmark for Lease Contract Review" (https://arxiv.org/abs/2010.10386).

This release contains:
The weights of our ALeaseBERT model (ALeaseBert.zip)
The dataset of lease contracts and its annotations (annotated_dataset.zip)
Samples: sample.html is a contract, and sample.json has the corresponding annotations
Metadata: annotations-legend.json has the dictionary of annotated entities

LICENSE
This data is made available under the terms of CC BY-NC 4.0.

See http://creativecommons.org/licenses/by-nc/4.0/deed.en and http://creativecommons.org/licenses/by-nc/4.0/legalcode.
Contract Discovery Dataset
paperswithcode.com
opendatalab.com
Updated Oct 16, 2022
Cite
Łukasz Borchmann; Dawid Wiśniewski; Andrzej Gretkowski; Izabela Kosmala; Dawid Jurkiewicz; Łukasz Szałkiewicz; Gabriela Pałka; Karol Kaczmarek; Agnieszka Kaliska; Filip Graliński (2022). Contract Discovery Dataset [Dataset]. https://paperswithcode.com/dataset/contract-discovery
Dataset updated
Oct 16, 2022
Authors
Łukasz Borchmann; Dawid Wiśniewski; Andrzej Gretkowski; Izabela Kosmala; Dawid Jurkiewicz; Łukasz Szałkiewicz; Gabriela Pałka; Karol Kaczmarek; Agnieszka Kaliska; Filip Graliński
Description
A new shared task of semantic retrieval from legal texts, called contract discovery: legal clauses are to be extracted from documents, given a few examples of similar clauses from other legal acts.
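The few-shot setup can be illustrated with a deliberately simple baseline: pool the example clauses into one bag-of-words query, then rank candidate passages by cosine similarity to it. This is a minimal sketch under stated assumptions; the tokenizer, the pooling step, the function names (`tokenize`, `cosine`, `rank_candidates`) and the sample clauses are all hypothetical and are not the method or data of the dataset authors.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic runs only; a stand-in for a real tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(examples: list[str], candidates: list[str]) -> list[tuple[float, str]]:
    # Pool the few example clauses into one query vector. Summing counts
    # gives the same cosine ranking as averaging them (cosine ignores scale).
    query = Counter()
    for clause in examples:
        query.update(tokenize(clause))
    scored = [(cosine(query, Counter(tokenize(c))), c) for c in candidates]
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical few-shot examples (governing-law clauses) and candidate passages.
examples = [
    "This Agreement shall be governed by the laws of the State of New York.",
    "This Agreement is governed by and construed under the laws of Delaware.",
]
candidates = [
    "The Supplier shall deliver the goods within thirty days.",
    "This contract shall be governed by the laws of England.",
]
ranked = rank_candidates(examples, candidates)
print(ranked[0][1])
```

A lexical baseline like this only rewards word overlap; a semantic retrieval system would swap the term-count vectors for learned embeddings while keeping the same rank-by-similarity structure.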
Terms of Service Dataset
paperswithcode.com
Updated Feb 21, 2024
Cite
Marco Lippi; Przemyslaw Palka; Giuseppe Contissa; Francesca Lagioia; Hans-Wolfgang Micklitz; Giovanni Sartor; Paolo Torroni (2024). Terms of Service Dataset [Dataset]. https://paperswithcode.com/dataset/terms-of-service
Dataset updated
Feb 21, 2024
Authors
Marco Lippi; Przemyslaw Palka; Giuseppe Contissa; Francesca Lagioia; Hans-Wolfgang Micklitz; Giovanni Sartor; Paolo Torroni
Description
The Terms of Service dataset is a legal dataset for the task of identifying whether contractual terms are potentially unfair. This is a binary classification task, where positive examples are potentially unfair contractual terms (clauses) from the terms of service in consumer contracts. Article 3 of Directive 93/13 on Unfair Terms in Consumer Contracts defines an unfair contractual term as follows. A contractual term is unfair if: (1) it has not been individually negotiated; and (2) contrary to the requirement of good faith, it causes a significant imbalance in the parties' rights and obligations, to the detriment of the consumer. The Terms of Service dataset consists of 9,414 examples.
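The two-part Article 3 test is a conjunction: a term is unfair only when both conditions hold. The sketch below encodes that logic and maps it to the binary label used by the classification task. The `Term` fields are hypothetical, chosen only for illustration; they are not columns of the Terms of Service dataset, where labels are assigned to clause text.

```python
from dataclasses import dataclass


@dataclass
class Term:
    # Hypothetical fields for illustration only.
    text: str
    individually_negotiated: bool
    # Condition (2): contrary to good faith, causes a significant imbalance
    # in rights and obligations to the detriment of the consumer.
    significant_imbalance_against_consumer: bool


def is_unfair(term: Term) -> bool:
    # Both conditions of Article 3 must hold at the same time.
    return (not term.individually_negotiated) and term.significant_imbalance_against_consumer


def to_label(term: Term) -> int:
    # Binary classification target: 1 = potentially unfair, 0 = otherwise.
    return int(is_unfair(term))


clause = Term("The provider may terminate the service at any time without notice.", False, True)
print(to_label(clause))  # 1
```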
kl3m-data-edgar-agreements-sample
huggingface.co
Updated Apr 11, 2025
Cite
ALEA Institute (2025). kl3m-data-edgar-agreements-sample [Dataset]. https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements-sample
Available formats: Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Dataset updated
Apr 11, 2025
Authors
ALEA Institute
Description
KL3M Data Project

Note: This page provides general information about the KL3M Data Project. Additional details specific to this dataset will be added in future updates. For complete information, please visit the GitHub repository or refer to the KL3M Data Project paper.

Description

This dataset is part of the ALEA Institute's KL3M Data Project, which provides copyright-clean training resources for large language models.

Dataset Details

Format: Parquet…
See the full description on the dataset page: https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements-sample.
Data from: Purchase Orders and Contracts
catalog.data.gov
data.brla.gov
Updated Jun 21, 2025
Cite
data.brla.gov (2025). Purchase Orders and Contracts [Dataset]. https://catalog.data.gov/dataset/purchase-orders-and-contracts
Dataset updated
Jun 21, 2025
Dataset provided by
data.brla.gov
Description
Listing of all purchase orders and contracts issued to procure goods and/or services within the City-Parish. In the City-Parish, a PO/Contract is made up of two components: a header and one or more detail items that together make up the overarching PO/Contract.

The header contains information that pertains to the entire PO/Contract. This includes, but is not limited to, the total amount of the PO/Contract, the department requesting the purchase and the vendor providing the goods or services. The detail items contain information specific to the individual item ordered or service procured through the PO/Contract: the item/service description, the item/service quantity and the cost of the item are located within the PO/Contract details. There may be one or many detail items on an individual PO/Contract. For example, a purchase order for computer equipment may include three items: the computer, the monitor and the base software package.

Both header information and detail item information are included in this dataset in order to provide a comprehensive view of the PO/Contract data. The Record Type field indicates whether a record is a header record (H) or a detail item record (D). In the computer purchase example above, the system would display 4 records: one header record and 3 detail item records. Note that header information is duplicated on all detail items, while no detail item information is displayed on the header record.

In October 2017, the City-Parish switched to a new system to track PO/Contracts. This data contains all PO/Contracts entered in or after October 2017. For prior-year data, please see the Legacy Purchase Order dataset: https://data.brla.gov/Government/Legacy-Purchase-Orders/54bn-2sqf
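The header/detail layout can be sketched by reconstructing the computer-purchase example and grouping rows by PO/Contract. This is a minimal sketch under assumptions: only the Record Type field and its H/D values come from the description; the keys `record_type`, `po_number`, `total_amount` and `description`, and the amounts, are hypothetical toy values, so check the portal for the real column names.

```python
from collections import defaultdict

# Hypothetical rows mirroring the computer-purchase example: one header (H)
# and three detail items (D). Header fields are repeated on every detail row;
# detail fields are empty on the header row.
rows = [
    {"record_type": "H", "po_number": "PO-1", "total_amount": 1500.0, "description": None},
    {"record_type": "D", "po_number": "PO-1", "total_amount": 1500.0, "description": "computer"},
    {"record_type": "D", "po_number": "PO-1", "total_amount": 1500.0, "description": "monitor"},
    {"record_type": "D", "po_number": "PO-1", "total_amount": 1500.0, "description": "base software package"},
]


def group_po_contracts(records):
    """Return {po_number: {"header": row, "items": [rows]}}."""
    grouped = defaultdict(lambda: {"header": None, "items": []})
    for row in records:
        entry = grouped[row["po_number"]]
        if row["record_type"] == "H":
            entry["header"] = row
        elif row["record_type"] == "D":
            entry["items"].append(row)
    return dict(grouped)


po = group_po_contracts(rows)["PO-1"]
print(len(rows), len(po["items"]))

# Aggregate header-level amounts from H rows only: because header values are
# duplicated on detail rows, summing over all rows would count each PO once
# per record (four times in this example).
total_spend = sum(r["total_amount"] for r in rows if r["record_type"] == "H")
```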
OCP Procurement Agreements
data.detroitmi.gov
detroitdata.org
Updated Dec 12, 2019
Cite
City of Detroit (2019). OCP Procurement Agreements [Dataset]. https://data.detroitmi.gov/datasets/ocp-procurement-agreements/explore
Dataset updated
Dec 12, 2019
Dataset authored and provided by
City of Detroit
Description
The Procurement Agreements dataset provides details about contract agreements between the City of Detroit and suppliers who provide materials, equipment and services to the City. Initial and amended contracts, and the purchase orders associated with them, are included in the dataset. In some cases, purchase orders are generated to pay suppliers for work completed under a contract. If available, a link to the contract agreement document in PDF format is provided in the 'Contract Link' field of each record (row) in the dataset. This dataset is updated weekly with data from the Office of Contracting and Procurement (OCP).
FOI 26605 - Datasets - Open Data Portal
opendata.nhsbsa.net
Updated Nov 24, 2022
Cite
(2022). FOI 26605 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-26605
Dataset updated
Nov 24, 2022
License
Open Government Licence 2.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
License information was derived automatically
Description
Further to the original Enterprise Application request, the contract below has expired. Please provide the current status.
Finance Capita CRM Trustmarque Solutions Ltd

I'd like to apologise for the length of this request and how tedious it may be to handle. That being said, please make an effort to provide all of this information. The information I'm requesting concerns the software contracts the organisation uses in the following areas:
Enterprise Resource Planning Software Solution (ERP)
Primary Customer Relationship Management Solution (CRM): for example, Salesforce, Lagan CRM, Microsoft Dynamics; software of this nature.
Primary Human Resources (HR) and Payroll Software Solution: for example, iTrent, ResourceLink, HealthRoster; software of this nature.
The organisation's primary corporate Finance Software Solution: for example, Agresso, Integra, Sapphire Systems; software of this nature.

For each contract, please provide:
Name of Supplier: Can you please provide me with the software provider for each contract?
Brand of the software: Can you please provide me with the actual name of the software? Please do not provide the supplier name again; please provide the actual software name.
Description of the contract: Can you please provide detailed information about this contract and state whether upgrade, maintenance and support are included? Please also list the software modules included in these contracts.
Number of Users/Licenses: What is the total number of users/licenses for this contract?
Annual Spend: What is the average annual spend for each contract?
Contract Duration: What is the duration of the contract? Please include any available extensions within the contract.
Contract Start Date: What is the start date of this contract? Please include the month and year (DD-MM-YY or MM-YY).
Contract Expiry: What is the expiry date of this contract? Please include the month and year (DD-MM-YY or MM-YY).
Contract Review Date: What is the review date of this contract? Please include the month and year. If this cannot be provided, please provide an estimate of when the contract is likely to be reviewed (DD-MM-YY or MM-YY).
Contact Details: I require the full contact details of the person within the organisation responsible for this particular software contract (name, job title, email, contact number).
National Inpatient Sample (NIS) - Restricted Access Files
catalog.data.gov
data.virginia.gov
Updated Feb 22, 2025
Cite
Agency for Healthcare Research and Quality, Department of Health & Human Services (2025). National Inpatient Sample (NIS) - Restricted Access Files [Dataset]. https://catalog.data.gov/dataset/hcup-national-nationwide-inpatient-sample-nis-restricted-access-file
Dataset updated
Feb 22, 2025
Dataset provided by
Agency for Healthcare Research and Quality http://www.ahrq.gov/
United States Department of Health and Human Services http://www.hhs.gov/
Description
The Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS) is the largest publicly available all-payer inpatient care database in the United States. The NIS is designed to produce U.S. regional and national estimates of inpatient utilization, access, cost, quality, and outcomes. Unweighted, it contains data from more than 7 million hospital stays each year. Weighted, it estimates more than 35 million hospitalizations nationally. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ), HCUP data inform decision making at the national, State, and community levels. Starting with the 2012 data year, the NIS is a sample of discharges from all hospitals participating in HCUP, covering more than 97 percent of the U.S. population. For prior years, the NIS was a sample of hospitals. The NIS allows for weighted national estimates to identify, track, and analyze national trends in health care utilization, access, charges, quality, and outcomes. The NIS's large sample size enables analyses of rare conditions, such as congenital anomalies; uncommon treatments, such as organ transplantation; and special patient populations, such as the uninsured. NIS data are available since 1988, allowing analysis of trends over time. The NIS inpatient data include clinical and resource use information typically available from discharge abstracts with safeguards to protect the privacy of individual patients, physicians, and hospitals (as required by data sources). Data elements include but are not limited to: diagnoses, procedures, discharge status, patient demographics (e.g., sex, age), total charges, length of stay, and expected payment source, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as ‘no charge’. The NIS excludes data elements that could directly or indirectly identify individuals. 
Restricted access data files are available with a data use agreement and brief online security training.
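The gap between the unweighted count (more than 7 million stays) and the weighted estimate (more than 35 million hospitalizations) comes from discharge weights: each sampled stay stands in for several stays nationally, which implies an average weight of roughly 5. A national total is estimated by summing the weights, and a national mean by weighting each stay. This is a minimal sketch with made-up weights; the keys `weight` and `los_days` are hypothetical (HCUP documents the NIS's actual variable names), and real NIS analyses also need the survey design (strata and clusters) to get correct standard errors.

```python
# Toy sample: each record is one sampled hospital stay with a discharge weight
# (hypothetical values). A weight of 5.0 means the stay represents about five
# stays in the national population.
sample = [
    {"los_days": 3, "weight": 5.0},
    {"los_days": 1, "weight": 5.0},
    {"los_days": 7, "weight": 4.8},
    {"los_days": 2, "weight": 5.2},
]


def weighted_total(records):
    # Estimated number of hospitalizations in the population.
    return sum(r["weight"] for r in records)


def weighted_mean(records, field):
    # Weighted mean of a per-stay field, such as length of stay.
    total_weight = weighted_total(records)
    return sum(r[field] * r["weight"] for r in records) / total_weight


unweighted = len(sample)
estimate = weighted_total(sample)
mean_los = weighted_mean(sample, "los_days")
print(unweighted, estimate, mean_los)
```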
Service Agreement on Storage and Dissemination of Research Data /...
explore.openaire.eu
Updated May 24, 2024
Cite
Swedish National Data Service (2024). Service Agreement on Storage and Dissemination of Research Data / Uppdragsavtal avseende lagring och förmedling av forskningsdata [Dataset]. http://doi.org/10.5281/zenodo.11278410
Unique identifier
https://doi.org/10.5281/zenodo.11278410
Dataset updated
May 24, 2024
Authors
Swedish National Data Service
Description
Service Agreement on Storage and Dissemination of Research Data

A service agreement for the storage and dissemination of research data is required to use SND's repository for research data (SND CARE). The agreement sets out the legal prerequisites for using SND CARE and the respective commitments and responsibilities of SND and the research principal. Members of the SND Network are advised to sign a general service agreement for the use of SND CARE. For individual researchers who do not belong to a research principal that is a member of the SND Network, or whose research principal has not signed a general service agreement with SND, an agreement is signed digitally per submitted dataset in DORIS, SND's system for describing and sharing data. This document is the agreement template; the terms of the contract may vary slightly for individual parties. The agreement has been translated into English for reading only: the Swedish version is the official agreement and the one that is signed.
Merger Agreement Understanding Dataset (MAUD) Dataset
paperswithcode.com
Updated Jan 1, 2023
Cite
Steven H. Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dimitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks (2023). Merger Agreement Understanding Dataset (MAUD) Dataset [Dataset]. https://paperswithcode.com/dataset/merger-agreement-understanding-dataset-maud
Dataset updated
Jan 1, 2023
Authors
Steven H. Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dimitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks
Description
MAUD is an expert-annotated merger agreement reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points study, where lawyers and law students answered 92 questions about 152 merger agreements.

With over 39,000 examples and 47,000 total annotations, it is the largest expert-annotated legal reading comprehension dataset in the English language, as well as the first expert-annotated merger agreement dataset.
Data Collaborations Across Boundaries (Slides)
data.depositar.io
pdf
Updated Jun 27, 2025
Cite
depositar (2025). Data Collaborations Across Boundaries (Slides) [Dataset]. https://data.depositar.io/dataset/data-collaborations-across-boundaries
Available download formats: pdf (4440122), pdf (10713394), pdf (1792282), pdf (1296859), pdf (3112569)
Dataset updated
Jun 27, 2025
Dataset provided by
depositar
Description
This dataset collects the slides that were presented at the Data Collaborations Across Boundaries session in SciDataCon 2022, part of the International Data Week.

The following session proposal was prepared by Tyng-Ruey Chuang and submitted to SciDataCon 2022 organizers for consideration on 2022-02-28. The proposal was accepted on 2022-03-28. Six abstracts were submitted and accepted to this session. Five presentations were delivered online in a virtual session on 2022-06-21.

Data Collaborations Across Boundaries

There are many good stories about data collaborations across boundaries. We need more. We also need to share the lessons each of us has learned from collaborating with parties and communities not in our familiar circles.

By boundaries, we mean not just the regulatory borders between nation states regarding data sharing, but the various barriers, readily conceivable or not, that hinder collaboration in aggregating, sharing, and reusing data for social good. These barriers exist between academic disciplines, between economic players, and between the many user communities, to name just a few. There are also cross-domain barriers, for example those that lie among data practitioners, public administrators, and policy makers when they articulate the why, what, and how of "open data" and debate its economic significance and fair distribution. This session aims to bring together experiences and thoughts on good data practices that facilitate collaboration across boundaries and domains.

The success of Wikipedia proves that collaborative content production and service, by way of copyleft licenses, can be sustainable when coordinated by a non-profit and funded by the general public. Collaborative code repositories like GitHub and GitLab demonstrate the enormous value and mass scale of systems-facilitated integration of user contributions that run across multiple programming languages and developer communities. Research data aggregators and repositories such as GBIF, GISAID, and Zenodo have served numerous researchers across academic disciplines. Citizen science projects and platforms, for instance eBird, Galaxy Zoo, and Taiwan Roadkill Observation Network (TaiRON), not only collect data from diverse communities but also manage and release datasets for research use and public benefit (e.g. TaiRON datasets being used to improve road design and reduce animal mortality). At the same time, large-scale data collaborations depend on standards, protocols, and tools for building registries (e.g. Archival Resource Key), ontologies (e.g. Wikidata and schema.org), repositories (e.g. CKAN and Omeka), and computing services (e.g. Jupyter Notebook). There are many types of data collaborations; the above lists only a few.

This session proposal calls for contributions that bring forward lessons learned from collaborative data projects and platforms, especially those that involve multiple communities and/or cross organizational boundaries. Presentations focusing on the following (non-exclusive) topics are sought:

Support mechanisms and governance structures for data collaborations across organizations/communities.

Data policies --- such as data sharing agreements, memorandum of understanding, terms of use, privacy policies, etc. --- for facilitating collaborations across organizations/communities.

Traditional and non-traditional funding sources for data collaborations across multiple parties; sustainability of data collaboration projects, platforms, and communities.

Data workflows --- collection, processing, aggregation, archiving, and publishing, etc. --- designed with considerations of (external) collaboration.

Collaborative web platforms for data acquisition, curation, analysis, visualization, and education.

Examples and insights from data trusts, data coops, as well as other formal and informal forms of data stewardship.

Debates on the pros and cons of centralized, distributed, and/or federated data services.

Practical lessons learned from data collaboration stories: failures, successes, incidents, unexpected turns of events, aftermath, etc. (no story is too small!).
COVID-19 Case Surveillance Public Use Data
healthdata.gov
data.virginia.gov
Updated Feb 25, 2021
Cite
data.cdc.gov (2021). COVID-19 Case Surveillance Public Use Data [Dataset]. https://healthdata.gov/w/knt4-7efa/default?cur=xbTVFQpGL_I
Available download formats: csv, json, application/rss+xml, tsv, application/rdf+xml, xml
Dataset updated
Feb 25, 2021
Dataset provided by
data.cdc.gov
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (33 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html
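The 14-day lag rule above can be expressed as a date filter: a case is included only if its earliest available date is at least 14 days before the dataset's creation date. This is a sketch under assumptions: "at least 14 days prior" is interpreted here as `earliest_date <= creation_date - 14 days`, and the function name `included` and all dates are hypothetical.

```python
from datetime import date, timedelta

LAG = timedelta(days=14)


def included(earliest_date: date, creation_date: date) -> bool:
    # A record qualifies if its earliest date falls on or before the cutoff,
    # i.e. at least 14 days before the dataset was created.
    return earliest_date <= creation_date - LAG


creation = date(2021, 2, 25)  # hypothetical dataset creation date
cases = [date(2021, 2, 1), date(2021, 2, 11), date(2021, 2, 12), date(2021, 2, 20)]
kept = [d for d in cases if included(d, creation)]
print(kept)
```

With a creation date of 25 February the cutoff is 11 February, so cases whose earliest date is 12 February or later wait for the next monthly release.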

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and aut
A study into the effects of two Focus on Form interventions on the...
datacatalogue.cessda.eu
ssh.datastations.nl
Updated Apr 11, 2023
Cite
E.M. Boers-Visker (2023). A study into the effects of two Focus on Form interventions on the acquisition of the agreement verb modification (dataset) [Dataset]. http://doi.org/10.17026/dans-24h-xsp8
Unique identifier
https://doi.org/10.17026/dans-24h-xsp8
Dataset updated
Apr 11, 2023
Dataset provided by
Utrecht University of Applied Sciences / University of Amsterdam
Authors
E.M. Boers-Visker
Description
This dataset is the result of a study into the acquisition of spatial devices in two learners of Sign Language of the Netherlands (NGT). This study is one of the four studies carried out by Eveline Boers-Visker in the context of her doctoral research entitled ‘Learning to use space: a study into the SL2 acquisition process of adult learners of Sign Language of the Netherlands’ (2016-2020). For this particular study, four groups of learners took part in an intervention study. Two groups received an input flood and explicit instruction on the NGT agreement verb system (condition A), one group received an implicit input flood (condition B), and one group (C) served as the control group. Four tests were conducted to measure the learners' knowledge of the agreement verb paradigm. This dataset contains (i) a document presenting the step-by-step coding process used to arrive at a total score per response and (ii) four documents with the scores per participant.
WageIndicator Collective Agreements Database Dataset with Full Texts and...
data.niaid.nih.gov
ssh.datastations.nl
Updated Jul 17, 2024
Cite
Gabriele Medas (2024). WageIndicator Collective Agreements Database Dataset with Full Texts and Selected Clauses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5651623
Dataset updated
Jul 17, 2024
Dataset provided by
Gabriele Medas
Daniela Ceccon
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Since 2012, the WageIndicator Foundation has maintained a Collective Agreements Database, in which the texts of 1,600 collective agreements (CBAs) from 61 countries and in 27 languages have been uploaded, coded and annotated. The database is unique at the global level: collective agreements are documents setting out conditions of employment that result from negotiations between independent unions and employers, and their content is often surrounded by secrecy. Under the SSHOC project and with the support of the CLARIN Research Infrastructure, the agreements have been annotated manually and automatically on several levels: for each agreement, the team answers a series of questions and selects the relevant piece of text (clause) for each question.

One of the results of the collective agreements' annotation process is the dataset which is available here and includes all the clauses selected for each variable (WageIndicator_CBADatabase_Selected_Clauses). The full collective agreements' texts are stored in another dataset, also available here (WageIndicator_CBADatabase_Full_Texts_211019). A codebook is also included (210125-wageindicator-cba-codebook.pdf).
Content of Deep Trade Agreements
datasearch.gesis.org
Updated Feb 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hofmann Claudia, Alberto Osnago and Michele Ruta, (2017). "Horizontal Depth: A New Database on the Content of Preferential Trade Agreements". Policy Research working paper; no. WPS 7981. Washington, D.C. : World Bank Group. (2020). Content of Deep Trade Agreements [Dataset]. https://datasearch.gesis.org/dataset/api_worldbank_org_v2_datacatalog-157
Dataset updated
Feb 25, 2020
Dataset provided by
World Bank http://worldbank.org/
Authors
Hofmann Claudia, Alberto Osnago and Michele Ruta, (2017). "Horizontal Depth: A New Database on the Content of Preferential Trade Agreements". Policy Research working paper; no. WPS 7981. Washington, D.C. : World Bank Group.
Description
The dataset on the content of preferential trade agreements (PTAs) maps 52 provisions in 279 PTAs notified to the WTO and signed between 1958 and 2015. It also includes information about the legal enforceability of each provision. The “Trade Agreements” file lists all the agreements available (279) with the coding of the 52 provisions. The names and descriptions of all variables are listed in the “read me” sheet, which also explains the coding of legal enforceability. The “Bilateral Observations” file is a bilateral version of the dataset, in which each observation is a country pair-year-agreement. Note that some country pairs appear multiple times in certain years if they have more than one agreement in force in that year; for example, in 2000 Angola and the DRC were both in COMESA and SADC. The variables are the same as in the Excel files. Important notice: the Bilateral Observations file excludes Partial Scope Agreements (PSAs).
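Because each bilateral observation is a country pair-year-agreement, a pair with overlapping memberships contributes one row per agreement in force. The sketch below counts agreements per pair-year, using the Angola/DRC 2000 example from the description. The tuple layout, the function name `agreements_per_pair_year` and the placeholder third row are hypothetical; the file's actual variable names are listed in its “read me” sheet.

```python
from collections import Counter

# Hypothetical rows in a (country_a, country_b, year, agreement) layout. The
# Angola/DRC rows follow the example in the description; the third row uses
# placeholder names.
observations = [
    ("Angola", "DRC", 2000, "COMESA"),
    ("Angola", "DRC", 2000, "SADC"),
    ("Country X", "Country Y", 2000, "Agreement Z"),
]


def agreements_per_pair_year(rows):
    # Count how many rows (agreements in force) each country pair has per year.
    counts = Counter()
    for country_a, country_b, year, _agreement in rows:
        counts[(country_a, country_b, year)] += 1
    return counts


counts = agreements_per_pair_year(observations)
# Pair-years with more than one agreement in force appear more than once.
repeated = {key: n for key, n in counts.items() if n > 1}
print(repeated)
```

If an analysis needs exactly one row per pair-year, these repeated rows have to be collapsed explicitly, for example by counting agreements as here or by keeping the deepest agreement; which rule fits depends on the research question.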
EULA for ConfLab: A Data Collection Concept, Dataset, and Benchmark for...
figshare.com
data.4tu.nl
pdf
Updated Oct 10, 2022
Cite
Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). EULA for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild [Dataset]. http://doi.org/10.4121/20016194.v2
Available download formats: pdf
Unique identifier
https://doi.org/10.4121/20016194.v2
Dataset updated
Oct 10, 2022
Dataset provided by
4TU.ResearchData
Authors
Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This is the End-User License Agreement associated with the ConfLab dataset.

The dataset contains pseudonymized information. To obtain access, fill in the form at https://doi.org/10.4121/20016194 and submit it to SPCLabDatasets-insy@tudelft.nl.
DES370K
data.niaid.nih.gov
zenodo.org
Updated Nov 12, 2021
Gregersen, Brent A (2021). DES370K [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5676265
Dataset updated
Nov 12, 2021
Dataset provided by
Siva, Karthik
Palmo, Kim
Decolvenaere, Elizabeth
Gregersen, Brent A
Li, Je-Leun
Bergdorf, Michael
Klepeis, John L
Donchev, Alexander G
Hargus, Cory
Taube, Andrew G
Law, Ka-Hei
McGibbon, Robert T
Shaw, David E
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DESRES Data Sets (DES370K)

Please see the original paper at https://doi.org/10.1038/s41597-021-00833-x for more information about this dataset.

This package contains the dataset described by Donchev et al. [1], DES370K. It is presented as a CSV file (DES370K.csv) and as .mol files (geometries//DES370K_.mol). Also included is a metadata file, DES370K_meta.csv, which contains long-form column descriptions replicating those in [1], as well as data types and units (when applicable) for each column.

Manifest

DES370K.csv : Full dataset, containing interaction energies calculated using CCSD(T), MP2, HF, and SAPT0, as well as dimer geometries.

DES370K_meta.csv : Long-form descriptions of the columns in DES370K, as well as datatypes and units (when applicable) for each column

LICENSE.txt : License for using and redistributing the datasets provided.

README.md : This file.

Loading the Dataset

The datasets are presented as CSVs as a compromise between human-readability, format uniformity, and parsing speed. While an almost uncountable number of packages exist to read CSV files, we recommend using the Python data analysis
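For a quick look without extra dependencies, the sketch below reads DES370K.csv with Python's standard `csv` module and converts numeric-looking fields to floats. It is an illustrative sketch, not the loader the README recommends. A blanket float conversion also turns integer identifier columns into floats, so the data types listed in DES370K_meta.csv should be treated as authoritative; the column names in the usage note are hypothetical.

```python
import csv


def coerce(value):
    """Return value as a float if it parses as one, otherwise unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # Text such as SMILES strings, empty fields, or None (short rows) stays as-is.
        return value


def parse_rows(lines):
    """Parse CSV text lines (header first) into dicts with coerced values."""
    return [{k: coerce(v) for k, v in row.items()} for row in csv.DictReader(lines)]


def load_des370k(path="DES370K.csv"):
    """Load the full table; assumes DES370K.csv sits in the working directory."""
    with open(path, newline="", encoding="utf-8") as fh:
        return parse_rows(fh)
```

For example, `parse_rows(["system_id,smiles0,energy", "1,C,-0.5"])` (hypothetical columns) yields `[{"system_id": 1.0, "smiles0": "C", "energy": -0.5}]`.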

References

[1] A. G. Donchev, A. G. Taube, E. Decolvenaere, C. Hargus, R. T. McGibbon, K.-H. Law, B. A. Gregersen, J.-L. Li, K. Palmo, K. Siva, M. Bergdorf, J. L. Klepeis, and D. E. Shaw. "Quantum chemical benchmark database of dimer interaction energies at a “gold standard” level of accuracy"

[2] R. T. McGibbon, A. G. Taube, A. G. Donchev, K. Siva, F. Fernandez, C. Hargus, K.-H. Law, J.L. Klepeis, and D. E. Shaw. "Improving the accuracy of Moller-Plesset perturbation theory with neural networks"

[3] M. K. Kesharwani, A. Karton, N. Sylvetsky, and J. M. L. Martin. "The S66 non-covalent interactions benchmark reconsidered using explicitly correlated methods near the basis set limit."

License

DESRES DATA SETS LICENSE AGREEMENT

Copyright 2020, D. E. Shaw Research. All rights reserved.

Redistribution and use of electronic structure data released in the DESRES Data Sets (DES370K, DES15K, DES5M, DESS66, and DESS66x8) with or without modification, is permitted provided that the following conditions are met:

* Redistributions of the data must retain the above copyright notice, this list of conditions, and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of D. E. Shaw Research nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE AND DATA ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE AND/OR DATA, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Dataset of books called AJ contracts guide to: ASCA Form of Building...
workwithdata.com
Updated Apr 17, 2025
Work With Data (2025). Dataset of books called AJ contracts guide to: ASCA Form of Building Agreement 1982, second edition 1984, BPF/ACA Form of Building Agreement 1984 [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=AJ+contracts+guide+to%3A+ASCA+Form+of+Building+Agreement+1982%2C+second+edition+1984%2C+BPF%2FACA+Form+of+Building+Agreement+1984
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row, filtered to the book AJ contracts guide to: ASCA Form of Building Agreement 1982, second edition 1984, BPF/ACA Form of Building Agreement 1984. It features 7 columns, including author, publication date, language, and book publisher.
POLIcy design ANNotAtions (POLIANNA): Towards understanding policy design...
zenodo.org
data.niaid.nih.gov
zip
Updated Dec 14, 2023
Sebastian Sewerin; Lynn Helena Kaack; Joel Küttel; Fride Sigurdsson; Onerva Martikainen; Alisha Esshaki; Fabian Hafner (2023). POLIcy design ANNotAtions (POLIANNA): Towards understanding policy design through text-as-data approaches [Dataset]. http://doi.org/10.5281/zenodo.8284380
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8284380
Dataset updated
Dec 14, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sebastian Sewerin; Lynn Helena Kaack; Joel Küttel; Fride Sigurdsson; Onerva Martikainen; Alisha Esshaki; Fabian Hafner
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The POLIANNA dataset is a collection of legislative texts from the European Union (EU) that have been annotated based on theoretical concepts of policy design. The dataset consists of 20,577 annotated spans in 412 articles, drawn from 18 EU climate change mitigation and renewable energy laws, and can be used to develop supervised machine learning approaches for scaling policy analysis. The dataset includes a novel coding scheme for annotating text spans; the paper accompanying this dataset describes the annotated corpus, analyzes inter-annotator agreement, and discusses potential applications. The objective of this dataset is to support tools that assist with the manual coding of policy texts by automatically identifying relevant paragraphs.

Detailed instructions and further guidance about the dataset as well as all the code used for this project can be found in the accompanying paper and on the GitHub project page. The repository also contains useful code to calculate various inter-annotator agreement measures and can be used to process text annotations generated by INCEpTION.
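The repository's own agreement code is not reproduced here. As an illustration of one widely used measure, the sketch below computes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each annotator's label frequencies. It assumes the two annotators' labels are already aligned item by item (for example, one label per token); span-level agreement, which first requires matching overlapping spans, is out of scope, and the measures used in the paper may differ.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label for every item
    return (p_o - p_e) / (1 - p_e)
```

For example, with labels `["x", "x", "y", "y"]` and `["x", "y", "y", "y"]`, p_o = 0.75 and p_e = 0.5, giving kappa = 0.5.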

Dataset Description

We provide the dataset in 3 different formats:

JSON: Each article corresponds to a folder, in which the Tokens and Spans are stored in separate JSON files. Each article folder also contains the raw policy text as a text file, plus the metadata about the policy. This is the most human-readable format.

JSONL: Same folder structure as the JSON format, but the Spans and Tokens are stored in a JSONL file, where each line is a valid JSON document.

Pickle: We provide the dataset as a Python object. This is the recommended method when using our own Python framework that is provided on GitHub. For more information, check out the GitHub project page.
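Reading the JSONL variant needs nothing beyond the Python standard library, since each line is a self-contained JSON document. The sketch below yields one parsed object per line and reports the line number of any malformed entry. The file path in the usage note is a hypothetical placeholder, and the keys inside each span or token object are not described here; the GitHub project page documents the actual schema.

```python
import json


def parse_jsonl(lines, source="<lines>"):
    """Yield one parsed JSON document per non-blank line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"{source}:{line_no}: invalid JSON") from err


def read_jsonl(path):
    """Stream the documents of a JSONL file without loading it all at once."""
    with open(path, encoding="utf-8") as fh:
        yield from parse_jsonl(fh, source=str(path))
```

For example, `spans = list(read_jsonl("path/to/article-folder/spans.jsonl"))` loads one article's spans, where the path is hypothetical.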

License

The POLIANNA dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. If you use the POLIANNA dataset in your research in any form, please cite the dataset.

Citation

Sewerin, S., Kaack, L.H., Küttel, J. et al. Towards understanding policy design through text-as-data approaches: The policy design annotations (POLIANNA) dataset. Sci Data 10, 896 (2023). https://doi.org/10.1038/s41597-023-02801-z
