The VHA Data Sharing Agreement Repository serves as a centralized location to collect and report on agreements that share VHA data with entities outside of VA. It provides senior management an overall view of existing data sharing agreements; fosters productive sharing of health information with VHA's external partners; and streamlines data acquisition to improve data management responsibilities overall. Agreements that VHA has established with entities within the VA are not candidates for this Repository.
This dataset collects the slides that were presented at the Data Collaborations Across Boundaries session in SciDataCon 2022, part of the International Data Week.
The following session proposal was prepared by Tyng-Ruey Chuang and submitted to SciDataCon 2022 organizers for consideration on 2022-02-28. The proposal was accepted on 2022-03-28. Six abstracts were submitted and accepted to this session. Five presentations were delivered online in a virtual session on 2022-06-21.
Data Collaborations Across Boundaries
There are many good stories about data collaborations across boundaries. We need more. We also need to share the lessons each of us has learned from collaborating with parties and communities not in our familiar circles.
By boundaries, we mean not just the regulatory borders between nation states concerning data sharing but also the various barriers, readily conceivable or not, that hinder collaboration in aggregating, sharing, and reusing data for social good. These barriers to collaboration exist between academic disciplines, between economic players, and between the many user communities, just to name a few. There are also cross-domain barriers, for example those that lie among data practitioners, public administrators, and policy makers when they are articulating the why, what, and how of "open data" and debating its economic significance and fair distribution. This session aims to bring together experiences and thoughts on good data practices in facilitating collaborations across boundaries and domains.
The success of Wikipedia proves that collaborative content production and service, by way of copyleft licenses, can be sustainable when coordinated by a non-profit and funded by the general public. Collaborative code repositories like GitHub and GitLab demonstrate the enormous value and mass scale of systems-facilitated integration of user contributions that run across multiple programming languages and developer communities. Research data aggregators and repositories such as GBIF, GISAID, and Zenodo have served numerous researchers across academic disciplines. Citizen science projects and platforms, for instance eBird, Galaxy Zoo, and Taiwan Roadkill Observation Network (TaiRON), not only collect data from diverse communities but also manage and release datasets for research use and public benefit (e.g. TaiRON datasets being used to improve road design and reduce animal mortality). At the same time, large-scale data collaborations depend on standards, protocols, and tools for building registries (e.g. Archival Resource Key), ontologies (e.g. Wikidata and schema.org), repositories (e.g. CKAN and Omeka), and computing services (e.g. Jupyter Notebook). There are many types of data collaborations. The above lists only a few.
This session proposal calls for contributions that bring forward lessons learned from collaborative data projects and platforms, especially those that involve multiple communities and/or cross organizational boundaries. Presentations focusing on the following (non-exclusive) topics are sought:
Support mechanisms and governance structures for data collaborations across organizations/communities.
Data policies --- such as data sharing agreements, memoranda of understanding, terms of use, privacy policies, etc. --- for facilitating collaborations across organizations/communities.
Traditional and non-traditional funding sources for data collaborations across multiple parties; sustainability of data collaboration projects, platforms, and communities.
Data workflows --- collection, processing, aggregation, archiving, and publishing, etc. --- designed with considerations of (external) collaboration.
Collaborative web platforms for data acquisition, curation, analysis, visualization, and education.
Examples and insights from data trusts, data coops, as well as other formal and informal forms of data stewardship.
Debates on the pros and cons of centralized, distributed, and/or federated data services.
Practical lessons learned from data collaboration stories: failures, successes, incidents, unexpected turns of events, aftermath, etc. (no story is too small!).
KL3M Data Project
Note: This page provides general information about the KL3M Data Project. Additional details specific to this dataset will be added in future updates. For complete information, please visit the GitHub repository or refer to the KL3M Data Project paper.
Description
This dataset is part of the ALEA Institute's KL3M Data Project, which provides copyright-clean training resources for large language models.
Dataset Details
Format: Parquet… See the full description on the dataset page: https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements-sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code belonging to the manuscript:
Tracking transformative agreements through open metadata: method and validation using Dutch Research Council NWO funded papers
Abstract
Transformative agreements have become an important strategy in the transition to open access, with almost 1,200 such agreements registered by 2025. Despite their prevalence, these agreements suffer from important transparency limitations, most notably the lack of openly available article-level metadata indicating which articles are covered by these agreements. Typically, this data is available to libraries but not openly shared, making it difficult to study the impact of these agreements. In this paper, we present a novel, open, replicable method for analyzing transformative agreements using open metadata, specifically the Journal Checker Tool provided by cOAlition S and OpenAlex. To demonstrate its potential, we apply our approach to a subset of publications funded by the Dutch Research Council (NWO) and its health research counterpart ZonMw. In addition, the results of this open method are compared with the actual publisher data reported to the Dutch university library consortium UKB. This validation shows that this open method accurately identified 89% of the publications covered by transformative agreements, while the 11% false positives shed an interesting light on the limitations of this method. In the absence of hard, openly available article-level data on transformative agreements, we provide researchers and institutions with a powerful tool to critically track and evaluate the impact of these agreements.
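The validation step described in the abstract, comparing publications flagged by the open method against publisher-reported data, could be sketched roughly as below. This is not the authors' code: the function names, the DOI normalization, and the DOIs themselves are invented for illustration, and reading the reported 89% as the share of flagged publications that were confirmed is an assumption about how the figure was computed.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a resolver prefix so set comparison is reliable."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def compare_with_reported(predicted, reported):
    """Compare DOIs flagged by an open method with DOIs reported by publishers.

    Hypothetical helper: 'predicted' and 'reported' are plain iterables of DOIs.
    """
    pred = {normalize_doi(d) for d in predicted}
    rep = {normalize_doi(d) for d in reported}
    confirmed = pred & rep
    return {
        "confirmed": sorted(confirmed),
        "false_positives": sorted(pred - rep),  # flagged but not reported
        "missed": sorted(rep - pred),           # reported but not flagged
        # Share of flagged publications confirmed by the reported data.
        "share_confirmed": len(confirmed) / len(pred) if pred else 0.0,
    }


# Made-up DOIs, for illustration only.
predicted = ["10.1234/a", "https://doi.org/10.1234/B", "10.1234/c", "10.1234/d"]
reported = ["10.1234/a", "10.1234/b", "10.1234/c", "10.1234/e"]

result = compare_with_reported(predicted, reported)
print(result["share_confirmed"], result["false_positives"], result["missed"])
```

Keeping the false positives and the missed publications as separate lists makes it possible to inspect individual mismatches, which is where the abstract locates the method's limitations.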
This dataset contains the following files:
This dataset contains all parcels that are currently under an Open Space Use Agreement between the owners of the parcel and the County of Albemarle. These agreements limit construction and development activity on the property owner's land, and last from 4 to 10 years. For more information on any particular agreement, contact the Real Estate division of the County of Albemarle's Finance Department.
This is a graphical polygon dataset depicting the location of the City’s Interlocal agreements.
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
understanding agreements
Prepaid account agreement data, which contain general terms and conditions, pricing, and fee information, that issuers submit to the Bureau under the terms of the Prepaid Rule. Data is refreshed nightly.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.
This dataset reflects the boundaries of those Indigenous Land Use Agreements (ILUA) that have entered the notification process or have been registered and placed on the Register of Indigenous Land Use Agreements (s199A, Native Title Act; Commonwealth). This is a national dataset. Spatial attribution includes National Native Title Tribunal number, Name, Agreement Type, Proponent, Area and Registration Date. Products using this data should acknowledge the National Native Title Tribunal as the data source.
Lineage:
Created by the National Native Title Tribunal in 1998 and continuously updated and maintained.
Positional accuracy:
0.1 m
Attribute accuracy:
Attributes are maintained continuously and should at all times reflect the primary detail as contained within the Register of ILUA's.
Logical Consistency:
Technical or unintentional overlaps between boundaries may arise within this dataset. Technical overlaps include portions of boundaries of determinations that are intended to abut but which overlap. These overlaps may be caused by changes in source datasets used to create initial application boundaries or by differing interpretations of determination descriptions. Part of the maintenance program of this dataset is the identification and removal of such technical overlaps.
Completeness:
Ongoing
https://data.gov.au/data/dataset/eb8caa51-a883-4e87-907d-fea1a4a054f1
National Native Title Tribunal (2011) Registered and Notified Indigenous Land Use Agreements (ILUA) - agreement boundaries and core attributes about agreement - 01/11/2011. Bioregional Assessment Source Dataset. Viewed 05 July 2017, http://data.bioregionalassessments.gov.au/dataset/91df8ee7-423c-4a30-ae15-aa4c08a49bb9.
The County is a party to various credit agreements, including short-term notes, Direct Pay variable rate agreements, Direct Placement variable rate agreements, and an operating Line of Credit. Current credit agreements to which the County is a party are made available below.
Beginning March 1, 2022, the "COVID-19 Case Surveillance Public Use Data" will be updated on a monthly basis. This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (32 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
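The 14-day lag rule described above can be illustrated with a short sketch. This is not CDC's implementation: the field names (`cdc_report_dt`, `onset_dt`, `specimen_dt`), the record layout, and the sample dates are hypothetical, and the reference date is assumed to be the creation date of the previously updated dataset, as the description states.

```python
from datetime import date, timedelta

LAG = timedelta(days=14)  # lag stated in the dataset description

# Hypothetical field names for the dates a record may carry.
DATE_FIELDS = ("cdc_report_dt", "onset_dt", "specimen_dt")


def earliest_date(record):
    """Return the earliest non-missing date in a record, or None if all are missing."""
    known = [record[f] for f in DATE_FIELDS if record.get(f) is not None]
    return min(known) if known else None


def eligible_cases(records, previous_creation_date):
    """Keep records whose earliest date is at least 14 days before the reference date."""
    cutoff = previous_creation_date - LAG
    kept = []
    for record in records:
        first = earliest_date(record)
        if first is not None and first <= cutoff:
            kept.append(record)
    return kept


# Made-up records, for illustration only.
records = [
    {"id": 1, "cdc_report_dt": date(2022, 3, 1), "onset_dt": date(2022, 2, 10)},
    {"id": 2, "cdc_report_dt": date(2022, 2, 25), "onset_dt": None},
    {"id": 3, "cdc_report_dt": None, "onset_dt": None},
]

kept_ids = [r["id"] for r in eligible_cases(records, date(2022, 3, 1))]
print(kept_ids)
```

In this sketch a record with no usable date is dropped, which is a design choice of the example, not something the description specifies.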
Datasets are created using CDC’s operational Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://wwwn.cdc.gov/nndss/data-collection.html
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html
Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported volun
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the last decade, there have been significant changes in data sharing policies and in the data sharing environment faced by life science researchers. Using data from a 2013 survey of over 1600 life science researchers, we analyze the effects of sharing policies of funding agencies and journals. We also examine the effects of new sharing infrastructure and tools (i.e., third party repositories and online supplements). We find that recently enacted data sharing policies and new sharing infrastructure and tools have had a sizable effect on encouraging data sharing. In particular, third party repositories and online supplements as well as data sharing requirements of funding agencies, particularly the NIH and the National Human Genome Research Institute, were perceived by scientists to have had a large effect on facilitating data sharing. In addition, we found a high degree of compliance with these new policies, although noncompliance resulted in few formal or informal sanctions. Despite the overall effectiveness of data sharing policies, some significant gaps remain: about one third of grant reviewers placed no weight on data sharing plans in their reviews, and a similar percentage ignored the requirements of material transfer agreements. These patterns suggest that although most of these new policies have been effective, there is still room for policy improvement.
The Department of Planning, Lands and Heritage data licensing agreement for the use of digital information acquired from Data WA.
The Terms of Service dataset is a legal dataset for the task of identifying whether contractual terms are potentially unfair. This is a binary classification task, where positive examples are potentially unfair contractual terms (clauses) from the terms of service in consumer contracts. Article 3 of the Directive 93/13 on Unfair Terms in Consumer Contracts defines an unfair contractual term as follows. A contractual term is unfair if: (1) it has not been individually negotiated; and (2) contrary to the requirement of good faith, it causes a significant imbalance in the parties' rights and obligations, to the detriment of the consumer. The Terms of Service dataset consists of 9,414 examples.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Atticus Open Contract Dataset (AOK) (beta) is a corpus of 5,000+ labels in 200 commercial legal contracts that have been manually labeled by legal experts to identify 40 types of clauses that are important during contract review in connection with corporate transactions, such as mergers and acquisitions, IPOs, and corporate financing. The AOK Dataset is curated and maintained by The Atticus Project, Inc., a non-profit organization, to support NLP research and development in legal contract review. If you download this dataset, we'd love to know more about you and your project! Please fill out this short form: https://forms.gle/h47GUENTTbBqH39m7
Check out our website at atticusprojectai.org.
Update: The expanded 1.0 version of the dataset is available here https://zenodo.org/record/4595826
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Existing Contracts Register for awarded contracts over £5,000. An extract of all published contract awards starting from 01 April 2017. The data names the buyer and the awarded suppliers, plus information on the value and duration of the contract itself. This is a work in progress - by improving data quality and maintaining compliance with procurement regulations, the Council is working towards a complete dataset. Other details about Contracts shown in this dataset may be available on the ProContract website (source link shown below). That website may, for example, also provide information on suppliers for Contracts that have more than one supplier. The data is updated quarterly. Data source: Procurement Lincolnshire, Lincolnshire County Council. For any enquiries about this publication contact procontract.support@lincolnshire.gov.uk
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.
This dataset reflects the boundaries of those Indigenous Land Use Agreements (ILUA) that have entered the notification process or have been registered and placed on the Register of Indigenous Land Use Agreements (s199A, Native Title Act; Commonwealth). This is a national dataset. Aspatial attribution includes National Native Title Tribunal number, Name, Agreement Type, Proponent, Area and Registration Date.
Lineage:
Created by the National Native Title Tribunal in 1998 and continuously updated and maintained.
Positional accuracy:
0.1 m
Attribute accuracy:
Attributes are maintained continuously and should at all times reflect the primary detail as contained within the Register of ILUA's.
Logical Consistency:
Technical or unintentional overlaps between boundaries may arise within this dataset. Technical overlaps include portions of boundaries of determinations that are intended to abut but which overlap. These overlaps may be caused by changes in source datasets used to create initial application boundaries or by differing interpretations of determination descriptions. Part of the maintenance program of this dataset is the identification and removal of such technical overlaps.
Completeness:
Ongoing
https://data.gov.au/data/dataset/eb8caa51-a883-4e87-907d-fea1a4a054f1
Geoscience Australia (2014) Registered and Notified Indigenous Land Use Agreements (ILUA) - agreement boundaries and core attributes about agreement - 01/03/2014. Bioregional Assessment Source Dataset. Viewed 05 July 2017, http://data.bioregionalassessments.gov.au/dataset/cb9ce027-92e9-4e47-8816-b7f362d86342.
Public contracts with the City of Bloomington since 2018.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An "indigenous land use agreement" (ILUA) is a voluntary, legally binding agreement about the use and management of land or waters, made between one or more native title groups and non-native title interest holders in the ILUA area (such as grantee parties, pastoralists or governments).
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
International agreements in force and applied for the purposes of exemption from customs duties or benefit from a reduced tariff