The VHA Data Sharing Agreement Repository serves as a centralized location to collect and report on agreements that share VHA data with entities outside of VA. It provides senior management an overall view of existing data sharing agreements; fosters productive sharing of health information with VHA's external partners; and streamlines data acquisition to improve data management responsibilities overall. Agreements that VHA has established with entities within the VA are not candidates for this Repository.
This dataset contains all parcels currently under an Open Space Use Agreement between the owners of the parcel and the County of Albemarle. These agreements limit construction and development activity on the property owner's land and last from 4 to 10 years. For more information on any particular agreement, contact the Real Estate division of the County of Albemarle's Finance Department.
Beginning March 1, 2022, the "COVID-19 Case Surveillance Public Use Data" will be updated on a monthly basis. This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (32 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
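The 14-day lag rule described above can be illustrated with a minimal Python sketch. This is not CDC's implementation: the field names `received_by_cdc` and `illness_or_specimen` are hypothetical stand-ins for the two kinds of dates mentioned, and `reference_date` is an assumed parameter standing in for the dataset creation date the rule is measured against.

```python
from datetime import date, timedelta

LAG_DAYS = 14  # lag stated in the dataset description


def earliest_available_date(record):
    """Return the earliest non-missing date in a record, or None if both are missing.

    The keys below are hypothetical; the real files use their own field names.
    """
    candidates = [
        record.get("received_by_cdc"),
        record.get("illness_or_specimen"),
    ]
    present = [d for d in candidates if d is not None]
    return min(present) if present else None


def eligible_cases(records, reference_date, lag_days=LAG_DAYS):
    """Keep records whose earliest date is at least `lag_days` before `reference_date`."""
    cutoff = reference_date - timedelta(days=lag_days)
    kept = []
    for record in records:
        earliest = earliest_available_date(record)
        if earliest is not None and earliest <= cutoff:
            kept.append(record)
    return kept
```

With a reference date of March 1, 2022, the cutoff is February 15, 2022, so a case whose earliest date is February 10 would be kept and one whose earliest date is February 20 would be held back for a later release.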
Datasets are created using CDC’s operational Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://wwwn.cdc.gov/nndss/data-collection.html
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html
Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported volun
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please download the full version of the dataset from Zenodo, here.
Contract Understanding Atticus Dataset (CUAD) v1 is a corpus of more than 13,000 labels in 510 commercial legal contracts that have been manually labeled by The Atticus Project to identify 41 categories of important clauses that lawyers look for when reviewing contracts.
We tested CUAD v1 against ten pretrained AI models and published the results on arXiv here.
Code for replicating the results, together with the model trained on CUAD, is published on GitHub here.
This dataset collects the slides that were presented at the Data Collaborations Across Boundaries session in SciDataCon 2022, part of the International Data Week.
The following session proposal was prepared by Tyng-Ruey Chuang and submitted to SciDataCon 2022 organizers for consideration on 2022-02-28. The proposal was accepted on 2022-03-28. Six abstracts were submitted and accepted to this session. Five presentations were delivered online in a virtual session on 2022-06-21.
Data Collaborations Across Boundaries
There are many good stories about data collaborations across boundaries. We need more. We also need to share the lessons each of us has learned from collaborating with parties and communities not in our familiar circles.
By boundaries, we mean not just the regulatory borders between nation states regarding data sharing but the various barriers, readily conceivable or not, that hinder collaboration in aggregating, sharing, and reusing data for social good. These barriers to collaboration exist between academic disciplines, between economic players, and between the many user communities, just to name a few. There are also cross-domain barriers, for example those that lie among data practitioners, public administrators, and policy makers when they are articulating the why, what, and how of "open data" and debating its economic significance and fair distribution. This session aims to bring together experiences and thoughts on good data practices in facilitating collaborations across boundaries and domains.
The success of Wikipedia proves that collaborative content production and service, by way of copyleft licenses, can be sustainable when coordinated by a non-profit and funded by the general public. Collaborative code repositories like GitHub and GitLab demonstrate the enormous value and mass scale of systems-facilitated integration of user contributions that run across multiple programming languages and developer communities. Research data aggregators and repositories such as GBIF, GISAID, and Zenodo have served numerous researchers across academic disciplines. Citizen science projects and platforms, for instance eBird, Galaxy Zoo, and Taiwan Roadkill Observation Network (TaiRON), not only collect data from diverse communities but also manage and release datasets for research use and public benefit (e.g. TaiRON datasets being used to improve road design and reduce animal mortality). At the same time, large-scale data collaborations depend on standards, protocols, and tools for building registries (e.g. Archival Resource Key), ontologies (e.g. Wikidata and schema.org), repositories (e.g. CKAN and Omeka), and computing services (e.g. Jupyter Notebook). There are many types of data collaborations. The above lists only a few.
This session proposal calls for contributions to bring forward lessons learned from collaborative data projects and platforms, especially those that involve multiple communities and/or cross organizational boundaries. Presentations focusing on the following (non-exclusive) topics are sought:
Support mechanisms and governance structures for data collaborations across organizations/communities.
Data policies --- such as data sharing agreements, memoranda of understanding, terms of use, and privacy policies --- for facilitating collaborations across organizations/communities.
Traditional and non-traditional funding sources for data collaborations across multiple parties; sustainability of data collaboration projects, platforms, and communities.
Data workflows --- collection, processing, aggregation, archiving, and publishing, etc. --- designed with considerations of (external) collaboration.
Collaborative web platforms for data acquisition, curation, analysis, visualization, and education.
Examples and insights from data trusts, data coops, as well as other formal and informal forms of data stewardship.
Debates on the pros and cons of centralized, distributed, and/or federated data services.
Practical lessons learned from data collaboration stories: failures, successes, incidents, unexpected turns of events, aftermaths, etc. (no story is too small!).
The Procurement Agreements dataset provides details about contract agreements between the City of Detroit and suppliers who provide materials, equipment and services to the City. Initial and amended contracts and purchase orders associated with the contracts are included in the dataset. In some cases, purchase orders are generated to pay suppliers for work completed under a contract. If available, a link to the contract agreement document in PDF format is provided in the 'Contract Link' field of each record (row) in the dataset. This dataset is updated weekly with data from the Office of Contracting and Procurement (OCP).
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance publicly available dataset has 33 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. This dataset requires a registration process and a data use agreement.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affili
KL3M Data Project
Note: This page provides general information about the KL3M Data Project. Additional details specific to this dataset will be added in future updates. For complete information, please visit the GitHub repository or refer to the KL3M Data Project paper.
Description
This dataset is part of the ALEA Institute's KL3M Data Project, which provides copyright-clean training resources for large language models.
Dataset Details
Format: Parquet… See the full description on the dataset page: https://huggingface.co/datasets/alea-institute/kl3m-data-edgar-agreements.
The National Survey of Child and Adolescent Well-Being (NSCAW) is a nationally representative, longitudinal survey of children and families who have been the subjects of investigation by Child Protective Services. There are currently two cohorts of available data (NSCAW I and NSCAW II) drawn from first-hand reports from children, parents, and other caregivers, as well as reports from caseworkers, teachers, and data from administrative records. NSCAW examines child and family well-being outcomes in detail and seeks to relate those outcomes to experience with the child welfare system and to family characteristics, community environment, and other factors.
Units of Response: Children and Families in the Child Welfare System
Type of Data: Survey
Tribal Data: Unavailable
Periodicity: Irregular
Demographic Indicators: Disability; Ethnicity; Geographic Areas; Household Income; Household Size; Race
SORN: Not Applicable
Data Use Agreement: https://www.ndacan.acf.hhs.gov/datasets/order_forms/termsofuseagreement.pdf
Data Use Agreement Location: https://www.ndacan.acf.hhs.gov/datasets/pdfs_user_guides/IntroNSCAWWave1.pdf
Granularity: Individual
Spatial: United States
Geocoding: Unavailable
This case surveillance publicly available dataset has 32 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. This dataset requires a registration process and a data use agreement.
CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. Access requires a registration process and a data use agreement. (32 data elements)
Requesting Access to the COVID-19 Case Surveillance Restricted Access Detailed Data
Please review the following documents to determine your interest in accessing the COVID-19 Case Surveillance Restricted Access Detailed Data file:
1) CDC COVID-19 Case Surveillance Restricted Access Detailed Data: Summary, Guidance, Limitations Information, and Restricted Access Data Use Agreement Information
2) Data Dictionary for the COVID-19 Case Surveillance Restricted Access Detailed Data
The next step is to complete the Registration Information and Data Use Restrictions Agreement (RIDURA). Once complete, CDC will review your agreement. After access is granted, Ask SRRG (eocevent394@cdc.gov) will email you information about how to access the data through GitHub. If you have questions about obtaining access, email eocevent394@cdc.gov.
Overview
The COVID-19 case surveillance database includes patient-level data reported by U.S. states and autonomous reporting entities, including New York City and the District of Columbia, as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and are shared voluntarily with CDC. For more information, visit: <a href="https://wwwn.cdc.gov/nndss/conditions/coronavirus-disease-2019-c
The Department of Planning, Lands and Heritage data licensing agreement for the use of digital information acquired from Data WA
Please refer to Yelp for the original JSON file and other datasets. This dataset was created in June 2020 by Yelp. The usage of this dataset should be for academic purposes.
I read the JSON file in Python and converted it to three CSV files.
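A minimal sketch of this kind of JSON-to-CSV conversion, using only the Python standard library, might look like the following. It is not the author's script: it assumes the source file is newline-delimited JSON (one object per line), and the column names in the usage note are hypothetical placeholders.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, columns):
    """Convert newline-delimited JSON text into CSV text.

    Only the keys listed in `columns` are written; a missing key becomes
    an empty cell. Assumes one JSON object per non-blank line.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow({col: record.get(col, "") for col in columns})
    return out.getvalue()
```

For example, calling `json_lines_to_csv(text, ["business_id", "delivery"])` on the file contents and writing the returned string to disk would produce one CSV; repeating the call with different column subsets would yield separate files.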
Please read Dataset_User_Agreement.pdf before you proceed with all data files.
It would be interesting to see how virtual services were offered by restaurants during COVID in 2020 and how restaurant businesses strove to communicate and connect with customers on Yelp. There is no numeric data to play with; however, it is still valuable to do some visualizations.
This is a graphical polygon dataset depicting the location of the City’s Interlocal agreements.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Atticus Open Contract Dataset (AOK) (beta) is a corpus of 5,000+ labels in 200 commercial legal contracts that have been manually labeled by legal experts to identify 40 types of clauses that are important during contract review in connection with corporate transactions, such as mergers and acquisitions, IPOs, and corporate financing. The AOK Dataset is curated and maintained by The Atticus Project, Inc., a non-profit organization, to support NLP research and development in legal contract review. If you download this dataset, we'd love to know more about you and your project! Please fill out this short form: https://forms.gle/h47GUENTTbBqH39m7
Check out our website at atticusprojectai.org.
Update: The expanded 1.0 version of the dataset is available here https://zenodo.org/record/4595826
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Existing Contracts Register for awarded contracts over £5,000. An extract of all published contract awards starting from 01 April 2017. The data names the buyer and the awarded suppliers, plus information on the value and duration of the contract itself. This is a work in progress: by improving data quality and maintaining compliance with procurement regulations, the Council is working towards a complete dataset. Other details about Contracts shown in this dataset may be available on the ProContract website (source link shown below). That website may, for example, also provide information on suppliers for Contracts that have more than one supplier. The data is updated quarterly. Data source: Procurement Lincolnshire, Lincolnshire County Council. For any enquiries about this publication contact procontract.support@lincolnshire.gov.uk
States report information from two reporting populations: (1) the Served Population, which is information on all youth receiving at least one independent living service paid for or provided by the Chafee Program agency, and (2) youth completing the NYTD Survey. States survey youth regarding six outcomes: financial self-sufficiency, experience with homelessness, educational attainment, positive connections with adults, high-risk behaviors, and access to health insurance. States collect outcomes information by conducting a survey of youth in foster care on or around their 17th birthday, also referred to as the baseline population. States will track these youth as they age and conduct a new outcome survey on or around the youth's 19th birthday, and again on or around the youth's 21st birthday, also referred to as the follow-up population. States will collect outcomes information on these older youth at ages 19 or 21 regardless of their foster care status or whether they are still receiving independent living services from the State. Depending on the size of the State's foster care youth population, some States may conduct a random sample of the baseline population of the 17-year-olds that participate in the outcomes survey so that they can follow a smaller group of youth as they age. All States will collect and report outcome information on a new baseline population cohort every three years.
Units of Response: Current and former youth in foster care
Type of Data: Administrative
Tribal Data: No
Periodicity: Annual
Demographic Indicators: Ethnicity; Race; Sex
SORN: Not Applicable
Data Use Agreement: https://www.ndacan.acf.hhs.gov/datasets/request-dataset.cfm
Data Use Agreement Location: https://www.ndacan.acf.hhs.gov/datasets/order_forms/termsofuseagreement.pdf
Granularity: Individual
Spatial: United States
Geocoding: FIPS Code
The Department of Licensing (DOL) shares data under the strict terms of a data sharing agreement. People and organizations agree to undergo regular data security and permissible use audits. This dataset is a record of the audits that DOL conducts each year.
The Terms of Service dataset is a law dataset corresponding to the task of identifying whether contractual terms are potentially unfair. This is a binary classification task, where positive examples are potentially unfair contractual terms (clauses) from the terms of service in consumer contracts. Article 3 of the Directive 93/13 on Unfair Terms in Consumer Contracts defines an unfair contractual term as follows. A contractual term is unfair if: (1) it has not been individually negotiated; and (2) contrary to the requirement of good faith, it causes a significant imbalance in the parties' rights and obligations, to the detriment of the consumer. The Terms of Service dataset consists of 9,414 examples.
The Physician and Physician Practice Research Database (3P-RD) captures characteristics of physicians and physician practices in 13 states. The database describes the supply of physician services available across selected states for data year 2019-2020. AHRQ created 3P-RD as a resource to address existing data gaps in physician health services information at the state and market levels. 3P-RD consists of both public use and restricted use data files. The public use file (PUF) version of 3P-RD is currently available for download. The Restricted Use File (RUF) version of 3P-RD will be available for each state. Once the data are released, a data use agreement (DUA) will be required for access to the data files.