Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NFDI4Health Task Force COVID-19 Metadata Schema (MDS) Mapping to FHIR contains a list of properties describing a resource (study, questionnaire or document) being registered in the Central Search Hub of the NFDI4Health Task Force COVID-19 and their mapping to FHIR.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Weekly archive of some State of Pennsylvania datasets found in this list: https://data.pa.gov/browse?q=vaccinations
For most of these datasets, the "date_saved" field is the date that the WPRDC pulled the data from the state data portal and the archive combines all the saved records into one table. The exception to this is the "COVID-19 Vaccinations by Day by County of Residence Current Health (archive)" which is already published by the state as an entire history.
The "date_updated" field is based on the "updatedAt" field of the corresponding data.pa.gov dataset. Changes to this field have turned out not to be a good indicator of whether records have been updated, which is why we archive this data weekly without regard to the "updatedAt" value. The "date_saved" field is the one you should sort on to see the variation in vaccinations over time.
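As a rough illustration of sorting on "date_saved", the sketch below uses Python's standard csv module. Only the "date_saved" field name comes from the description above; the "county" and "total_vaccinations" columns, the sample values, and the ISO date format are assumptions made for the example.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows; only "date_saved" is a field named in the description.
SAMPLE_CSV = """date_saved,county,total_vaccinations
2021-03-15,Allegheny,300
2021-03-01,Allegheny,100
2021-03-08,Allegheny,200
"""


def rows_by_date_saved(csv_text):
    """Return the archive rows sorted by the date they were saved."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumes ISO-formatted dates (YYYY-MM-DD); adjust the parser otherwise.
    return sorted(rows, key=lambda r: date.fromisoformat(r["date_saved"]))


sorted_rows = rows_by_date_saved(SAMPLE_CSV)
for row in sorted_rows:
    print(row["date_saved"], row["total_vaccinations"])
```

Parsing the value into a date before sorting avoids surprises if the real archive uses a date format that does not sort correctly as text.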
Most of the source tables have gone through schema changes or expansions. In some cases, we've kept the old archives under a separate resource with something like "[Orphaned Schema]" added to the resource name. In other cases, we've adjusted our schema to accommodate new column names, but there will be a date range during which the new columns have null values because we did not start pulling them until we became aware of them.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
A full description of this dataset along with updated information can be found here.
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.
This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.
Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.
Dataset content:
Commercial use subset
Non-commercial use subset
PMC custom license subset
bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)
Metadata file
Readme
Each paper is represented as a single JSON object (see schema file for details).
Description:
The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:
PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
Additional COVID-19 research articles from a corpus maintained by the WHO
bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)
We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).
We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.
This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.
Citation:
When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:
In bibliography:
COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505
In text:
(CORD-19, 2020)
The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CSCoV (Computational Studies about COVID-19) is a dataset containing filtered computational studies related to COVID-19.
The JSON schema is shown below:
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Background
The COVID-19 pandemic is a global healthcare emergency. Prediction models for COVID-19 imaging are rapidly being developed to support medical decision making in imaging. However, inadequate availability of a diverse annotated dataset has limited the performance and generalizability of existing models.
Purpose
To create the first multi-institutional, multi-national expert annotated COVID-19 imaging dataset made freely available to the machine learning community as a research and educational resource for COVID-19 chest imaging. The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD) collection of COVID-related imaging datasets and expert annotations to support research and education. RICORD data will be incorporated in the Medical Imaging and Data Resource Center (MIDRC), a multi-institutional research data repository funded by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health.
Materials and Methods
This dataset was a collaboration between the RSNA and Society of Thoracic Radiology (STR).
Results
The RSNA International COVID-19 Open Annotated Radiology Database (RICORD) release 1b consists of 120 thoracic computed tomography (CT) scans of COVID negative patients from four international sites.
Patient Selection: Patients at least 18 years of age who received a negative diagnosis for COVID-19.
Data Abstract
120 de-identified Thoracic CT scans from COVID negative patients.
Supporting clinical variables: MRN*, Age, Exam Date/Time*, Exam Description, Sex, Study UID*, Image Count, Modality, Symptomatic, Testing Result, Specimen Source (* pseudonymous values).
Research Benefits
As this is a public dataset, RICORD is available for non-commercial use (and further enrichment) by the research and education communities which may include development of educational resources for COVID-19, use of RICORD to create AI systems for diagnosis and quantification, benchmarking performance for existing solutions, exploration of distributed/federated learning, further annotation or data augmentation efforts, and evaluation of the examinations for disease entities beyond COVID-19 pneumonia. Deliberate consideration of the detailed annotation schema, demographics, and other included meta-data will be critical when generating cohorts with RICORD, particularly as more public COVID-19 imaging datasets are made available via complementary and parallel efforts. It is important to emphasize that there are limitations to the clinical “ground truth” as the SARS-CoV-2 RT-PCR tests have widely documented limitations and are subject to both false-negative and false-positive results which impact the distribution of the included imaging data, and may have led to an unknown epidemiologic distortion of patients based on the inclusion criteria. These limitations notwithstanding, RICORD has achieved the stated objectives for data complexity, heterogeneity, and high-quality expert annotations as a comprehensive COVID-19 thoracic imaging data resource.
The COVID Symptom Tracker (https://covid.joinzoe.com/) mobile application was designed by doctors and scientists at King's College London and Guy's and St Thomas' Hospitals, working in partnership with ZOE Global Ltd, a health science company. This research is led by Dr Tim Spector, professor of genetic epidemiology at King's College London and director of TwinsUK, a scientific study of 15,000 identical and non-identical twins that has been running for nearly three decades. The dataset schema includes:
- Demographic Information (Year of Birth, Gender, Height, Weight, Postcode)
- Health Screening Questions (Activity, Heart Disease, Diabetes, Lung Disease, Smoking Status, Kidney Disease, Chemotherapy, Immunosuppressants, Corticosteroids, Blood Pressure Medications, Previous COVID, COVID Symptoms, Needs Help, Housebound Problems, Help Availability, Mobility Aid)
- COVID Testing Conducted
- How You Feel?
- Symptom Description
- Location Information (Home, Hospital, Back From Hospital)
- Treatment Received
Dataset Access Request: https://healthdatagateway.org/detail/9b604483-9cdc-41b2-b82c-14ee3dd705f6
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
There are four knowledge graphs related to the SARS-CoV-2 virus:
1. virusnetwork.taxonomy: taxonomy of most viruses, with data from NCBI and some other databases.
2. virusnetwork.sars-cov-2: fundamental information about SARS-CoV-2, with data from NCBI and some other databases.
3. virusnetwork.drug: knowledge graph of antiviral drugs, with data from DrugBank and some other databases.
4. Phylogeny of COVID-19, with data from the Nextstrain database.
Please read the schema definitions in the files extracted from schema.zip. The Readme document is currently written in Chinese; we plan to provide an English version in the future.
Logos: Zhejiang University, Huawei Cloud, OpenKG
The dataset was originally published on OpenKG.cn (http://openkg.cn/dataset/covid-19-research). To contact the maintainers, follow this link to obtain their email addresses.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Covid19Kerala.info-Data is a consolidated multi-source open dataset of metadata from the COVID-19 outbreak in the Indian state of Kerala. It is created and maintained by volunteers of ‘Collective for Open Data Distribution-Keralam’ (CODD-K), a nonprofit consortium of individuals formed for the distribution and longevity of open datasets. Covid19Kerala.info-Data covers a set of correlated temporal and spatial metadata of SARS-CoV-2 infections and prevention measures in Kerala. Static snapshot releases of this dataset are manually produced from a live database maintained as a set of publicly accessible Google Sheets. This dataset is made available under the Open Data Commons Attribution License v1.0 (ODC-BY 1.0).
Schema and data package
A data package with the schema definition is accessible at https://codd-k.github.io/covid19kerala.info-data/datapackage.json. The data package and schema follow the Frictionless Data "Data Package" specification.
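To give a feel for what such a descriptor contains, here is a minimal sketch that lists the resources and fields of a Data Package descriptor. It relies only on the general Data Package layout (a "resources" array whose entries carry a "name" and a "schema" with "fields"); the descriptor content below is a made-up, shortened example and not the actual CODD-K datapackage.json.

```python
import json

# Hypothetical, heavily shortened descriptor used only for illustration;
# the real descriptor is published at the datapackage.json URL above.
DESCRIPTOR = json.loads("""
{
  "name": "example-package",
  "resources": [
    {
      "name": "example_resource",
      "path": "data/example_resource.csv",
      "schema": {"fields": [
        {"name": "date", "type": "date"},
        {"name": "count", "type": "integer"}
      ]}
    }
  ]
}
""")


def describe_resources(descriptor):
    """Map each resource name to a list of (field name, field type) pairs."""
    summary = {}
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema", {})
        # A schema may also be given as a reference (a string); this sketch
        # only handles schemas that are embedded inline as objects.
        fields = schema.get("fields", []) if isinstance(schema, dict) else []
        summary[resource["name"]] = [(f["name"], f.get("type")) for f in fields]
    return summary


summary = describe_resources(DESCRIPTOR)
for name, fields in summary.items():
    print(name, fields)
```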
Temporal and Spatial Coverage
This dataset covers COVID-19 outbreak and related data from the state of Kerala, India, from January 31, 2020 till the date of the publication of this snapshot. The dataset shall be maintained throughout the entirety of the COVID-19 outbreak.
The spatial coverage of the data lies within the geographical boundaries of the state of Kerala, which includes its 14 administrative subdivisions. The state is further divided into Local Self Governing (LSG) Bodies. Reference to this spatial information is included on appropriate data facets. Available spatial information on regions outside Kerala is mentioned, but only as a reference to the possible origins of infection clusters or the movement of individuals.
Longevity and Provenance
The dataset snapshot releases are published and maintained in a designated GitHub repository maintained by the CODD-K team. Periodic snapshots from the live database will be released at regular intervals. The GitHub commit logs for the repository will be maintained as a record of provenance, and an archived repository will be maintained at the end of the project lifecycle for the longevity of the dataset.
Data Stewardship
CODD-K expects all administrators, managers, and users of its datasets to manage, access, and utilize them in a manner that is consistent with the consortium’s need for security and confidentiality and with relevant legal frameworks within all geographies, especially Kerala and India. As a responsible steward maintaining this dataset and making it accessible, CODD-K disclaims all liability for damages, if any, caused by inaccuracies in the dataset.
License
This dataset is made available by the CODD-K consortium under ODC-BY 1.0 license. The Open Data Commons Attribution License (ODC-By) v1.0 ensures that users of this dataset are free to copy, distribute and use the dataset to produce works and even to modify, transform and build upon the database, as long as they attribute the public use of the database or works produced from the same, as mentioned in the citation below.
Disclaimer
Covid19Kerala.info-Data is provided under the ODC-BY 1.0 license as-is. Though every attempt is made to ensure that the data is error-free and up to date, the CODD-K consortium does not bear any responsibility for inaccuracies in the dataset or for any losses, monetary or otherwise, that users of this dataset may incur.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain a snapshot of the database described in MANUSCRIPT* as of 31-07-2020. The schema.org file contains a plain text description of the tables. The tables themselves are in the CSV files. The covid19db-figshare-master.zip contains the schemas along with the scripts that were used to download these data and some example usage. A complete listing of the variables and their sources can be found in MANUSCRIPT. Since the OxCOVID19 Database draws from multiple data sources, there are different licenses for different tables; please see https://covid19.eng.ox.ac.uk/ for additional details.
* The manuscript is currently under consideration; this description of the data will be updated when there is a publicly available version. In the meantime, please refer to https://covid19.eng.ox.ac.uk/
Daily COVID-19 reports from Johns Hopkins University Center for Systems Science and Engineering. This dataset is generated by calculating the difference between each cumulative daily report and that of the previous day to identify daily changes in the number of confirmed, active, recovered, and fatal cases. This dataset covers reports published after CSSE changed its daily report schema on March 22.
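The differencing step described above can be sketched as follows. This is only an illustration of the arithmetic, not the code used to build the dataset; the dates and counts are invented, and the input is assumed to be one location's cumulative series keyed by ISO date strings.

```python
def daily_changes(cumulative):
    """Turn (day, cumulative_count) pairs into (day, daily_change) pairs.

    The first day has no previous report to compare against, so it is omitted.
    """
    ordered = sorted(cumulative)  # ISO date strings sort chronologically
    return [
        (day, count - prev_count)
        for (_, prev_count), (day, count) in zip(ordered, ordered[1:])
    ]


# Hypothetical cumulative confirmed counts for a single location.
confirmed = [("2020-03-23", 100), ("2020-03-24", 130), ("2020-03-25", 175)]
changes = daily_changes(confirmed)
print(changes)
```

A negative change can appear when a source revises its cumulative totals downward, so downstream code should not assume the differences are always non-negative.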
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides access to the metadata records of publications, research data, software and projects that may be relevant to the Corona Virus Disease (COVID-19) fight. The dataset contains the OpenAIRE COVID-19 Gateway records, identified via full-text mining and inference techniques applied to the OpenAIRE Graph. The OpenAIRE Graph is one of the largest Open Access collections of metadata records and links between publications, datasets, software, projects, funders, and organizations, aggregating 12,000+ scientific data sources worldwide, among which are the COVID-19 data sources Zenodo COVID-19 Community, WHO (World Health Organization), BIP! Finder for COVID-19, Protein Data Bank, Dimensions, scienceOpen, and RSNA.
The dataset consists of a tar archive containing gzip files with one JSON record per line. Each JSON record conforms to the schema available at https://doi.org/10.5281/zenodo.8238913.
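A tar archive of gzip files with one JSON record per line can be streamed with the Python standard library alone. The sketch below assumes that layout and nothing more; the member name and the record fields in the demo archive are invented, and no part of the actual OpenAIRE schema is assumed.

```python
import gzip
import io
import json
import tarfile


def iter_records(tar_fileobj):
    """Yield one dict per JSON line from every gzip member of a tar archive."""
    with tarfile.open(fileobj=tar_fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            # Decompress each member as text and parse it line by line.
            with gzip.open(extracted, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    if line.strip():
                        yield json.loads(line)


def build_demo_archive(records):
    """Build a small in-memory archive in the same layout (demo only)."""
    text = "\n".join(json.dumps(r) for r in records)
    payload = gzip.compress(text.encode("utf-8"))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name="part-00000.json.gz")  # hypothetical name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


demo = build_demo_archive([{"id": "a"}, {"id": "b"}])
records = list(iter_records(demo))
print(records)
```

For the real dump, the same function can be given a file opened in binary mode, for example `open(path, "rb")`, so that records are processed one at a time instead of being loaded into memory all at once.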
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The third wave of the global health crisis attributed to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus reached Colombia in March 2021. Over the following 6 months, it was interpolated by manifestations of popular disapproval to the actual political regime—with multiple protests sprouting throughout the country. Large social gatherings seeded novel coronavirus disease 2019 (COVID-19) variants in big cities and propagated their facile spread, leading to increased rates of hospitalizations and deaths.
Methods
In this article, we evaluate the effective reproduction number (Rt) dynamics of SARS-CoV-2 in Cali, Colombia, between 4 April 2021 and 31 July 2021 based on the analysis of 228 genomes.
Results
Our results showed clear contrast in Rt values between the period of frequent protests (Rt > 1), and the preceding and following months (Rt < 1). Genomic analyses revealed 16 circulating SARS-CoV-2 lineages during the initial period—including variants of concern (VOCs) (Alpha, Gamma, and Delta) and variants of interest (VOIs) (Lambda and Mu). Furthermore, we noticed the Mu variant dominating the COVID-19 distribution schema as the months progressed. We identified four principal clusters through phylogenomic analyses—each one of potentially independent introduction to the city. Two of these were associated with the Mu variant, one associated with the Gamma variant, and one with the Lambda variant.
Conclusion
Our results chronicle the impact of large group assemblies on the epidemiology of COVID-19 during this intersection of political turmoil and sanitary crisis in Cali, Colombia. We emphasize upon the effects of limited biosecurity strategies (which had characterized this time period), on the spread of highly virulent strains throughout Cali and greater Colombia.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dump provides access to the metadata records of publications, research data, software and projects that may be relevant to the Corona Virus Disease (COVID-19) fight. The dump contains records of the OpenAIRE COVID-19 Gateway, identified via full-text mining and inference techniques applied to the OpenAIRE Research Graph. The Graph is one of the largest Open Access collections of metadata records and links between publications, datasets, software, projects, funders, and organizations, aggregating 12,000+ scientific data sources worldwide, among which are the COVID-19 data sources Zenodo COVID-19 Community, WHO (World Health Organization), BIP! Finder for COVID-19, Protein Data Bank, Dimensions, scienceOpen, and RSNA. The dump consists of a tar archive containing gzip files with one JSON record per line. Each JSON record conforms to the schema available at https://doi.org/10.5281/zenodo.4723499.
This is the Coronavirus Government Response Tracker from the University of Oxford Blavatnik School of Government. This dataset was created in response to the COVID-19 outbreak to track and compare policy responses from governments around the world. Information about this dataset, including the data schema and coding, is available here. The Blavatnik School of Government has also made available visualizations and analysis to aid in interpreting the data.
Usage of this dataset should be cited using the preferred citation below.
Recommended citation for data: Hale, Thomas, Sam Webster, Anna Petherick, Toby Phillips, and Beatriz Kira (2020). Oxford COVID-19 Government Response Tracker, Blavatnik School of Government.
Data use policy: Creative Commons Attribution CC BY standard.
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CSCoV (Computational Studies about COVID-19) is a dataset containing COVID-19 related studies extracted from PubMed, bioRxiv, medRxiv, and arXiv, together with article and author related metrics obtained from Semantic Scholar (plus page views from bioRxiv and medRxiv). Using machine learning, the articles are categorized in six topics (Pharmacology, Genomics, Epidemiology, Healthcare, Clinical Medicine, Clinical Imaging) and prioritized. The database is periodically updated.
Source code: https://github.com/SFB-KAUST/covid-review
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains two files: (1) a crosswalk between CESSDA Data Catalogue (CDC) DDI2.5 Metadata Profile (https://cmv.cessda.eu/profiles/cdc/ddi-2.5/1.0.4/profile.html) and ECRIN Metadata Schema for Clinical Research Data Objects Version 6.0 (August 2021) (https://zenodo.org/record/5554961) with an extension to “geographical data” and (2) vice versa.
Daily cases and deaths by date reported to the World Health Organization. The schema of the dataset is below:
Field name | Type | Description
Date_reported | Date | Date of reporting to WHO
Country_code | String | ISO Alpha-2 country code
Country | String | Country, territory, area
WHO_region | String | WHO regional offices: WHO Member States are grouped into six WHO regions -- Regional Office for Africa (AFRO), Regional Office for the Americas (AMRO), Regional Office for South-East Asia (SEARO), Regional Office for Europe (EURO), Regional Office for the Eastern Mediterranean (EMRO), and Regional Office for the Western Pacific (WPRO).
New_cases | Integer | New confirmed cases. Calculated by subtracting previous cumulative case count from current cumulative cases count.*
Cumulative_cases | Integer | Cumulative confirmed cases reported to WHO to date.
New_deaths | Integer | New confirmed deaths. Calculated by subtracting previous cumulative deaths from current cumulative deaths.*
Cumulative_deaths | Integer | Cumulative confirmed deaths reported to WHO to date.
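As a sketch of how this schema might be used, the example below parses rows into typed records and checks that New_cases matches the change in Cumulative_cases. The field names and types come from the table above; the CSV layout with a header row, the ISO date format, the sample values, and the assumption of a single country sorted by date are all illustrative.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

# Hypothetical sample for one made-up country, already sorted by date.
SAMPLE_CSV = """Date_reported,Country_code,Country,WHO_region,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
2020-03-01,XX,Exampleland,EURO,5,5,0,0
2020-03-02,XX,Exampleland,EURO,7,12,1,1
2020-03-03,XX,Exampleland,EURO,3,15,0,1
"""


@dataclass
class DailyReport:
    date_reported: date
    country_code: str
    country: str
    who_region: str
    new_cases: int
    cumulative_cases: int
    new_deaths: int
    cumulative_deaths: int


def parse_reports(csv_text):
    """Convert CSV rows into DailyReport records with typed fields."""
    return [
        DailyReport(
            date_reported=date.fromisoformat(row["Date_reported"]),
            country_code=row["Country_code"],
            country=row["Country"],
            who_region=row["WHO_region"],
            new_cases=int(row["New_cases"]),
            cumulative_cases=int(row["Cumulative_cases"]),
            new_deaths=int(row["New_deaths"]),
            cumulative_deaths=int(row["Cumulative_deaths"]),
        )
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def new_cases_consistent(reports):
    """Check New_cases equals the day-over-day change in Cumulative_cases."""
    return all(
        later.new_cases == later.cumulative_cases - earlier.cumulative_cases
        for earlier, later in zip(reports, reports[1:])
    )


reports = parse_reports(SAMPLE_CSV)
print(len(reports), new_cases_consistent(reports))
```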
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description: COVID-19 aggregated datasets are issued on the basis of data sent to the Health Information System by health service providers. The statistics are dated by the time the vaccination data was first received by the Health Information System, not by the date of vaccination. Depending on how health service providers document and send data, the receipt of data may lag by nearly 1 day. Health service providers have the right and obligation to make corrections when errors are detected, which may affect retrospective statistics. The datasets are available in machine-readable JSON and CSV formats. The metadata is published in JSON Schema format. The datasets are updated once a week, on Tuesdays between 12:00 and 12:30.
EU eHealthNetwork value sets as referenced by the EU Digital COVID Certificate (DCC) JSON Schema. Published by the European eHealth Network (Digital COVID Certificate Coordination) on GitHub under the Apache v2 license.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the datasets created and extracted for the paper:
Giuseppe Serna García, Ruba Al Khalaf, Francesco Invernici, Stefano Ceri, and Anna Bernasconi. 2022.
"CoVEffect: Interactive System for Mining the Effects of SARS-CoV-2 Mutations and Variants Based on Deep Learning". (Available online at http://gmql.eu/coveffect)
--------------------------------------------------------------------------------
LIST OF FILES WITH DESCRIPTION:
--------------------------------------------------------------------------------
AdditionalFile1-effects-taxonomy:
Descriptions of legal values for the 'Effect' field, based on a categorized taxonomy.
AdditionalFile2-levels-taxonomy:
Descriptions of legal values for the 'Level' field.
AdditionalFile3-training_dataset_target:
List of target tuples (manually annotated) of 221 abstracts considered for training the model. For each abstract, target tuples follow the schema ID, DOI, title, entity, effect, level, type (mutation or variant), tuples_count (>1 when an effect/level is shared by multiple entities, #abstracts containing the same effect described in the tuple).
AdditionalFile4-validation_dataset_target:
List of target tuples (manually annotated) of 50 abstracts considered for validating the prepared prediction model.
For each abstract, target tuples follow the schema defined for AdditionalFile3.
AdditionalFile5-validation_dataset_highlighted:
Textual abstracts of the 50 manuscripts considered for validation; the text used to support the manual target annotations has been highlighted in yellow.
AdditionalFile6-validation_dataset_prediction:
List of predicted annotations of 50 abstracts considered for validating the prepared prediction model. The file is split into 4 TSV files, respectively for entity (a), effect (b), level (c), and whole tuple predictions (d).
AdditionalFile7-keywords_query_list:
Keyword-based search run on the CORD-19 dataset to extract a relevant subset of abstracts regarding the scope of interest of CoVEffect. The Boolean logic used to combine keywords is explained in the section 'Annotations of the biology-related CORD-19 cluster'.
AdditionalFile8-CORD-19_batch_dataset_metadata:
Metadata of the 7,230 papers extracted by the keyword-based query in AdditionalFile7.
These abstracts have been annotated by the prediction framework.
AdditionalFile9-CORD-19_batch_dataset_prediction:
List of predicted annotations of 7,230 abstracts extracted from the biology-related cluster of CORD-19.
AdditionalFile10-test_dataset_target:
List of target tuples (manually annotated) of 100 abstracts randomly selected from the 7,230 extracted as in AdditionalFile8.
For each abstract, target tuples follow the schema defined for AdditionalFile3.
AdditionalFile11-test_dataset_prediction:
List of predicted annotations of 100 abstracts considered for testing the prediction model on a subset of the CORD-19 biology-related cluster. Like AdditionalFile6, it is split into 4 TSV files, respectively for entity (a), effect (b), level (c), and whole tuple predictions (d).