100+ datasets found

COVID-19 Dataset
kaggle.com
zip
Updated Nov 13, 2022
Cite
Meir Nizri (2022). COVID-19 Dataset [Dataset]. https://www.kaggle.com/datasets/meirnizri/covid19-dataset
Explore at:
zip (4890659 bytes)
Dataset updated
Nov 13, 2022
Authors
Meir Nizri
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.

The main goal of this project is to build a machine learning model that, given a COVID-19 patient's current symptoms, status, and medical history, will predict whether the patient is at high risk or not.

Content

The dataset was provided by the Mexican government (link). It contains a large amount of anonymized patient-related information, including pre-existing conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". Values of 97 and 99 indicate missing data.

sex: 1 for female and 2 for male.

age: of the patient.

classification: covid test findings. Values 1-3 mean that the patient was diagnosed with covid in different degrees. 4 or higher means that the patient is not a carrier of covid or that the test is inconclusive.

patient type: type of care the patient received in the unit. 1 for returned home and 2 for hospitalization.

pneumonia: whether the patient already has inflammation of the air sacs or not.

pregnancy: whether the patient is pregnant or not.

diabetes: whether the patient has diabetes or not.

copd: Indicates whether the patient has Chronic obstructive pulmonary disease or not.

asthma: whether the patient has asthma or not.

inmsupr: whether the patient is immunosuppressed or not.

hypertension: whether the patient has hypertension or not.

cardiovascular: whether the patient has a disease related to the heart or blood vessels.

renal chronic: whether the patient has chronic renal disease or not.

other disease: whether the patient has other disease or not.

obesity: whether the patient is obese or not.

tobacco: whether the patient is a tobacco user.

usmr: indicates whether the patient was treated in medical units of the first, second, or third level.

medical unit: type of institution of the National Health System that provided the care.

intubed: whether the patient was connected to the ventilator.

icu: Indicates whether the patient had been admitted to an Intensive Care Unit.

date died: the date of death if the patient died, and 9999-99-99 otherwise.
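The coding rules above (1 for "yes", 2 for "no", 97 and 99 for missing data, classification 1-3 for a COVID diagnosis, and 9999-99-99 for patients who did not die) can be sketched as a small decoding step. This is a minimal illustration using only the Python standard library. The column headers, the date format, and the three sample rows are assumptions made for the example, not taken from the real file.

```python
import csv
import io

# Hypothetical headers and rows for illustration; the real file may differ.
SAMPLE = """SEX,AGE,CLASSIFICATION,PATIENT_TYPE,DIABETES,ICU,DATE_DIED
1,65,3,2,1,97,03/05/2020
2,40,7,1,2,97,9999-99-99
1,97,2,2,99,2,9999-99-99
"""

BOOLEAN_COLUMNS = ("DIABETES", "ICU")
MISSING_CODES = {97, 99}


def decode_boolean(raw):
    """Map 1 -> True, 2 -> False, and the missing codes 97/99 -> None."""
    value = int(raw)
    if value in MISSING_CODES:
        return None
    if value == 1:
        return True
    if value == 2:
        return False
    return None  # any other code is treated as missing in this sketch


def decode_row(row):
    decoded = {
        "female": row["SEX"] == "1",
        # AGE is not a Boolean feature, so 97 and 99 are kept as real ages.
        "age": int(row["AGE"]),
        # Values 1-3 mean a COVID diagnosis; 4 or higher means negative or inconclusive.
        "covid_positive": 1 <= int(row["CLASSIFICATION"]) <= 3,
        "hospitalized": row["PATIENT_TYPE"] == "2",
        "died": row["DATE_DIED"] != "9999-99-99",
    }
    for column in BOOLEAN_COLUMNS:
        decoded[column.lower()] = decode_boolean(row[column])
    return decoded


patients = [decode_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
for patient in patients:
    print(patient)
```

Applying the missing-value codes only to the Boolean columns is a deliberate choice in this sketch: a blanket replacement of 97 and 99 would also erase patients whose age is 97 or 99.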
COVID-19 dataset
kaggle.com
Updated Mar 7, 2022
Cite
george saavedra (2022). COVID-19 dataset [Dataset]. https://www.kaggle.com/datasets/georgesaavedra/covid19-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 7, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
george saavedra
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Content:

"Our World in Data", in collaboration with the University of Oxford, has developed a reliable repository of datasets on dozens of topics, focusing on the big problems that affect the world. Since the beginning of the COVID-19 outbreak, several researchers have been collecting data from every country in the world on multiple indicators that can help us make better decisions. The dataset is updated every day for all countries, which allows people to keep track of it. At the following link you can find charts about the pandemic and the up-to-date World COVID-19 dataset, which contains over 60 features and can be downloaded for free:

https://ourworldindata.org/covid-vaccinations

Important to consider:

I will update this dataset every week according to the data published by the organization. If you found this dataset or the link useful, I would really appreciate your upvote!

Acknowledgements and Citation

Mathieu, E., Ritchie, H., Ortiz-Ospina, E. et al. A global database of COVID-19 vaccinations. Nat Hum Behav (2021)
Covid-19 Global Dataset
kaggle.com
zip
Updated Apr 12, 2025
Cite
Khushi Yadav (2025). Covid-19 Global Dataset [Dataset]. https://www.kaggle.com/datasets/khushikyad001/covid-19-global-dataset
Explore at:
zip (482555 bytes)
Dataset updated
Apr 12, 2025
Authors
Khushi Yadav
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset contains 3,000 rows and 26 columns of synthetically generated COVID-19 records. It replicates realistic global pandemic data, simulating values for cases, deaths, tests, vaccinations, demographics, and policy measures. The data mimics actual records from sources like Our World in Data, designed specifically for data science experimentation, visualization, and machine learning projects.

It is ideal for:

Practicing exploratory data analysis (EDA)

Creating dashboards

Building predictive models

Teaching or student projects

Kaggle Notebooks without API dependencies
Coronavirus (COVID-19) Tweets Dataset
ieee-dataport.org
search.datacite.org
+1more
Updated May 7, 2025
Cite
Rabindra Lamsal (2025). Coronavirus (COVID-19) Tweets Dataset [Dataset]. https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset
Explore at:
Dataset updated
May 7, 2025
Authors
Rabindra Lamsal
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
2020
COVID-19 Chest X-Ray Image Repository
figshare.com
zip
Updated May 30, 2023
Cite
Arman Haghanifar; Mahdiyar Molahasani Majdabadi; Seokbum Ko (2023). COVID-19 Chest X-Ray Image Repository [Dataset]. http://doi.org/10.6084/m9.figshare.12580328.v3
Explore at:
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12580328.v3
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Arman Haghanifar; Mahdiyar Molahasani Majdabadi; Seokbum Ko
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset of publicly available images from COVID-19 positive patients, collected from several sources on the internet. All images are frontal-view chest X-rays (AP or PA). There is a ZIP file containing 900 images and a metadata file in CSV format that includes information about 452 images. Note that some of the images are from pediatric patients and/or from early-stage patients with no specific image findings noted by the radiologist, but all of them are from COVID-positive cases. Related guidelines and details are available in the GitHub repo.
COVID-19 Twitter Dataset
borealisdata.ca
figshare.com
Updated Nov 10, 2020
Cite
Anatoliy Gruzd; Philip Mai (2020). COVID-19 Twitter Dataset [Dataset]. http://doi.org/10.5683/SP2/PXF2CU
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/PXF2CU
Dataset updated
Nov 10, 2020
Dataset provided by
Borealis
Authors
Anatoliy Gruzd; Philip Mai
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July of 2020.

Sampling Method: hourly requests sent to Twitter Search API using Social Feed Manager, an open source software that harvests social media data and related content from Twitter and other platforms.

NOTE:

1) In accordance with Twitter API Terms, only Tweet IDs are provided as part of this dataset.

2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs like Hydrator (https://github.com/DocNow/hydrator) or Python library Twarc (https://github.com/DocNow/twarc).

3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because tweets used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV).

4) To broaden this sample, consider comparing/merging this dataset with other COVID-19 related public datasets such as:

https://github.com/thepanacealab/covid19_twitter

https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset

https://github.com/echen102/COVID-19-TweetIDs
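Merging this list of Tweet IDs with IDs from other public datasets, as suggested in note 4, comes down to taking a de-duplicated union and splitting it into batches for a rehydration tool. The sketch below shows that step only and does not call Hydrator, Twarc, or any Twitter API. The IDs are made up, the function names are hypothetical, and the batch size of 2 is arbitrary.

```python
from itertools import islice


def merge_tweet_ids(*sources):
    """Return the sorted, de-duplicated union of tweet IDs from several iterables of lines."""
    merged = set()
    for source in sources:
        for line in source:
            tweet_id = line.strip()
            if tweet_id.isdigit():  # skip blank or malformed lines
                merged.add(int(tweet_id))
    return sorted(merged)


def batches(ids, size):
    """Yield consecutive lists of at most `size` IDs."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Made-up IDs standing in for lines read from two ID files.
this_dataset = ["1240000000000000001\n", "1240000000000000002\n"]
other_dataset = ["1240000000000000002\n", "1240000000000000003\n", "\n"]

merged = merge_tweet_ids(this_dataset, other_dataset)
chunks = list(batches(merged, 2))
print(len(merged), "unique IDs in", len(chunks), "batches")
```

In practice each source would be an open file handle rather than a list, which keeps the same code usable for files that do not fit comfortably in a text editor.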
COVID-19 Outbreak Data (ARCHIVED)
data.chhs.ca.gov
data.ca.gov
+2more
csv, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). COVID-19 Outbreak Data (ARCHIVED) [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-outbreak-data
Explore at:
zip, csv (62919), csv (326192)
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This dataset is no longer being updated as of June 2, 2025.

This dataset contains numbers of COVID-19 outbreaks and associated cases, categorized by setting, reported to CDPH since January 1, 2021.

AB 685 (Chapter 84, Statutes of 2020) and the Cal/OSHA COVID-19 Emergency Temporary Standards (Title 8, Subchapter 7, Sections 3205-3205.4) required non-healthcare employers in California to report workplace COVID-19 outbreaks to their local health department (LHD) between January 1, 2021 – December 31, 2022. Beginning January 1, 2023, non-healthcare employer reporting of COVID-19 outbreaks to local health departments is voluntary, unless a local order is in place. More recent data collected without mandated reporting may therefore be less representative of all outbreaks that have occurred, compared to earlier data collected during mandated reporting. Licensed health facilities continue to be mandated to report outbreaks to LHDs.

LHDs report confirmed outbreaks to the California Department of Public Health (CDPH) via the California Reportable Disease Information Exchange (CalREDIE), the California Connected (CalCONNECT) system, or other established processes. Data are compiled and categorized by setting by CDPH. Settings are categorized by U.S. Census industry codes. Total outbreaks and cases are included for individual industries as well as for broader industrial sectors.

The first dataset includes numbers of outbreaks in each setting by month of onset, for outbreaks reported to CDPH since January 1, 2021. This dataset includes some outbreaks with onset prior to January 1 that were reported to CDPH after January 1; these outbreaks are denoted with month of onset “Before Jan 2021.” The second dataset includes cumulative numbers of COVID-19 outbreaks with onset after January 1, 2021, categorized by setting. Due to reporting delays, the reported numbers may not reflect all outbreaks that have occurred as of the reporting date; additional outbreaks may have occurred that have not yet been reported to CDPH.

While many of these settings are workplaces, cases may have occurred among workers, other community members who visited the setting, or both. Accordingly, these data do not distinguish between outbreaks involving only workers, outbreaks involving only residents or patrons, or outbreaks involving both.

Several additional data limitations should be kept in mind:

Outbreaks are classified as “Insufficient information” when not enough information was available for CDPH to assign an industry code.

Some sectors, particularly congregate residential settings, may have increased testing and therefore increased likelihood of outbreak recognition and reporting. As a result, in congregate residential settings, the number of outbreak-associated cases may be more accurate.

However, in most settings, outbreak and case counts are likely underestimates. For most cases, it is not possible to identify the source of exposure, as many cases have multiple possible exposures.

Because some settings have at times been closed or open with capacity restrictions, numbers of outbreak reports in those settings do not reflect COVID-19 transmission risk.

The number of outbreaks in different settings will depend on the number of different workplaces in each setting. More outbreaks would be expected in settings with many workplaces compared to settings with few workplaces.
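The relationship between the two files described above (monthly counts by month of onset, and cumulative counts for outbreaks with onset after January 1, 2021) can be illustrated with a short sketch. It sums the monthly counts per setting while skipping the “Before Jan 2021” rows. The setting names, month labels, counts, and the row layout are all hypothetical, and the real files may not reconcile exactly because of reporting delays.

```python
from collections import Counter

# Hypothetical rows: (setting, month_of_onset, outbreak_count).
MONTHLY = [
    ("Restaurants", "Before Jan 2021", 4),
    ("Restaurants", "2021-01", 10),
    ("Restaurants", "2021-02", 7),
    ("Schools", "Before Jan 2021", 1),
    ("Schools", "2021-01", 3),
]


def cumulative_since_2021(monthly_rows):
    """Sum outbreaks per setting, excluding onsets labelled "Before Jan 2021"."""
    totals = Counter()
    for setting, month, count in monthly_rows:
        if month == "Before Jan 2021":
            continue  # reported after January 1, 2021, but with earlier onset
        totals[setting] += count
    return dict(totals)


totals = cumulative_since_2021(MONTHLY)
print(totals)
```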
Data from: COVID-19 Case Surveillance Public Use Data with Geography
catalog.data.gov
data.virginia.gov
+5more
Updated May 8, 2021
Cite
Centers for Disease Control and Prevention (2021). COVID-19 Case Surveillance Public Use Data with Geography [Dataset]. https://catalog.data.gov/dataset/covid-19-case-surveillance-public-use-data-with-geography-0605b
Explore at:
Dataset updated
May 8, 2021
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.

Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 32 data element restricted access dataset.

The following apply to the public use datasets and the restricted access dataset:

- Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
- Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
- Some data are suppressed to protect individual privacy.
- Datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
- Datasets are updated monthly.
- Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
- For more information about data collection and reporting, please see wwwn.cdc.gov/nndss/data-collection.html.
- For more information about the COVID-19 case surveillance data, please see www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html.

Overview

The COVID-19 case surveillance database includes patient-level data reported by U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as "immediately notifiable, urgent (within 24 hours)" by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data collected by jurisdictions are shared voluntarily with CDC. For more information, visit: wwwn.cdc.gov/nndss/conditions/coronavirus-disease-2019-covid-19/case-definition/2020/08/05/.

COVID-19 Case Reports

COVID-19 case reports are routinely submitted to CDC by pu
COVID-19 dataset 3 classes
ieee-dataport.org
Updated Jul 1, 2020
Cite
Vaishnavi Jamdade (2020). COVID-19 dataset 3 classes [Dataset]. https://ieee-dataport.org/documents/covid-19-dataset-3-classes
Explore at:
Dataset updated
Jul 1, 2020
Authors
Vaishnavi Jamdade
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The rapid outbreak of COVID-19 due to the novel coronavirus SARS-COV-2 is the biggest issue faced by mankind today. It is important to detect the positive cases as early as possible to prevent the further spread of this pandemic.
COVID-19 Open Research Dataset
neuinfo.org
scicrunch.org
+2more
Updated Aug 11, 2024
Cite
(2024). COVID-19 Open Research Dataset [Dataset]. http://identifiers.org/RRID:SCR_018336
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_018336
Dataset updated
Aug 11, 2024
Description
Collection of scholarly articles about COVID-19 and coronavirus family of viruses for use by global research community. Dataset is updated on weekly basis.
Novel Covid-19 Dataset
kaggle.com
Updated Sep 18, 2025
+ more versions
Cite
GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 18, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GHOST5612
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Context:

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the google sheets associated and made available here.

Edited:

Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. Please note that this is time series data, so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020.


Dataset Description

This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

Files and Columns

1. covid_19_data.csv (Main File)

This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

Sno – Serial number of the record

ObservationDate – Date of the observation (MM/DD/YYYY)

Province/State – Province or state of the observation (may be missing for some entries)

Country/Region – Country of the observation

Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)

Confirmed – Cumulative number of confirmed cases on that date

Deaths – Cumulative number of deaths on that date

Recovered – Cumulative number of recoveries on that date

2. 2019_ncov_data.csv (Legacy File)

This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

3. COVID_open_line_list_data.csv

This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

4. COVID19_line_list_data.csv

Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

✅ Use covid_19_data.csv for up-to-date aggregated global trends.

✅ Use the line list datasets for detailed, individual-level case analysis.
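Because Confirmed, Deaths, and Recovered in covid_19_data.csv are cumulative, daily new counts have to be derived by differencing consecutive observations for each location. The sketch below does this for Confirmed using only the standard library. The column names and the MM/DD/YYYY date format follow the description above, while the sample rows and the function name are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Illustrative rows only; the values are not taken from the real file.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Hubei,Mainland China,1/22/2020 17:00,444,17,28
2,01/22/2020,,Japan,1/22/2020 17:00,2,0,0
3,01/23/2020,Hubei,Mainland China,1/23/20 17:00,549,24,31
4,01/23/2020,,Japan,1/23/20 17:00,2,0,0
5,01/24/2020,Hubei,Mainland China,1/24/20 17:00,761,40,32
"""


def daily_new_cases(rows):
    """Turn cumulative Confirmed counts into per-day increases for each location."""
    series = defaultdict(list)
    for row in rows:
        key = (row["Country/Region"], row["Province/State"])
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        series[key].append((day, int(float(row["Confirmed"]))))

    result = {}
    for key, points in series.items():
        points.sort()  # chronological order within each location
        previous = 0   # the first observation is counted in full
        daily = []
        for day, cumulative in points:
            daily.append((day, cumulative - previous))
            previous = cumulative
        result[key] = daily
    return result


new_cases = daily_new_cases(csv.DictReader(io.StringIO(SAMPLE)))
for location, values in new_cases.items():
    print(location, [count for _, count in values])
```

A negative difference in real data would indicate a downward revision of the cumulative count; this sketch leaves such values as they are rather than clamping them to zero.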

Country level datasets:

If you are interested in knowing country level data, please refer to the following Kaggle datasets:

India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

Acknowledgements :

Johns Hopkins University for making the data available for educational and academic research purposes

MoBS lab - https://www.mobs-lab.org/2019ncov.html

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google....
COVID-19 Cases and Deaths by Age Group - ARCHIVE
catalog.data.gov
data.ct.gov
+1more
Updated Aug 12, 2023
Cite
data.ct.gov (2023). COVID-19 Cases and Deaths by Age Group - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-age-group
Explore at:
Dataset updated
Aug 12, 2023
Dataset provided by
data.ct.gov
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken out by age group. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the daily COVID-19 update.
Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes. Starting in July 2020, this dataset will be updated every weekday. Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020. A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differ from the timestamp in DPH's daily PDF reports. Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
COVID Data for Shared Learning (CDSL): A comprehensive, multimodal COVID-19...
physionet.org
Updated Oct 25, 2024
Cite
Álvaro Ritoré; Andreea M Oprescu; Alberto Estirado Bronchalo; Miguel Ángel Armengol de la Hoz (2024). COVID Data for Shared Learning (CDSL): A comprehensive, multimodal COVID-19 dataset from HM Hospitales [Dataset]. http://doi.org/10.13026/1176-6c44
Explore at:
Unique identifier
https://doi.org/10.13026/1176-6c44
Dataset updated
Oct 25, 2024
Authors
Álvaro Ritoré; Andreea M Oprescu; Alberto Estirado Bronchalo; Miguel Ángel Armengol de la Hoz
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
COVID Data for Shared Learning (CDSL) is a multimodal database comprising de-identified medical data from 4,479 patients who were hospitalized with confirmed or suspected COVID-19 in the Spanish 'HM Hospitales' group from 2019-12-26 to 2021-02-13. The database provides tabular demographic, diagnostic, clinical and treatment information, as well as radiological images in JPG format, namely chest X-ray and computed tomography scans. The primary goal of CDSL is to develop a comprehensive toolkit to support researchers and institutions in building multimodal models for prediction, classification and optimization. CDSL database was promptly shared with the international research community at the onset of the COVID-19 pandemic to promote worldwide collaboration and, ultimately, guide policy decisions and facilitate effective response efforts studies on the disease.
CMS COVID-19 Nursing Home Dataset
catalog.data.gov
data.ct.gov
+2more
Updated Sep 27, 2025
+ more versions
Cite
data.ct.gov (2025). CMS COVID-19 Nursing Home Dataset [Dataset]. https://catalog.data.gov/dataset/cms-covid-19-nursing-home-dataset
Explore at:
Dataset updated
Sep 27, 2025
Dataset provided by
data.ct.gov
Description
The Nursing Home COVID-19 Public File from the Centers for Medicare & Medicaid Services, filtered for Connecticut. View the full dataset and detailed metadata here. The Nursing Home COVID-19 Public File includes data reported by nursing homes to the CDC’s National Healthcare Safety Network (NHSN) system COVID-19 Long Term Care Facility Module, including Resident Impact, Facility Capacity, Staff & Personnel, and Supplies & Personal Protective Equipment, and Ventilator Capacity and Supplies Data Elements.
COVID-19 Hospital Data (ARCHIVED)
data.chhs.ca.gov
data.ca.gov
+4more
csv, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). COVID-19 Hospital Data (ARCHIVED) [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-hospital-data
Explore at:
csv (3296422), zip
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset is not being updated as hospitals are no longer mandated to report COVID Hospitalizations to CDPH.

Data is from the California COVID-19 State Dashboard at https://covid19.ca.gov/state-dashboard/

Note: Hospitalization counts include all patients diagnosed with COVID-19 during their stay. This does not necessarily mean they were hospitalized because of COVID-19 complications or that they experienced COVID-19 symptoms.

Note: Cumulative totals are not available due to the fact that hospitals report the total number of patients each day (as opposed to new patients).
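The note on cumulative totals can be made concrete with a toy example: when each day's figure is the number of patients currently in hospital, adding the days together counts a patient once for every day of the stay. The sketch below uses three invented stays to show that the sum of daily counts gives patient-days and not the number of distinct patients. The data layout and names are hypothetical.

```python
# Hypothetical stays: (patient, first_day, last_day), with both days inclusive.
STAYS = [("A", 1, 3), ("B", 2, 2), ("C", 3, 5)]


def daily_census(stays, days):
    """Count how many patients are in hospital on each day."""
    return {
        day: sum(1 for _, start, end in stays if start <= day <= end)
        for day in days
    }


census = daily_census(STAYS, range(1, 6))
patient_days = sum(census.values())
distinct_patients = len({patient for patient, _, _ in STAYS})

print("daily census:", census)
print("sum of daily counts (patient-days):", patient_days)
print("distinct patients:", distinct_patients)
```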
COVID-19 Open Research Dataset
datacatalog.library.wayne.edu
Updated Mar 31, 2020
Cite
Allen Institute for Artificial Intelligence (2020). COVID-19 Open Research Dataset [Dataset]. https://datacatalog.library.wayne.edu/dataset/covid-19-open-research-dataset
Explore at:
Dataset updated
Mar 31, 2020
Dataset provided by
Allen Institute for Artificial Intelligence
Description
The COVID-19 Open Research Dataset is an extensive machine-readable resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.

The dataset is updated weekly and contains all COVID-19 and coronavirus-related research (e.g., SARS, MERS) from the following sources: PubMed's PMC open access corpus (using this query: COVID-19 and coronavirus research), additional COVID-19 research articles from a corpus maintained by the World Health Organization (WHO), and bioRxiv and medRxiv pre-prints (using this query: COVID-19 and coronavirus research). Also available is a comprehensive metadata file of 44,000 coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic, and the WHO COVID-19 database of publications (includes articles without open access full text).
COVID-19 Case Surveillance Restricted Access Detailed Data
catalog.data.gov
s.cnmilf.com
Updated May 2, 2021
Cite
Centers for Disease Control and Prevention (2021). COVID-19 Case Surveillance Restricted Access Detailed Data [Dataset]. https://catalog.data.gov/dataset/covid-19-case-surveillance-restricted-access-detailed-data-63ce4
Explore at:
Dataset updated
May 2, 2021
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
This case surveillance publicly available dataset has 32 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. This dataset requires a registration process and a data use agreement.

CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Data: Restricted access, patient-level dataset with clinical (including symptoms), demographics, and county and state of residence. Access requires a registration process and a data use agreement. (32 data elements)

Requesting Access to the COVID-19 Case Surveillance Restricted Access Detailed Data

Please review the following documents to determine your interest in accessing the COVID-19 Case Surveillance Restricted Access Detailed Data file:
1) CDC COVID-19 Case Surveillance Restricted Access Detailed Data: Summary, Guidance, Limitations Information, and Restricted Access Data Use Agreement Information
2) Data Dictionary for the COVID-19 Case Surveillance Restricted Access Detailed Data

The next step is to complete the Registration Information and Data Use Restrictions Agreement (RIDURA). Once complete, CDC will review your agreement. After access is granted, Ask SRRG (eocevent394@cdc.gov) will email you information about how to access the data through GitHub. If you have questions about obtaining access, email eocevent394@cdc.gov.

Overview

The COVID-19 case surveillance database includes patient-level data reported by U.S. states and autonomous reporting entities, including New York City, the District of Columbia, as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and are shared voluntarily with CDC. For more information, visit: <a href="https://wwwn.cdc.gov/nndss/conditions/coronavirus-disease-2019-c
Extensive COVID-19 X-Ray and CT Chest Images Dataset
data.mendeley.com
Updated Jun 12, 2020
+ more versions
Cite
Walid El-Shafai (2020). Extensive COVID-19 X-Ray and CT Chest Images Dataset [Dataset]. http://doi.org/10.17632/8h65ywd2jr.3
Explore at:
Unique identifier
https://doi.org/10.17632/8h65ywd2jr.3
Dataset updated
Jun 12, 2020
Authors
Walid El-Shafai
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This COVID-19 dataset consists of Non-COVID and COVID cases of both X-ray and CT images. The associated dataset is augmented with different augmentation techniques to generate about 17099 X-ray and CT images. The dataset contains two main folders, one for the X-ray images, which includes two separate sub-folders of 5500 Non-COVID images and 4044 COVID images. The other folder contains the CT images. It includes two separate sub-folders of 2628 Non-COVID images and 5427 COVID images.
Data from: COVID-19 Datasets for predicting the number of new cases of...
data.mendeley.com
narcis.nl
Updated Jul 28, 2020
Cite
Pınar Tüfekci (2020). COVID-19 Datasets for predicting the number of new cases of COVID-19 ahead of 1 day, 3 days, and 10 days [Dataset]. http://doi.org/10.17632/499vtcykvw.1
Explore at:
Unique identifier
https://doi.org/10.17632/499vtcykvw.1
Dataset updated
Jul 28, 2020
Authors
Pınar Tüfekci
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Four datasets are presented here. The original dataset is a collection of COVID-19 data maintained by Our World in Data. It includes data on confirmed cases and deaths, as well as other variables of potential interest, for ten countries: Australia, Brazil, Canada, China, Denmark, France, Israel, Italy, the United Kingdom, and the United States. The original dataset covers 31 December 2019 to 31 May 2020, with a total of 1,530 instances and 19 features. It is collected from a variety of sources (the European Centre for Disease Prevention and Control, United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). The original dataset is first pre-processed by cleaning it and removing unnecessary and blank data. Then all strings are converted to numeric values, and new features such as continent, hemisphere, year, month, and day are derived from the original features. Finally, the processed dataset is organized for predicting the number of new COVID-19 cases 1 day, 3 days, and 10 days ahead, and three datasets (Dataset-1, 2, 3) are created for that purpose.
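As a rough illustration of what organizing a series for 1-day, 3-day, and 10-day-ahead prediction can look like, the sketch below pairs each day's new-case count with the count a fixed number of days later. This is not the author's procedure: the function name `make_horizon_targets` and the sample numbers are hypothetical, and the published datasets may have been built differently.

```python
from typing import List, Optional, Tuple


def make_horizon_targets(
    new_cases: List[int], horizon: int
) -> List[Tuple[int, Optional[int]]]:
    """Pair each day's value with the value `horizon` days ahead.

    The target is None for the last `horizon` days, where no future
    value exists in the series.
    """
    pairs = []
    for i, today in enumerate(new_cases):
        j = i + horizon
        target = new_cases[j] if j < len(new_cases) else None
        pairs.append((today, target))
    return pairs


# Made-up daily new-case counts, for illustration only.
series = [5, 8, 13, 21, 34]

one_day_ahead = make_horizon_targets(series, 1)
three_days_ahead = make_horizon_targets(series, 3)
print(one_day_ahead)
print(three_days_ahead)
```

Rows whose target is None would normally be dropped before training, which is why longer horizons leave fewer usable rows.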
COVID-19 Variant Data (ARCHIVED)
data.chhs.ca.gov
data.ca.gov
+4 more
csv, xlsx, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). COVID-19 Variant Data (ARCHIVED) [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-variant-data
Explore at:
Available download formats: zip, xlsx(6407), csv(373120)
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This dataset is no longer being updated due to the end of the COVID-19 Public Health Emergency.

The California Department of Public Health (CDPH) is identifying the prevalence of circulating SARS-CoV-2 variants by analyzing CDPH Genomic Surveillance Data and CalREDIE, CDPH's communicable disease reporting and surveillance system. Viruses mutate into new strains or variants over time. Some variants emerge and then disappear. Other variants become common and circulate for a long time. Several specialized laboratories statewide sequence the genomes of a fraction of all positive COVID-19 tests to determine which variants are circulating. Sequencing and reporting of variant results takes several days after a test is identified as a positive for COVID-19. Not all viruses from positive COVID-19 tests are sequenced. Knowing what variants are circulating in California informs public health and clinical action.

Note: There is a natural reporting lag in these data due to the time commitment to complete whole genome sequencing; therefore, a 14 day lag is applied to these datasets to allow for data completeness. Please note that more recent data should be used with caution.

For more information, please see: https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID-Variants.aspx

Cite

Meir Nizri (2022). COVID-19 Dataset [Dataset]. https://www.kaggle.com/datasets/meirnizri/covid19-dataset

COVID-19 Dataset

COVID-19 patient's symptoms, status, and medical history.

Explore at:

28 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip(4890659 bytes)

Dataset updated

Nov 13, 2022

Authors

Meir Nizri

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.

The main goal of this project is to build a machine learning model that, given a COVID-19 patient's current symptoms, status, and medical history, will predict whether the patient is at high risk or not.

Content

The dataset was provided by the Mexican government. It contains a large amount of anonymized patient-related information, including pre-existing conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". Values of 97 and 99 indicate missing data.

sex: 1 for female and 2 for male.
age: of the patient.
classification: covid test findings. Values 1-3 mean that the patient was diagnosed with covid in different degrees. 4 or higher means that the patient is not a carrier of covid or that the test is inconclusive.
patient type: type of care the patient received in the unit. 1 for returned home and 2 for hospitalization.
pneumonia: whether the patient already has air sac inflammation or not.
pregnancy: whether the patient is pregnant or not.
diabetes: whether the patient has diabetes or not.
copd: whether the patient has chronic obstructive pulmonary disease or not.
asthma: whether the patient has asthma or not.
inmsupr: whether the patient is immunosuppressed or not.
hypertension: whether the patient has hypertension or not.
cardiovascular: whether the patient has a heart or blood vessel related disease.
renal chronic: whether the patient has chronic renal disease or not.
other disease: whether the patient has another disease or not.
obesity: whether the patient is obese or not.
tobacco: whether the patient is a tobacco user.
usmr: indicates whether the patient was treated in medical units of the first, second or third level.
medical unit: type of institution of the National Health System that provided the care.
intubed: whether the patient was connected to a ventilator.
icu: Indicates whether the patient had been admitted to an Intensive Care Unit.
date died: the date of death if the patient died, and 9999-99-99 otherwise.
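The coding rules described above (1 = yes, 2 = no, 97 and 99 = missing, classification 1-3 = diagnosed, 9999-99-99 = no death recorded) can be sketched as a few small helpers. This is a minimal sketch under stated assumptions: the helper names and the keys of the sample record are hypothetical and need not match the actual CSV headers, and treating every code other than 1 or 2 as missing is a choice made here, not something the description specifies.

```python
from typing import Optional


def decode_boolean(code: int) -> Optional[bool]:
    """Map the dataset's Boolean coding to Python values.

    1 -> True ("yes"), 2 -> False ("no"), 97/99 -> None (missing).
    Any other code is also treated as missing (an assumption).
    """
    if code == 1:
        return True
    if code == 2:
        return False
    return None


def has_died(date_died: str) -> bool:
    """9999-99-99 is the placeholder for patients with no date of death."""
    return date_died != "9999-99-99"


def is_covid_positive(classification: int) -> bool:
    """Values 1-3 mean diagnosed; 4 or higher means negative or inconclusive."""
    return 1 <= classification <= 3


# Hypothetical record; keys are illustrative, not the real column names.
record = {"diabetes": 2, "icu": 97, "classification": 3, "date_died": "9999-99-99"}

decoded = {
    "diabetes": decode_boolean(record["diabetes"]),
    "icu": decode_boolean(record["icu"]),
    "covid_positive": is_covid_positive(record["classification"]),
    "died": has_died(record["date_died"]),
}
print(decoded)
```

Keeping missing values as None, rather than folding them into "no", avoids silently counting unknown ICU or intubation status as a negative.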
