Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.
The main goal of this project is to build a machine learning model that, given a Covid-19 patient's current symptom, status, and medical history, will predict whether the patient is in high risk or not.
The dataset was provided by the Mexican government (link). This dataset contains an enormous number of anonymized patient-related information including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". values as 97 and 99 are missing data.
Facebook
TwitterThe COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.
Testing is a crucial part of any public health response, and sharing test data is essential to understanding this outbreak. The CDC is currently not publishing complete testing data, so we’re doing our best to collect it from each state and provide it to the public. The information is patchy and inconsistent, so we’re being transparent about what we find and how we handle it—the spreadsheet includes our live comments about changing data and how we’re working with incomplete information.
From here, you can also learn about our methodology, see who makes this, and find out what information states provide and how we handle it.
Facebook
TwitterThe New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Facebook
TwitterAs part of an ongoing partnership with the Census Bureau, the National Center for Health Statistics (NCHS) recently added questions to assess the prevalence of post-COVID-19 conditions (long COVID), on the experimental Household Pulse Survey. This 20-minute online survey was designed to complement the ability of the federal statistical system to rapidly respond and provide relevant information about the impact of the coronavirus pandemic in the U.S. Data collection began on April 23, 2020. Beginning in Phase 3.5 (on June 1, 2022), NCHS included questions about the presence of symptoms of COVID that lasted three months or longer. Phase 3.5 will continue with a two-weeks on, two-weeks off collection and dissemination approach. Estimates on this page are derived from the Household Pulse Survey and show the percentage of adults aged 18 and over who a) as a proportion of the U.S. population, the percentage of adults who EVER experienced post-COVID conditions (long COVID). These adults had COVID and had some symptoms that lasted three months or longer; b) as a proportion of adults who said they ever had COVID, the percentage who EVER experienced post-COVID conditions; c) as a proportion of the U.S. population, the percentage of adults who are CURRENTLY experiencing post-COVID conditions. These adults had COVID, had long-term symptoms, and are still experiencing symptoms; d) as a proportion of adults who said they ever had COVID, the percentage who are CURRENTLY experiencing post-COVID conditions; and e) as a proportion of the U.S. population, the percentage of adults who said they ever had COVID.
Facebook
TwitterAs of March 10, 2023, the death rate from COVID-19 in the state of New York was 397 per 100,000 people. New York is one of the states with the highest number of COVID-19 cases.
Facebook
TwitterNote: This dataset is no longer being updated as of June 2, 2025.
This dataset contains numbers of COVID-19 outbreaks and associated cases, categorized by setting, reported to CDPH since January 1, 2021.
AB 685 (Chapter 84, Statutes of 2020) and the Cal/OSHA COVID-19 Emergency Temporary Standards (Title 8, Subchapter 7, Sections 3205-3205.4) required non-healthcare employers in California to report workplace COVID-19 outbreaks to their local health department (LHD) between January 1, 2021 – December 31, 2022. Beginning January 1, 2023, non-healthcare employer reporting of COVID-19 outbreaks to local health departments is voluntary, unless a local order is in place. More recent data collected without mandated reporting may therefore be less representative of all outbreaks that have occurred, compared to earlier data collected during mandated reporting. Licensed health facilities continue to be mandated to report outbreaks to LHDs.
LHDs report confirmed outbreaks to the California Department of Public Health (CDPH) via the California Reportable Disease Information Exchange (CalREDIE), the California Connected (CalCONNECT) system, or other established processes. Data are compiled and categorized by setting by CDPH. Settings are categorized by U.S. Census industry codes. Total outbreaks and cases are included for individual industries as well as for broader industrial sectors.
The first dataset includes numbers of outbreaks in each setting by month of onset, for outbreaks reported to CDPH since January 1, 2021. This dataset includes some outbreaks with onset prior to January 1 that were reported to CDPH after January 1; these outbreaks are denoted with month of onset “Before Jan 2021.” The second dataset includes cumulative numbers of COVID-19 outbreaks with onset after January 1, 2021, categorized by setting. Due to reporting delays, the reported numbers may not reflect all outbreaks that have occurred as of the reporting date; additional outbreaks may have occurred that have not yet been reported to CDPH.
While many of these settings are workplaces, cases may have occurred among workers, other community members who visited the setting, or both. Accordingly, these data do not distinguish between outbreaks involving only workers, outbreaks involving only residents or patrons, or outbreaks involving both.
Several additional data limitations should be kept in mind:
Outbreaks are classified as “Insufficient information” for outbreaks where not enough information was available for CDPH to assign an industry code.
Some sectors, particularly congregate residential settings, may have increased testing and therefore increased likelihood of outbreak recognition and reporting. As a result, in congregate residential settings, the number of outbreak-associated cases may be more accurate.
However, in most settings, outbreak and case counts are likely underestimates. For most cases, it is not possible to identify the source of exposure, as many cases have multiple possible exposures.
Because some settings have been at times been closed or open with capacity restrictions, numbers of outbreak reports in those settings do not reflect COVID-19 transmission risk.
The number of outbreaks in different settings will depend on the number of different workplaces in each setting. More outbreaks would be expected in settings with many workplaces compared to settings with few workplaces.
Facebook
TwitterNotice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
Facebook
TwitterAs of March 10, 2023, the state with the highest number of COVID-19 cases was California. Almost 104 million cases have been reported across the United States, with the states of California, Texas, and Florida reporting the highest numbers.
From an epidemic to a pandemic The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. The term pandemic refers to multiple outbreaks of an infectious illness threatening multiple parts of the world at the same time. When the transmission is this widespread, it can no longer be traced back to the country where it originated. The number of COVID-19 cases worldwide has now reached over 669 million.
The symptoms and those who are most at risk Most people who contract the virus will suffer only mild symptoms, such as a cough, a cold, or a high temperature. However, in more severe cases, the infection can cause breathing difficulties and even pneumonia. Those at higher risk include older persons and people with pre-existing medical conditions, including diabetes, heart disease, and lung disease. People aged 85 years and older have accounted for around 27 percent of all COVID-19 deaths in the United States, although this age group makes up just two percent of the U.S. population
Facebook
TwitterNOTE: This dataset has been retired and marked as historical-only. Only Chicago residents are included based on the home ZIP Code as provided by the medical provider. If a ZIP was missing or was not valid, it is displayed as "Unknown". Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted based on the week the test specimen was collected. For privacy reasons, until a ZIP Code reaches five cumulative cases, both the weekly and cumulative case counts will be blank. Therefore, summing the “Cases - Weekly” column is not a reliable way to determine case totals. Deaths are those that have occurred among cases based on the week of death. For tests, each test is counted once, based on the week the test specimen was collected. Tests performed prior to 3/1/2020 are not included. Test counts include multiple tests for the same person (a change made on 10/29/2020). PCR and antigen tests reported to Chicago Department of Public Health (CDPH) through electronic lab reporting are included. Electronic lab reporting has taken time to onboard and testing availability has shifted over time, so these counts are likely an underestimate of community infection. The “Percent Tested Positive” columns are calculated by dividing the number of positive tests by the number of total tests . Because of the data limitations for the Tests columns, such as persons being tested multiple times as a requirement for employment, these percentages may vary in either direction from the actual disease prevalence in the ZIP Code. All data are provisional and subject to change. Information is updated as additional details are received. To compare ZIP Codes to Chicago Community Areas, please see http://data.cmap.illinois.gov/opendata/uploads/CKAN/NONCENSUS/ADMINISTRATIVE_POLITICAL_BOUNDARIES/CCAzip.pdf. Both ZIP Codes and Community Areas are also geographic datasets on this data portal. Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office, Illinois Vital Records, American Community Survey (2018)
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description Source data: https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.
Only Chicago residents are included based on the home ZIP Code as provided by the medical provider. If a ZIP was missing or was not valid, it is displayed as "Unknown".
Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted based on the week the test specimen was collected. For privacy reasons, until a ZIP Code reaches five cumulative cases, both the weekly and cumulative case counts will be blank. Therefore, summing the “Cases - Weekly” column is not a reliable way to determine case totals. Deaths are those that have occurred among cases based on the week of death.
For tests, each test is counted once, based on the week the test specimen was collected. Tests performed prior to 3/1/2020 are not included. Test counts include multiple tests for the same person (a change made on 10/29/2020). PCR and antigen tests reported to Chicago Department of Public Health (CDPH) through electronic lab reporting are included. Electronic lab reporting has taken time to onboard and testing availability has shifted over time, so these counts are likely an underestimate of community infection.
The “Percent Tested Positive” columns are calculated by dividing the number of positive tests by the number of total tests . Because of the data limitations for the Tests columns, such as persons being tested multiple times as a requirement for employment, these percentages may vary in either direction from the actual disease prevalence in the ZIP Code.
All data are provisional and subject to change. Information is updated as additional details are received.
To compare ZIP Codes to Chicago Community Areas, please see http://data.cmap.illinois.gov/opendata/uploads/CKAN/NONCENSUS/ADMINISTRATIVE_POLITICAL_BOUNDARIES/CCAzip.pdf. Both ZIP Codes and Community Areas are also geographic datasets on this data portal.
Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office, Illinois Vital Records, American Community Survey (2018)
Facebook
Twitterhttps://www.usa.gov/government-workshttps://www.usa.gov/government-works
After October 13, 2022, this dataset will no longer be updated as the related CDC COVID Data Tracker site was retired on October 13, 2022.
This dataset contains historical trends in vaccinations and cases by age group, at the US national level. Data is stratified by at least one dose and fully vaccinated. Data also represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.
Facebook
TwitterThis public use dataset has 11 data elements reflecting COVID-19 community levels for all available counties. This dataset contains the same values used to display information available at https://www.cdc.gov/coronavirus/2019-ncov/science/community-levels-county-map.html. CDC looks at the combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days — to determine the COVID-19 community level. The COVID-19 community level is determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge. Using these data, the COVID-19 community level is classified as low, medium , or high. COVID-19 Community Levels can help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals. See https://www.cdc.gov/coronavirus/2019-ncov/science/community-levels.html for more information. Visit CDC’s COVID Data Tracker County View* to learn more about the individual metrics used for CDC’s COVID-19 community level in your county. Please note that county-level data are not available for territories. Go to https://covid.cdc.gov/covid-data-tracker/#county-view. For the most accurate and up-to-date data for any county or state, visit the relevant health department website. *COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
About Dataset: WHO COVID-19 Global Data
This dataset provides comprehensive information on the global COVID-19 pandemic as reported to the World Health Organization (WHO). The dataset is available in comma-separated values (CSV) format and includes the following fields:
Daily cases and deaths by date reported to WHO: WHO-COVID-19-global-data.csv
In addition to the COVID-19 case and death data, this dataset also includes valuable information related to COVID-19 vaccinations. The vaccination data consists of the following fields:
Vaccination Data Fields: vaccination-data.csv
In addition to the vaccination data, a separate dataset containing vaccination metadata is available, including information about vaccine names, product names, company names, authorization dates, start and end dates of vaccine rollout, and more.
Vaccination metadata Fields: vaccination-metadata.csv
Facebook
TwitterThe COVID-19 dashboard includes data on city/town COVID-19 activity, confirmed and probable cases of COVID-19, confirmed and probable deaths related to COVID-19, and the demographic characteristics of cases and deaths.
Facebook
TwitterAs of June 13, 2023, there have been almost 768 million cases of coronavirus (COVID-19) worldwide. The disease has impacted almost every country and territory in the world, with the United States confirming around 16 percent of all global cases.
COVID-19: An unprecedented crisis Health systems around the world were initially overwhelmed by the number of coronavirus cases, and even the richest and most prepared countries struggled. In the most vulnerable countries, millions of people lacked access to critical life-saving supplies, such as test kits, face masks, and respirators. However, several vaccines have been approved for use, and more than 13 billion vaccine doses had already been administered worldwide as of March 2023.
The coronavirus in the United Kingdom Over 202 thousand people have died from COVID-19 in the UK, which is the highest number in Europe. The tireless work of the National Health Service (NHS) has been applauded, but the country’s response to the crisis has drawn criticism. The UK was slow to start widespread testing, and the launch of a COVID-19 contact tracing app was delayed by months. However, the UK’s rapid vaccine rollout has been a success story, and around 53.7 million people had received at least one vaccine dose as of July 13, 2022.
Facebook
TwitterAs of April 26, 2023, among adults 18-29 years, the total number of cases of COVID-19 has reached almost 19.48million. This statistic illustrates the total number of cases of COVID-19 in the United States as of April 26, 2023, by age group.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - WHO
People can catch COVID-19 from others who have the virus. This has been spreading rapidly around the world and Italy is one of the most affected country.
On March 8, 2020 - Italy’s prime minister announced a sweeping coronavirus quarantine early Sunday, restricting the movements of about a quarter of the country’s population in a bid to limit contagions at the epicenter of Europe’s outbreak. - TIME
This dataset is from https://github.com/pcm-dpc/COVID-19 collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale
This dataset has two files
covid19_italy_province.csv - Province level data of COVID-19 casescovid_italy_region.csv - Region level data of COVID-19 casesData is collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale and is uploaded into this github repo.
Dashboard on the data can be seen here. Picture courtesy is from the dashboard.
Insights on * Spread to various regions over time * Try to predict the spread of COVID-19 ahead of time to take preventive measures
Facebook
TwitterNote: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and aut
Facebook
TwitterCollection of scholarly articles about COVID-19 and coronavirus family of viruses for use by global research community. Dataset is updated on weekly basis.
Facebook
Twitterhttps://www.usa.gov/government-workshttps://www.usa.gov/government-works
This dataset represents preliminary estimates of cumulative U.S. COVID-19 disease burden for the 2024-2025 period, including illnesses, outpatient visits, hospitalizations, and deaths. The weekly COVID-19-associated burden estimates are preliminary and based on continuously collected surveillance data from patients hospitalized with laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. The data come from the Coronavirus Disease 2019 (COVID-19)-Associated Hospitalization Surveillance Network (COVID-NET), a surveillance platform that captures data from hospitals that serve about 10% of the U.S. population. Each week CDC estimates a range (i.e., lower estimate and an upper estimate) of COVID-19 -associated burden that have occurred since October 1, 2024.
Note: Data are preliminary and subject to change as more data become available. Rates for recent COVID-19-associated hospital admissions are subject to reporting delays; as new data are received each week, previous rates are updated accordingly.
References
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.
The main goal of this project is to build a machine learning model that, given a Covid-19 patient's current symptom, status, and medical history, will predict whether the patient is in high risk or not.
The dataset was provided by the Mexican government (link). This dataset contains an enormous number of anonymized patient-related information including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". values as 97 and 99 are missing data.