Facebook
Twitterhttps://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Facebook
TwitterNotice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
Facebook
TwitterReporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (to
Facebook
TwitterNote: This dataset is no longer being updated as of June 2, 2025.
This dataset contains numbers of COVID-19 outbreaks and associated cases, categorized by setting, reported to CDPH since January 1, 2021.
AB 685 (Chapter 84, Statutes of 2020) and the Cal/OSHA COVID-19 Emergency Temporary Standards (Title 8, Subchapter 7, Sections 3205-3205.4) required non-healthcare employers in California to report workplace COVID-19 outbreaks to their local health department (LHD) between January 1, 2021 – December 31, 2022. Beginning January 1, 2023, non-healthcare employer reporting of COVID-19 outbreaks to local health departments is voluntary, unless a local order is in place. More recent data collected without mandated reporting may therefore be less representative of all outbreaks that have occurred, compared to earlier data collected during mandated reporting. Licensed health facilities continue to be mandated to report outbreaks to LHDs.
LHDs report confirmed outbreaks to the California Department of Public Health (CDPH) via the California Reportable Disease Information Exchange (CalREDIE), the California Connected (CalCONNECT) system, or other established processes. Data are compiled and categorized by setting by CDPH. Settings are categorized by U.S. Census industry codes. Total outbreaks and cases are included for individual industries as well as for broader industrial sectors.
The first dataset includes numbers of outbreaks in each setting by month of onset, for outbreaks reported to CDPH since January 1, 2021. This dataset includes some outbreaks with onset prior to January 1 that were reported to CDPH after January 1; these outbreaks are denoted with month of onset “Before Jan 2021.” The second dataset includes cumulative numbers of COVID-19 outbreaks with onset after January 1, 2021, categorized by setting. Due to reporting delays, the reported numbers may not reflect all outbreaks that have occurred as of the reporting date; additional outbreaks may have occurred that have not yet been reported to CDPH.
While many of these settings are workplaces, cases may have occurred among workers, other community members who visited the setting, or both. Accordingly, these data do not distinguish between outbreaks involving only workers, outbreaks involving only residents or patrons, or outbreaks involving both.
Several additional data limitations should be kept in mind:
Outbreaks are classified as “Insufficient information” for outbreaks where not enough information was available for CDPH to assign an industry code.
Some sectors, particularly congregate residential settings, may have increased testing and therefore increased likelihood of outbreak recognition and reporting. As a result, in congregate residential settings, the number of outbreak-associated cases may be more accurate.
However, in most settings, outbreak and case counts are likely underestimates. For most cases, it is not possible to identify the source of exposure, as many cases have multiple possible exposures.
Because some settings have been at times been closed or open with capacity restrictions, numbers of outbreak reports in those settings do not reflect COVID-19 transmission risk.
The number of outbreaks in different settings will depend on the number of different workplaces in each setting. More outbreaks would be expected in settings with many workplaces compared to settings with few workplaces.
Facebook
TwitterThe COVID-19 dashboard includes data on city/town COVID-19 activity, confirmed and probable cases of COVID-19, confirmed and probable deaths related to COVID-19, and the demographic characteristics of cases and deaths.
Facebook
TwitterOn March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased its collecting and reporting of global COVID-19 data. For updated cases, deaths, and vaccine data please visit: World Health Organization (WHO)For more information, visit the Johns Hopkins Coronavirus Resource Center.COVID-19 Trends MethodologyOur goal is to analyze and present daily updates in the form of recent trends within countries, states, or counties during the COVID-19 global pandemic. The data we are analyzing is taken directly from the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard, though we expect to be one day behind the dashboard’s live feeds to allow for quality assurance of the data.DOI: https://doi.org/10.6084/m9.figshare.125529863/7/2022 - Adjusted the rate of active cases calculation in the U.S. to reflect the rates of serious and severe cases due nearly completely dominant Omicron variant.6/24/2020 - Expanded Case Rates discussion to include fix on 6/23 for calculating active cases.6/22/2020 - Added Executive Summary and Subsequent Outbreaks sectionsRevisions on 6/10/2020 based on updated CDC reporting. This affects the estimate of active cases by revising the average duration of cases with hospital stays downward from 30 days to 25 days. The result shifted 76 U.S. counties out of Epidemic to Spreading trend and no change for national level trends.Methodology update on 6/2/2020: This sets the length of the tail of new cases to 6 to a maximum of 14 days, rather than 21 days as determined by the last 1/3 of cases. This was done to align trends and criteria for them with U.S. CDC guidance. The impact is areas transition into Controlled trend sooner for not bearing the burden of new case 15-21 days earlier.Correction on 6/1/2020Discussion of our assertion of an abundance of caution in assigning trends in rural counties added 5/7/2020. Revisions added on 4/30/2020 are highlighted.Revisions added on 4/23/2020 are highlighted.Executive SummaryCOVID-19 Trends is a methodology for characterizing the current trend for places during the COVID-19 global pandemic. Each day we assign one of five trends: Emergent, Spreading, Epidemic, Controlled, or End Stage to geographic areas to geographic areas based on the number of new cases, the number of active cases, the total population, and an algorithm (described below) that contextualize the most recent fourteen days with the overall COVID-19 case history. Currently we analyze the countries of the world and the U.S. Counties. The purpose is to give policymakers, citizens, and analysts a fact-based data driven sense for the direction each place is currently going. When a place has the initial cases, they are assigned Emergent, and if that place controls the rate of new cases, they can move directly to Controlled, and even to End Stage in a short time. However, if the reporting or measures to curtail spread are not adequate and significant numbers of new cases continue, they are assigned to Spreading, and in cases where the spread is clearly uncontrolled, Epidemic trend.We analyze the data reported by Johns Hopkins University to produce the trends, and we report the rates of cases, spikes of new cases, the number of days since the last reported case, and number of deaths. We also make adjustments to the assignments based on population so rural areas are not assigned trends based solely on case rates, which can be quite high relative to local populations.Two key factors are not consistently known or available and should be taken into consideration with the assigned trend. First is the amount of resources, e.g., hospital beds, physicians, etc.that are currently available in each area. Second is the number of recoveries, which are often not tested or reported. On the latter, we provide a probable number of active cases based on CDC guidance for the typical duration of mild to severe cases.Reasons for undertaking this work in March of 2020:The popular online maps and dashboards show counts of confirmed cases, deaths, and recoveries by country or administrative sub-region. Comparing the counts of one country to another can only provide a basis for comparison during the initial stages of the outbreak when counts were low and the number of local outbreaks in each country was low. By late March 2020, countries with small populations were being left out of the mainstream news because it was not easy to recognize they had high per capita rates of cases (Switzerland, Luxembourg, Iceland, etc.). Additionally, comparing countries that have had confirmed COVID-19 cases for high numbers of days to countries where the outbreak occurred recently is also a poor basis for comparison.The graphs of confirmed cases and daily increases in cases were fit into a standard size rectangle, though the Y-axis for one country had a maximum value of 50, and for another country 100,000, which potentially misled people interpreting the slope of the curve. Such misleading circumstances affected comparing large population countries to small population counties or countries with low numbers of cases to China which had a large count of cases in the early part of the outbreak. These challenges for interpreting and comparing these graphs represent work each reader must do based on their experience and ability. Thus, we felt it would be a service to attempt to automate the thought process experts would use when visually analyzing these graphs, particularly the most recent tail of the graph, and provide readers with an a resulting synthesis to characterize the state of the pandemic in that country, state, or county.The lack of reliable data for confirmed recoveries and therefore active cases. Merely subtracting deaths from total cases to arrive at this figure progressively loses accuracy after two weeks. The reason is 81% of cases recover after experiencing mild symptoms in 10 to 14 days. Severe cases are 14% and last 15-30 days (based on average days with symptoms of 11 when admitted to hospital plus 12 days median stay, and plus of one week to include a full range of severely affected people who recover). Critical cases are 5% and last 31-56 days. Sources:U.S. CDC. April 3, 2020 Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Accessed online. Initial older guidance was also obtained online. Additionally, many people who recover may not be tested, and many who are, may not be tracked due to privacy laws. Thus, the formula used to compute an estimate of active cases is: Active Cases = 100% of new cases in past 14 days + 19% from past 15-25 days + 5% from past 26-49 days - total deaths. On 3/17/2022, the U.S. calculation was adjusted to: Active Cases = 100% of new cases in past 14 days + 6% from past 15-25 days + 3% from past 26-49 days - total deaths. Sources: https://www.cdc.gov/mmwr/volumes/71/wr/mm7104e4.htm https://covid.cdc.gov/covid-data-tracker/#variant-proportions If a new variant arrives and appears to cause higher rates of serious cases, we will roll back this adjustment. We’ve never been inside a pandemic with the ability to learn of new cases as they are confirmed anywhere in the world. After reviewing epidemiological and pandemic scientific literature, three needs arose. We need to specify which portions of the pandemic lifecycle this map cover. The World Health Organization (WHO) specifies six phases. The source data for this map begins just after the beginning of Phase 5: human to human spread and encompasses Phase 6: pandemic phase. Phase six is only characterized in terms of pre- and post-peak. However, these two phases are after-the-fact analyses and cannot ascertained during the event. Instead, we describe (below) a series of five trends for Phase 6 of the COVID-19 pandemic.Choosing terms to describe the five trends was informed by the scientific literature, particularly the use of epidemic, which signifies uncontrolled spread. The five trends are: Emergent, Spreading, Epidemic, Controlled, and End Stage. Not every locale will experience all five, but all will experience at least three: emergent, controlled, and end stage.This layer presents the current trends for the COVID-19 pandemic by country (or appropriate level). There are five trends:Emergent: Early stages of outbreak. Spreading: Early stages and depending on an administrative area’s capacity, this may represent a manageable rate of spread. Epidemic: Uncontrolled spread. Controlled: Very low levels of new casesEnd Stage: No New cases These trends can be applied at several levels of administration: Local: Ex., City, District or County – a.k.a. Admin level 2State: Ex., State or Province – a.k.a. Admin level 1National: Country – a.k.a. Admin level 0Recommend that at least 100,000 persons be represented by a unit; granted this may not be possible, and then the case rate per 100,000 will become more important.Key Concepts and Basis for Methodology: 10 Total Cases minimum threshold: Empirically, there must be enough cases to constitute an outbreak. Ideally, this would be 5.0 per 100,000, but not every area has a population of 100,000 or more. Ten, or fewer, cases are also relatively less difficult to track and trace to sources. 21 Days of Cases minimum threshold: Empirically based on COVID-19 and would need to be adjusted for any other event. 21 days is also the minimum threshold for analyzing the “tail” of the new cases curve, providing seven cases as the basis for a likely trend (note that 21 days in the tail is preferred). This is the minimum needed to encompass the onset and duration of a normal case (5-7 days plus 10-14 days). Specifically, a median of 5.1 days incubation time, and 11.2 days for 97.5% of cases to incubate. This is also driven by pressure to understand trends and could easily be adjusted to 28 days. Source
Facebook
TwitterCovid-19 Daily metrics at the county level As of 6/1/2023, this data set is no longer being updated. The COVID-19 Data Report is posted on the Open Data Portal every day at 3pm. The report uses data from multiple sources, including external partners; if data from external partners are not received by 3pm, they are not available for inclusion in the report and will not be displayed. Data that are received after 3pm will still be incorporated and published in the next report update. The cumulative number of COVID-19 cases (cumulative_cases) includes all cases of COVID-19 that have ever been reported to DPH. The cumulative number of COVID_19 cases in the last 7 days (cases_7days) only includes cases where the specimen collection date is within the past 7 days. While most cases are reported to DPH within 48 hours of specimen collection, there are a small number of cases that routinely are delayed, and will have specimen collection dates that fall outside of the rolling 7 day reporting window. Additionally, reporting entities may submit correction files to contribute historic data during initial onboarding or to address data quality issues; while this is rare, these correction files may cause a large amount of data from outside of the current reporting window to be uploaded in a single day; this would result in the change in cumulative_cases being much larger than the value of cases_7days. On June 4, 2020, the US Department of Health and Human Services issued guidance requiring the reporting of positive and negative test results for SARS-CoV-2; this guidance expired with the end of the federal PHE on 5/11/2023, and negative SARS-CoV-2 results were removed from the List of Reportable Laboratory Findings. DPH will no longer be reporting metrics that were dependent on the collection of negative test results, specifically total tests performed or percent positivity. Positive antigen and PCR/NAAT results will continue to be reportable.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description Source data: https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.
Only Chicago residents are included based on the home ZIP Code as provided by the medical provider. If a ZIP was missing or was not valid, it is displayed as "Unknown".
Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted based on the week the test specimen was collected. For privacy reasons, until a ZIP Code reaches five cumulative cases, both the weekly and cumulative case counts will be blank. Therefore, summing the “Cases - Weekly” column is not a reliable way to determine case totals. Deaths are those that have occurred among cases based on the week of death.
For tests, each test is counted once, based on the week the test specimen was collected. Tests performed prior to 3/1/2020 are not included. Test counts include multiple tests for the same person (a change made on 10/29/2020). PCR and antigen tests reported to Chicago Department of Public Health (CDPH) through electronic lab reporting are included. Electronic lab reporting has taken time to onboard and testing availability has shifted over time, so these counts are likely an underestimate of community infection.
The “Percent Tested Positive” columns are calculated by dividing the number of positive tests by the number of total tests . Because of the data limitations for the Tests columns, such as persons being tested multiple times as a requirement for employment, these percentages may vary in either direction from the actual disease prevalence in the ZIP Code.
All data are provisional and subject to change. Information is updated as additional details are received.
To compare ZIP Codes to Chicago Community Areas, please see http://data.cmap.illinois.gov/opendata/uploads/CKAN/NONCENSUS/ADMINISTRATIVE_POLITICAL_BOUNDARIES/CCAzip.pdf. Both ZIP Codes and Community Areas are also geographic datasets on this data portal.
Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office, Illinois Vital Records, American Community Survey (2018)
Facebook
TwitterAnnouncement Beginning October 20, 2022, CDC will report and publish aggregate case and death data from jurisdictional and state partners on a weekly basis rather than daily. As a result, community transmission levels data reported on data.cdc.gov will be updated weekly on Thursdays, typically by 8 PM ET, instead of daily. This public use dataset has 7 data elements reflecting community transmission levels for all available counties. This dataset contains reported daily transmission level at the county level and contains the same values used to display transmission maps on the COVID Data Tracker. Each day, the dataset is appended to contain the most recent day's data. Transmission level is set to low, moderate, substantial, or high using the calculation rules below. Currently, CDC provides the public with two versions of COVID-19 county-level community transmission level data: this dataset with the levels as originally posted (Originally Posted dataset), updated daily with the most recent day’s data, and an historical dataset with the county-level transmission data from January 1, 2021 (Historical Changes dataset). Methods for calculating county level of community transmission indicator The County Level of Community Transmission indicator uses two metrics: (1) total new COVID-19 cases per 100,000 persons in the last 7 days and (2) percentage of positive SARS-CoV-2 diagnostic nucleic acid amplification tests (NAAT) in the last 7 days. For each of these metrics, CDC classifies transmission values as low, moderate, substantial, or high (below and here). If the values for each of these two metrics differ (e.g., one indicates moderate and the other low), then the higher of the two should be used for decision-making. CDC core metrics of and thresholds for community transmission levels of SARS-CoV-2 Total New Case Rate Metric: "New cases per 100,000 persons in the past 7 days" is calculated by adding the number of new cases in the county (or other administrative level) in the last 7 days divided by the population in the county (or other administrative level) and multiplying by 100,000. "New cases per 100,000 persons in the past 7 days" is considered to have a transmission level of Low (0-9.99); Moderate (10.00-49.99); Substantial (50.00-99.99); and High (greater than or equal to 100.00). Test Percent Positivity Metric: "Percentage of positive NAAT in the past 7 days" is calculated by dividing the number of positive tests in the county (or other administrative level) during the last 7 days by the total number of tests conducted over the last 7 days. "Percentage of positive NAAT in the past 7 days" is considered to have a transmission level of Low (less than 5.00); Moderate (5.00-7.99); Substantial (8.00-9.99); and High (greater than or equal to 10.00). If the two metrics suggest different transmission levels, the higher level is selected. Transmission categories include: Low Transmission Threshold: Counties with fewer than 10 total cases per 100,000 population in the past 7 days, and a NAAT percent test positivity in the past 7 days below 5%; Moderate Transmission Threshold: Counties with 10-49 total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 5.0-7.99%; Substantial Transmission Threshold: Counties with 50-99 total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 8.0-9.99%; High Transmission Threshold: Counties with 100 or more total cases per 100,000
Facebook
TwitterThis data comes from the New York Times Coronavirus (Covid-19) Data in the United States GitHub repository. They use it to power their interactive page(s) on Covid-19, such as Coronavirus in the U.S.: Latest Map and Case Count.
The primary data published here are the daily cumulative number of cases and deaths reported in each county and state across the U.S. since the beginning of the pandemic. We have also published these additional data sets:
The cumulative & rolling averages for cases and deaths are continually updated, but the more specific data mentioned above for prisons, etc. is no longer being updated.
This includes data at the national, state, and county levels.
If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”
Header Image: https://www.pexels.com/photo/n95-face-mask-3993241/
See the original New York Times source README which is also included in this dataset.
Facebook
TwitterThis case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 32 data element restricted access dataset. The following apply to the public use datasets and the restricted access dataset: - Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf. - Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers. - Some data are suppressed to protect individual privacy. - Datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensure that time-dependent outcome data are accurately captured. - Datasets are updated monthly. - Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy. - For more information about data collection and reporting, please see wwwn.cdc.gov/nndss/data-collection.html. - For more information about the COVID-19 case surveillance data, please see www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html. Overview The COVID-19 case surveillance database includes patient-level data reported by U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as "immediately notifiable, urgent (within 24 hours)" by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data collected by jurisdictions are shared voluntarily with CDC. For more information, visit: wwwn.cdc.gov/nndss/conditions/coronavirus-disease-2019-covid-19/case-definition/2020/08/05/. COVID-19 Case Reports COVID-19 case reports are routinely submitted to CDC by pu
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.
The main goal of this project is to build a machine learning model that, given a Covid-19 patient's current symptom, status, and medical history, will predict whether the patient is in high risk or not.
The dataset was provided by the Mexican government (link). This dataset contains an enormous number of anonymized patient-related information including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". values as 97 and 99 are missing data.
Facebook
Twitterhttps://www.usa.gov/government-workshttps://www.usa.gov/government-works
This dataset represents preliminary estimates of cumulative U.S. COVID-19 disease burden for the 2024-2025 period, including illnesses, outpatient visits, hospitalizations, and deaths. The weekly COVID-19-associated burden estimates are preliminary and based on continuously collected surveillance data from patients hospitalized with laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. The data come from the Coronavirus Disease 2019 (COVID-19)-Associated Hospitalization Surveillance Network (COVID-NET), a surveillance platform that captures data from hospitals that serve about 10% of the U.S. population. Each week CDC estimates a range (i.e., lower estimate and an upper estimate) of COVID-19 -associated burden that have occurred since October 1, 2024.
Note: Data are preliminary and subject to change as more data become available. Rates for recent COVID-19-associated hospital admissions are subject to reporting delays; as new data are received each week, previous rates are updated accordingly.
References
Facebook
TwitterNote: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.
On 6/16/2023 CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up to date measure uses the total estimated population. Please note that the rates for some groups may change since the up to date measure is calculated differently than the previous booster and bivalent measures.
This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.
These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.
Previous updates:
On March 3, 2023, with the release of HPI 3.0 in 2022, the previous equity scores have been updated to reflect more recent community survey information. This change represents an improvement to the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions ranging from the most to the least healthy based on economic, housing, and environmental measures.
Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.
Starting on May 29, 2021 the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset has changed. Please see the accompanying data dictionary for details. In addition, this dataset is now down to the ZIP code level.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.Using these data, the COVID-19 community level was classified as low, medium, or high.COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.Archived Data Notes:This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previous released.March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflect
Facebook
TwitterNote: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and aut
Facebook
TwitterCollected COVID-19 datasets from various sources as part of DAAN-888 course, Penn State, Spring 2022. Collaborators: Mohamed Abdelgayed, Heather Beckwith, Mayank Sharma, Suradech Kongkiatpaiboon, and Alex Stroud
**1 - COVID-19 Data in the United States ** Source: The data is collected from multiple public health official sources by NY Times journalists and compiled in one single file. Description: Daily count of new COVID-19 cases and deaths for each state. Data is updated daily and runs from 1/21/2020 to 2/4/2022. URL: https://github.com/nytimes/covid-19-data/blob/master/us-states.csv Data size: 38,814 row and 5 columns.
**2 - Mask-Wearing Survey Data ** Source: The New York Times is releasing estimates of mask usage by county in the United States. Description: This data comes from a large number of interviews conducted online by the global data and survey firm Dynata, at the request of The New York Times. The firm asked a question about mask usage to obtain 250,000 survey responses between July 2 and July 14, enough data to provide estimates more detailed than the state level. URL: https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv Data size: 3,142 rows and 6 columns
**3a - Vaccine Data – Global **
Source: This data comes from the US Centers for Disease Control and Prevention (CDC), Our World in Data (OWiD) and the World Health Organization (WHO).
Description: Time series data of vaccine doses administered and the number of fully and partially vaccinated people by country. This data was last updated on February 3, 2022
URL: https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv
Data Size: 162,521 rows and 8 columns
**3b -Vaccine Data – United States **
Source: The data is comprised of individual State's public dashboards and data from the US Centers for Disease Control and Prevention (CDC).
Description: Time series data of the total vaccine doses shipped and administered by manufacturer, the dose number (first or second) by state. This data was last updated on February 3, 2022.
URL: https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/us_data/time_series/vaccine_data_us_timeline.csv
Data Size: 141,503 rows and 13 columns
**4 - Testing Data **
Source: The data is comprised of individual State's public dashboards and data from the U.S. Department of Health & Human Services.
Description: Time series data of total tests administered by county and state. This data was last updated on January 25, 2022.
URL: https://github.com/govex/COVID-19/blob/master/data_tables/testing_data/county_time_series_covid19_US.csv
Data size: 322,154 rows and 8 columns
**5 – US State and Territorial Public Mask Mandates ** Source: Data from state and territory executive orders, administrative orders, resolutions, and proclamations is gathered from government websites and cataloged and coded by one coder using Microsoft Excel, with quality checking provided by one or more other coders. Description: US State and Territorial Public Mask Mandates from April 10, 2020 through August 15, 2021 by County by Day URL: https://data.cdc.gov/Policy-Surveillance/U-S-State-and-Territorial-Public-Mask-Mandates-Fro/62d6-pm5i Data Size: 1,593,869 rows and 10 columns
**6 – Case Counts & Transmission Level **
Source: This open-source dataset contains seven data items that describe community transmission levels across all counties. This dataset provides the same numbers used to show transmission maps on the COVID Data Tracker and contains reported daily transmission levels at the county level. The dataset is updated every day to include the most current day's data. The calculating procedures below are used to adjust the transmission level to low, moderate, considerable, or high.
Description: US State and County case counts and transmission level from 16-Aug-2021 to 03-Feb-2022
URL: https://data.cdc.gov/Public-Health-Surveillance/United-States-COVID-19-County-Level-of-Community-T/8396-v7yb
Data Size: 550,702 rows and 7 columns
**7 - World Cases & Vaccination Counts **
Source: This is an open-source dataset collected and maintained by Our World in Data. OWID provides research and data to help against the world’s largest problems.
Description: This dataset includes vaccinations, tests & positivity, hospital & ICU, confirmed cases, confirmed deaths, reproduction rate, policy responses and other variables of interest.
URL: https://github.com/owid/covid-19-data/tree/master/public/data
Data Size: 67 columns and 157,000 rows
**8 - COVID-19 Data in the European Union **
Source: This is an open-source dataset collected and maintained by ECDC. It is an EU agency aimed at strengthening Europe's defenses against infectious diseases.
Description: This dataset co...
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
About Dataset: WHO COVID-19 Global Data
This dataset provides comprehensive information on the global COVID-19 pandemic as reported to the World Health Organization (WHO). The dataset is available in comma-separated values (CSV) format and includes the following fields:
Daily cases and deaths by date reported to WHO: WHO-COVID-19-global-data.csv
In addition to the COVID-19 case and death data, this dataset also includes valuable information related to COVID-19 vaccinations. The vaccination data consists of the following fields:
Vaccination Data Fields: vaccination-data.csv
In addition to the vaccination data, a separate dataset containing vaccination metadata is available, including information about vaccine names, product names, company names, authorization dates, start and end dates of vaccine rollout, and more.
Vaccination metadata Fields: vaccination-metadata.csv
Facebook
TwitterNote: Effective 3/31/25, this dataset is no longer being updated.
This dataset includes information on all positive tests of individuals for COVID-19 infection performed in New York State beginning March 1, 2020, when the first case of COVID-19 was identified in the state. The primary goal of publishing this dataset is to provide users timely information about local disease spread and COVID-19 case rates by age group. The data will be updated weekly, reflecting tests reported by 12:00 AM three days prior to the date of the update.
Total positives includes both PCR and antigen positive test results.
Note: This is an updated version of the statewide cases by age dataset that includes all reported cases, both first infections and reinfections. An archived version of the prior dataset, which includes only first infections, is available: https://health.data.ny.gov/d/h8ay-wryy
Facebook
TwitterNote: This dataset is no longer being updated due to the end of the COVID-19 Public Health Emergency.
The California Department of Public Health (CDPH) is identifying vaccination status of COVID-19 cases, hospitalizations, and deaths by analyzing the state immunization registry and registry of confirmed COVID-19 cases. Post-vaccination cases are individuals who have a positive SARS-Cov-2 molecular test (e.g. PCR) at least 14 days after they have completed their primary vaccination series.
Tracking cases of COVID-19 that occur after vaccination is important for monitoring the impact of immunization campaigns. While COVID-19 vaccines are safe and effective, some cases are still expected in persons who have been vaccinated, as no vaccine is 100% effective. For more information, please see https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/Post-Vaccine-COVID19-Cases.aspx
Post-vaccination infection data is updated monthly and includes data on cases, hospitalizations, and deaths among the unvaccinated and the vaccinated. Partially vaccinated individuals are excluded. To account for reporting and processing delays, there is at least a one-month lag in provided data (for example data published on 9/9/22 will include data through 7/31/22).
Notes:
On September 9, 2022, the post-vaccination data has been changed to compare unvaccinated with those with at least a primary series completed for persons age 5+. These data will be updated monthly (first Thursday of the month) and include at least a one month lag.
On February 2, 2022, the post-vaccination data has been changed to distinguish between vaccination with a primary series only versus vaccinated and boosted. The previous dataset has been uploaded as an archived table. Additionally, the lag on this data has been extended to 14 days.
On November 29, 2021, the denominator for calculating vaccine coverage has been changed from age 16+ to age 12+ to reflect new vaccine eligibility criteria. The previous dataset based on age 16+ denominators has been uploaded as an archived table.
Facebook
Twitterhttps://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.