75 datasets found

Coronavirus (Covid-19) Data in the United States
nytimes.com
openicpsr.org
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Dataset provided by
New York Times
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Johns Hopkins COVID-19 Case Tracker
data.world
kaggle.com
csv, zip
Updated Dec 3, 2025
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Available download formats: zip, csv
Dataset updated
Dec 3, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
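The per-100,000 rates described above follow the usual population normalization. The snippet below is a minimal sketch of that arithmetic, not AP's actual code; the function name `rate_per_100k` and the sample figures are hypothetical.

```python
from typing import Optional


def rate_per_100k(count: int, population: Optional[int]) -> Optional[float]:
    """Return `count` per 100,000 residents, or None if population is unusable."""
    if population is None or population <= 0:
        # Rows without a usable population estimate cannot be normalized.
        return None
    # Multiply before dividing to limit floating-point rounding.
    return count * 100_000 / population


# Hypothetical county: 250 cases among 50,000 residents.
example_rate = rate_per_100k(250, 50_000)
print(example_rate)
```

Returning `None` for a missing or zero population keeps rows such as unassigned cases from producing misleading rates.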

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join it to other datasets we've made available to help cover the coronavirus pandemic.

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
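The hotspot queries above rely on a 7-day rolling average of new cases per capita. The following is a rough sketch of that calculation in plain Python, not a reproduction of AP's data.world queries; the function names, area labels, and numbers are all hypothetical.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over `window` items; None until the window is full."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def rolling_rate_per_100k(new_cases, population, window=7):
    """Rolling mean of daily new cases, expressed per 100,000 residents."""
    return [
        None if m is None else m * 100_000 / population
        for m in rolling_mean(new_cases, window)
    ]


def rank_hotspots(series_by_area, population_by_area, window=7):
    """Sort areas by their most recent rolling per-capita rate, highest first."""
    latest = {}
    for area, new_cases in series_by_area.items():
        rates = rolling_rate_per_100k(new_cases, population_by_area[area], window)
        if rates and rates[-1] is not None:
            latest[area] = rates[-1]
    return sorted(latest.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical daily new-case series for two areas of equal population.
ranking = rank_hotspots(
    {"Area A": [7] * 7, "Area B": [14] * 7},
    {"Area A": 100_000, "Area B": 100_000},
)
print(ranking)
```

As the caveat above notes, a per-capita ranking can push very small counties to the top, so it is worth reading the raw counts alongside the ranked rates.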

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county".

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data

- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.

- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
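Because the timeseries holds daily snapshots of cumulative counts, daily new counts come from differencing consecutive snapshots, and a downward revision by a source shows up as a negative difference. The sketch below illustrates this with made-up numbers; `daily_new_counts` is a hypothetical helper, and it assumes the count before the first snapshot was zero.

```python
def daily_new_counts(cumulative):
    """Difference consecutive cumulative snapshots.

    Returns (new_counts, revision_indexes), where revision_indexes lists
    positions whose cumulative total fell below the previous snapshot.
    """
    new_counts = []
    revision_indexes = []
    previous = 0  # assumption: nothing had been counted before the first snapshot
    for i, total in enumerate(cumulative):
        delta = total - previous
        if delta < 0:
            # The source lowered its cumulative total (a historical correction).
            revision_indexes.append(i)
        new_counts.append(delta)
        previous = total
    return new_counts, revision_indexes


# Hypothetical cumulative snapshots with one downward revision on day 3.
new_counts, revisions = daily_new_counts([10, 15, 14, 20])
print(new_counts, revisions)
```

Flagging the negative days rather than clipping them to zero leaves it to the analyst to decide how to treat the revision.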

Attribution

This data should be credited to the Johns Hopkins University COVID-19 tracking project.
Weekly United States COVID-19 Cases and Deaths by State - ARCHIVED
data.cdc.gov
healthdata.gov
csv, xlsx, xml
Updated Jun 1, 2023
Cite
CDC COVID-19 Response (2023). Weekly United States COVID-19 Cases and Deaths by State - ARCHIVED [Dataset]. https://data.cdc.gov/w/pwn4-m3yp/tdwk-ruhb?cur=mQBYmd4Um4_
Available download formats: xlsx, csv, xml
Dataset updated
Jun 1, 2023
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC COVID-19 Response
License
https://www.usa.gov/government-works
Area covered
United States
Description
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.

Aggregate Data Collection Process

Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
A CDC data team reviews and validates the information obtained from jurisdictions’ state and local websites via an overnight data review process.

If more than one official county data source exists, CDC uses a comprehensive data selection process comparing each official county data source, and takes the highest case and death counts respectively, unless otherwise specified by the state.

CDC compiles these data and posts the finalized information on COVID Data Tracker.

County level data is aggregated to obtain state and territory specific totals.

This process is collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provide the most up-to-date numbers on cases and deaths by report date. CDC may retrospectively update counts to correct data quality issues.
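The selection and aggregation steps above can be pictured roughly as follows. This is an illustrative sketch only, not CDC's pipeline: it assumes each source report is a `(state, county, cases, deaths)` tuple, it ignores the "unless otherwise specified by the state" exception, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def select_county_counts(reports):
    """Keep the highest case count and the highest death count per county.

    `reports` is an iterable of (state, county, cases, deaths) tuples that may
    contain several official sources for the same county.
    """
    best = {}
    for state, county, cases, deaths in reports:
        key = (state, county)
        prev_cases, prev_deaths = best.get(key, (0, 0))
        # Cases and deaths are maximized independently ("respectively").
        best[key] = (max(prev_cases, cases), max(prev_deaths, deaths))
    return best


def aggregate_to_state(county_counts):
    """Sum the selected county counts into per-state (cases, deaths) totals."""
    totals = defaultdict(lambda: [0, 0])
    for (state, _county), (cases, deaths) in county_counts.items():
        totals[state][0] += cases
        totals[state][1] += deaths
    return {state: tuple(pair) for state, pair in totals.items()}


# Hypothetical reports: two sources for one county, one source for another.
reports = [
    ("MI", "Livingston", 100, 3),
    ("MI", "Livingston", 98, 4),
    ("MI", "Washtenaw", 50, 1),
]
county_counts = select_county_counts(reports)
state_totals = aggregate_to_state(county_counts)
print(state_totals)
```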

Methodology Changes

Several differences exist between the current, weekly-updated dataset and the archived version:
Source: The current Weekly-Updated Version is based on county-level aggregate count data, while the Archived Version is based on State-level aggregate count data.

Confirmed/Probable Cases/Death breakdown:  While the probable cases and deaths are included in the total case and total death counts in both versions (if applicable), they were reported separately from the confirmed cases and deaths by jurisdiction in the Archived Version.  In the current Weekly-Updated Version, the counts by jurisdiction are not reported by confirmed or probable status (See Confirmed and Probable Counts section for more detail).

Time Series Frequency: The current Weekly-Updated Version contains weekly time series data (i.e., one record per week per jurisdiction), while the Archived Version contains daily time series data (i.e., one record per day per jurisdiction).

Update Frequency: The current Weekly-Updated Version is updated weekly, while the Archived Version was updated twice daily up to October 20, 2022.
Important note: The counts reflected during a given time period in this dataset may not match the counts reflected for the same time period in the archived dataset noted above. Discrepancies may exist due to differences between county and state COVID-19 case surveillance and reconciliation efforts.

Confirmed and Probable Counts

In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC. Confirmed and probable case definition criteria are described here:

Council of State and Territorial Epidemiologists (ymaws.com).

Deaths

CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (total case counts) as the present dataset; however, NCHS Death Counts are based on death certificates that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Data from each of these pages are considered provisional (not complete and pending verification) and are therefore subject to change. Counts from previous weeks are continually revised as more records are received and processed.

Number of Jurisdictions Reporting

There are currently 60 public health jurisdictions reporting cases of COVID-19. This includes the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands, as well as three independent countries in compacts of free association with the United States: the Federated States of Micronesia, the Republic of the Marshall Islands, and the Republic of Palau. New York State’s reported case and death counts do not include New York City’s counts, as the two jurisdictions separately report nationally notifiable conditions to CDC.

CDC COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths, available by state and by county. These and other data on COVID-19 are available from multiple public locations, such as:

https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html

https://www.cdc.gov/covid-data-tracker/index.html

https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

https://www.cdc.gov/coronavirus/2019-ncov/php/open-america/surveillance-data-analytics.html

Additional COVID-19 public use datasets, including line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.

Archived Data Notes:

November 3, 2022: Due to a reporting cadence issue, case rates for Missouri counties are calculated based on 11 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 3, 2022, instead of the customary 7 days’ worth of data.

November 10, 2022: Due to a reporting cadence change, case rates for Alabama counties are calculated based on 13 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 10, 2022, instead of the customary 7 days’ worth of data.

November 10, 2022: Per the request of the jurisdiction, cases and deaths among non-residents have been removed from all Hawaii county totals throughout the entire time series. Cumulative case and death counts reported by CDC will no longer match Hawaii’s COVID-19 Dashboard, which still includes non-resident cases and deaths. 

November 17, 2022: Two new columns, weekly historic cases and weekly historic deaths, were added to this dataset on November 17, 2022. These columns reflect case and death counts that were reported that week but were historical in nature and not reflective of the current burden within the jurisdiction. These historical cases and deaths are not included in the new weekly case and new weekly death columns; however, they are reflected in the cumulative totals provided for each jurisdiction. These data are used to account for artificial increases in case and death totals due to batched reporting of historical data.

December 1, 2022: Due to cadence changes over the Thanksgiving holiday, case rates for all Ohio counties are reported as 0 in the data released on December 1, 2022.

January 5, 2023: Due to North Carolina’s holiday reporting cadence, aggregate case and death data will contain 14 days’ worth of data instead of the customary 7 days. As a result, case and death metrics will appear higher than expected in the January 5, 2023, weekly release.

January 12, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0. As a result, case and death metrics will appear lower than expected in the January 12, 2023, weekly release.

January 19, 2023: Due to a reporting cadence issue, Mississippi’s aggregate case and death data will be calculated based on 14 days’ worth of data instead of the customary 7 days in the January 19, 2023, weekly release.

January 26, 2023: Due to a reporting backlog of historic COVID-19 cases, case rates for two Michigan counties (Livingston and Washtenaw) were higher than expected in the January 19, 2023 weekly release.

January 26, 2023: Due to a backlog of historic COVID-19 cases being reported this week, aggregate case and death counts in Charlotte County and Sarasota County, Florida, will appear higher than expected in the January 26, 2023 weekly release.

January 26, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0 in the weekly release posted on January 26, 2023.

February 2, 2023: As of the data collection deadline, CDC observed an abnormally large increase in aggregate COVID-19 cases and deaths reported for Washington State. In response, totals for new cases and new deaths released on February 2, 2023, have been displayed as zero at the state level until the issue is addressed with state officials. CDC is working with state officials to address the issue.

February 2, 2023: Due to a decrease reported in cumulative case counts by Wyoming, case rates will be reported as 0 in the February 2, 2023, weekly release. CDC is working with state officials to verify the data submitted.

February 16, 2023: Due to data processing delays, Utah’s aggregate case and death data will be reported as 0 in the weekly release posted on February 16, 2023. As a result, case and death metrics will appear lower than expected and should be interpreted with caution.

February 16, 2023: Due to a reporting cadence change, Maine’s
COVID-19 Outbreak Data (ARCHIVED)
data.chhs.ca.gov
data.ca.gov
csv, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). COVID-19 Outbreak Data (ARCHIVED) [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-outbreak-data
Available download formats: zip, csv(62919), csv(326192)
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This dataset is no longer being updated as of June 2, 2025.

This dataset contains numbers of COVID-19 outbreaks and associated cases, categorized by setting, reported to CDPH since January 1, 2021.

AB 685 (Chapter 84, Statutes of 2020) and the Cal/OSHA COVID-19 Emergency Temporary Standards (Title 8, Subchapter 7, Sections 3205-3205.4) required non-healthcare employers in California to report workplace COVID-19 outbreaks to their local health department (LHD) between January 1, 2021 and December 31, 2022. Beginning January 1, 2023, non-healthcare employer reporting of COVID-19 outbreaks to local health departments is voluntary, unless a local order is in place. More recent data collected without mandated reporting may therefore be less representative of all outbreaks that have occurred, compared to earlier data collected during mandated reporting. Licensed health facilities continue to be mandated to report outbreaks to LHDs.

LHDs report confirmed outbreaks to the California Department of Public Health (CDPH) via the California Reportable Disease Information Exchange (CalREDIE), the California Connected (CalCONNECT) system, or other established processes. Data are compiled and categorized by setting by CDPH. Settings are categorized by U.S. Census industry codes. Total outbreaks and cases are included for individual industries as well as for broader industrial sectors.

The first dataset includes numbers of outbreaks in each setting by month of onset, for outbreaks reported to CDPH since January 1, 2021. This dataset includes some outbreaks with onset prior to January 1 that were reported to CDPH after January 1; these outbreaks are denoted with month of onset “Before Jan 2021.” The second dataset includes cumulative numbers of COVID-19 outbreaks with onset after January 1, 2021, categorized by setting. Due to reporting delays, the reported numbers may not reflect all outbreaks that have occurred as of the reporting date; additional outbreaks may have occurred that have not yet been reported to CDPH.

While many of these settings are workplaces, cases may have occurred among workers, other community members who visited the setting, or both. Accordingly, these data do not distinguish between outbreaks involving only workers, outbreaks involving only residents or patrons, or outbreaks involving both.

Several additional data limitations should be kept in mind:

Outbreaks are classified as “Insufficient information” when not enough information was available for CDPH to assign an industry code.

Some sectors, particularly congregate residential settings, may have increased testing and therefore increased likelihood of outbreak recognition and reporting. As a result, in congregate residential settings, the number of outbreak-associated cases may be more accurate.

However, in most settings, outbreak and case counts are likely underestimates. For most cases, it is not possible to identify the source of exposure, as many cases have multiple possible exposures.

Because some settings have at times been closed or open with capacity restrictions, numbers of outbreak reports in those settings do not reflect COVID-19 transmission risk.

The number of outbreaks in different settings will depend on the number of different workplaces in each setting. More outbreaks would be expected in settings with many workplaces compared to settings with few workplaces.
COVID-19 Probable Cases (ARCHIVED)
catalog.data.gov
data.chhs.ca.gov
Updated Nov 23, 2025
Cite
California Department of Public Health (2025). COVID-19 Probable Cases (ARCHIVED) [Dataset]. https://catalog.data.gov/dataset/covid-19-probable-cases-archived-bceb1
Dataset updated
Nov 23, 2025
Dataset provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This dataset is no longer being updated due to the end of the COVID-19 Public Health Emergency.

Note: On 2/16/22, 17,467 cases based on at-home positive test results were excluded from the probable case counts. Per national case classification guidelines, cases based on at-home positive results are now classified as “suspect” cases. The majority of these cases were identified between November 2021 and February 2022.

CDPH tracks both probable and confirmed cases of COVID-19 to better understand how the virus is impacting our communities. Probable cases are defined as individuals with a positive antigen test that detects the presence of viral antigens. Antigen testing is useful when rapid results are needed, or in settings where laboratory resources may be limited. Confirmed cases are defined as individuals with a positive molecular test, which tests for viral genetic material, such as a PCR or polymerase chain reaction test. Results from both types of tests are reported to CDPH.

Due to the expanded use of antigen testing, surveillance of probable cases is increasingly important. The proportion of probable cases among the total cases in California has increased. To provide a more complete picture of trends in case volume, it is now more important to provide probable case data in addition to confirmed case data. The Centers for Disease Control and Prevention (CDC) has begun publishing probable case data for states.

Testing data is updated weekly. Due to small numbers, the percentage of probable cases in the first two weeks of the month may change. Probable case data from San Diego County is not included in the statewide table at this time.

For more information, please see https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/Probable-Cases.aspx
Novel Covid-19 Dataset
kaggle.com
Updated Sep 18, 2025
Cite
GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Sep 18, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GHOST5612
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Context:

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.

Edited:

Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths and recoveries from 2019 novel coronavirus. Please note that this is time series data, so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020.


Dataset Description

This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

Files and Columns

1. covid_19_data.csv (Main File)

This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

Sno – Serial number of the record

ObservationDate – Date of the observation (MM/DD/YYYY)

Province/State – Province or state of the observation (may be missing for some entries)

Country/Region – Country of the observation

Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)

Confirmed – Cumulative number of confirmed cases on that date

Deaths – Cumulative number of deaths on that date

Recovered – Cumulative number of recoveries on that date
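Given the columns listed above, a reader might total the cumulative Confirmed counts per country and date along the following lines. This is a minimal sketch using only the standard library; the sample rows are invented, the helper name is hypothetical, and the assumption that counts may be written as floats (for example `1.0`) should be checked against the actual file.

```python
import csv
import io
from datetime import datetime

# Invented rows that follow the column layout described above.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Alpha,Exampleland,1/22/2020 17:00,1.0,0.0,0.0
2,01/22/2020,Beta,Exampleland,1/22/2020 17:00,2.0,0.0,0.0
3,01/23/2020,Alpha,Exampleland,1/23/20 17:00,4.0,0.0,0.0
"""


def confirmed_by_country(csv_text):
    """Sum cumulative Confirmed counts per (date, country) across provinces."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ObservationDate is documented as MM/DD/YYYY.
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        key = (day, row["Country/Region"])
        # Assumption: counts may be stored as floats such as "1.0".
        totals[key] = totals.get(key, 0) + int(float(row["Confirmed"]))
    return totals


totals = confirmed_by_country(SAMPLE)
for (day, country), confirmed in sorted(totals.items()):
    print(day.isoformat(), country, confirmed)
```

The `Last Update` column is deliberately left unparsed here, since the description notes that it is not standardized and needs cleaning first.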

2. 2019_ncov_data.csv (Legacy File)

This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

3. COVID_open_line_list_data.csv

This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

4. COVID19_line_list_data.csv

Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

✅ Use covid_19_data.csv for up-to-date aggregated global trends.

✅ Use the line list datasets for detailed, individual-level case analysis.

Country level datasets:

If you are interested in knowing country level data, please refer to the following Kaggle datasets:

India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

Acknowledgements:

Johns Hopkins University for making the data available for educational and academic research purposes

MoBS lab - https://www.mobs-lab.org/2019ncov.html

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google....
COVID-19 State Profile Report - Michigan
healthdata.gov
data.virginia.gov
csv, xlsx, xml
Updated Jan 27, 2021
Cite
White House COVID-19 Team, Joint Coordination Cell, Data Strategy and Execution Workgroup (2021). COVID-19 State Profile Report - Michigan [Dataset]. https://healthdata.gov/Community/COVID-19-State-Profile-Report-Michigan/s8hn-gz3c
Available download formats: xlsx, xml, csv
Dataset updated
Jan 27, 2021
Dataset authored and provided by
White House COVID-19 Team, Joint Coordination Cell, Data Strategy and Execution Workgroup
License
https://www.usa.gov/government-works
Description
After over two years of public reporting, the State Profile Report will no longer be produced and distributed after February 2023. The final release was on February 23, 2023. We want to thank everyone who contributed to the design, production, and review of this report and we hope that it provided insight into the data trends throughout the COVID-19 pandemic. Data about COVID-19 will continue to be updated at CDC’s COVID Data Tracker.

The State Profile Report (SPR) is generated by the Data Strategy and Execution Workgroup in the Joint Coordination Cell, in collaboration with the White House. It is managed by an interagency team with representatives from multiple agencies and offices (including the United States Department of Health and Human Services (HHS), the Centers for Disease Control and Prevention, the HHS Assistant Secretary for Preparedness and Response, and the Indian Health Service). The SPR provides easily interpretable information on key indicators for each state, down to the county level.

It is a weekly snapshot in time that:

Focuses on recent outcomes in the last seven days and changes relative to the month prior

Provides additional contextual information at the county level for each state, and includes national level information

Supports rapid visual interpretation of results with color thresholds
Coronavirus COVID-19 Global Cases by the Center for Systems Science and...
github.com
systems.jhu.edu
+1 more
Cite
Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) [Dataset]. https://github.com/CSSEGISandData/COVID-19
Dataset provided by
Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
Area covered
Global
Description
2019 Novel Coronavirus COVID-19 (2019-nCoV) Visual Dashboard and Map:
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Confirmed Cases by Country/Region/Sovereignty
Confirmed Cases by Province/State/Dependency
Deaths
Recovered
Downloadable data:
https://github.com/CSSEGISandData/COVID-19
Additional Information about the Visual Dashboard:
https://systems.jhu.edu/research/public-health/ncov
USA-statewise- Covid-19-cases
kaggle.com
zip
Updated Jul 14, 2021
Cite
ayoub chaoui (2021). USA-statewise- Covid-19-cases [Dataset]. https://www.kaggle.com/datasets/ayoubchaoui/usastatewise-covid19cases
Available download formats: zip (27499 bytes)
Dataset updated
Jul 14, 2021
Authors
ayoub chaoui
License
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Area covered
United States
Description
The Covid-19 curve in the United States is rising again after months of decline, with the number of new cases per day doubling over the past three weeks, driven by the fast-spreading Delta variant, lagging vaccination rates, and Fourth of July gatherings

In the United States of America, from 3 January 2020 to 5:05 pm CEST, 14 July 2021, there have been 33,572,715 confirmed cases of COVID-19 with 602,409 deaths, reported to WHO. As of 9 July 2021, a total of 334,282,915 vaccine doses have been administered.

Content

This dataset is a resource to help advance the understanding of the virus across all states in the USA.
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
Updated Jul 19, 2024
Cite
Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Dataset updated
Jul 19, 2024
Dataset provided by
University of Memphis, USA
Independent University, Bangladesh
Silicon Orchard Lab, Bangladesh
Authors
Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, United States
Description
Introduction

There are several works based on Natural Language Processing on newspaper reports. In mining opinions from headlines [ 1 ] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [ 9 ]. To the best of our knowledge, the NNK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, contributing to awareness of the virus.

2 Data-set Introduction

2.1 Data Collection

We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news was highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were highly likely to collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria provided as guidelines to the human data-collectors were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

Avoid taking duplicate reports.

Maintain a time frame for the above mentioned newspapers.

To collect these data we used a Google Form for USA and BD. We had two human editors go through each entry to check for any spam or troll entries.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.

While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
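The pre-processing steps listed above could be sketched roughly as follows. This is not the authors' script: the stop-word list is a tiny hypothetical one, lowercasing is an added assumption, and lemmatization is left out because it needs an external resource (for example a lemmatizer from an NLP library).

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and", "for", "at"}

# Matches http(s) links and bare www links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Matches anything that is not an English letter, a digit, or whitespace.
NON_ALNUM_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text: str) -> list[str]:
    """Remove hyperlinks, non-English alphanumeric characters, and stop words."""
    text = URL_PATTERN.sub(" ", text)
    text = NON_ALNUM_PATTERN.sub(" ", text)
    tokens = text.lower().split()  # lowercasing is an assumption, not a listed step
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The virus is at https://example.com now!"))
```

Keeping the steps as separate substitutions makes it easy to drop or reorder one of them, which fits the stated goal of changing the text as little as possible.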

The primary data statistics of the two datasets are shown in Tables 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100

Table 2: Covid-News-BD-NNK data statistics

No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.

3 Literature Review

Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

Some well-known applications of NLP include fraud detection on online media sites [ 10 ], authorship attribution in fallback authentication systems [ 11 ], intelligent conversational agents or chatbots [ 12 ], and the machine translation used by Google Translate [ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT [ 14 ], which uses a bidirectional Transformer encoder and can perform near-perfect classification and word-prediction tasks, and the GPT-3 models released by OpenAI [ 15 ], which can generate almost human-like text. However, these are all pre-trained models, since they carry a huge computation cost. Information extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [ 16 ]. Information extraction from text could be identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives a brief idea of what a set of texts is about. One commonly used topic modeling technique is Latent Dirichlet Allocation, or LDA [ 17 ].

Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.
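To illustrate the graph-based weighting described above, here is a minimal sketch of a TextRank-style keyword ranker: words become nodes, words that occur near each other are linked, and a PageRank-style update assigns each word a weight. The function name, window size, damping factor, and iteration count are illustrative assumptions, not the configuration used for this dataset.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score over an undirected co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Repeatedly apply the PageRank update on the unweighted graph.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = ["virus", "economy", "virus", "masks", "virus", "election", "virus", "jobs"]
    print(textrank_keywords(sample, window=1, top_k=1))
```

In practice the input tokens would already be pre-processed (stop words removed, text lemmatized), so that the graph links only content words.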

Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out a few observations:

In February, both newspapers discussed China and the source of the outbreak.

StarTribune emphasized Minnesota as the state of most concern. In April, this concern seemed to grow.

Both newspapers discussed the virus's impact on the economy, i.e., banks, elections, administrations, and markets.

Washington Post discussed global issues more than StarTribune.

In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, reflecting the critical impact of the virus.

We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, as the count gradually rose from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also tell us how a state or country is reacting to the pandemic. We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
The COVID Tracking Project
covidtracking.com
google sheets
Cite
The COVID Tracking Project [Dataset]. https://covidtracking.com/
Available download formats: google sheets
Description
The COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.
Testing is a crucial part of any public health response, and sharing test data is essential to understanding this outbreak. The CDC is currently not publishing complete testing data, so we’re doing our best to collect it from each state and provide it to the public. The information is patchy and inconsistent, so we’re being transparent about what we find and how we handle it—the spreadsheet includes our live comments about changing data and how we’re working with incomplete information.
From here, you can also learn about our methodology, see who makes this, and find out what information states provide and how we handle it.
[Archived] COVID-19 Deaths by Population Characteristics Over Time
data.sfgov.org
healthdata.gov
+1 more
csv, xlsx, xml
Updated Jun 27, 2024
Cite
(2024). [Archived] COVID-19 Deaths by Population Characteristics Over Time [Dataset]. https://data.sfgov.org/Health-and-Social-Services/-Archived-COVID-19-Deaths-by-Population-Characteri/kkr3-wq7h
Available download formats: xlsx, xml, csv
Dataset updated
Jun 27, 2024
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
As of July 2nd, 2024 the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward.

A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.

Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.

B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.

Data on the population characteristics of COVID-19 deaths are from:
* Case reports
* Medical records
* Electronic lab reports
* Death certificates

Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.

To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.

Data notes on each population characteristic type are listed below.

Race/ethnicity
* We include all race/ethnicity categories that are collected for COVID-19 cases.

Gender
* The City collects information on gender identity using these guidelines.

C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.

Dataset will not update on the business day following any federal holiday.

D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.

New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
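As an illustration of the filtering and running-total logic described above, the sketch below selects one characteristic group and accumulates its new deaths by date. The rows are made up and the key names (`characteristic_type`, `characteristic_group`, `new_deaths`) are assumptions chosen for readability; they are not confirmed field names from the dataset.

```python
from itertools import accumulate

# Hypothetical rows; values and key names are illustrative only.
rows = [
    {"date": "2023-01-02", "characteristic_type": "Age Group", "characteristic_group": "65+", "new_deaths": 1},
    {"date": "2023-01-01", "characteristic_type": "Age Group", "characteristic_group": "65+", "new_deaths": 2},
    {"date": "2023-01-01", "characteristic_type": "Gender", "characteristic_group": "Female", "new_deaths": 1},
    {"date": "2023-01-04", "characteristic_type": "Age Group", "characteristic_group": "65+", "new_deaths": 3},
]


def cumulative_deaths(rows, characteristic_type, characteristic_group):
    """Return (date, running total of new deaths) pairs for one characteristic group."""
    selected = sorted(
        (
            r
            for r in rows
            if r["characteristic_type"] == characteristic_type
            and r["characteristic_group"] == characteristic_group
        ),
        key=lambda r: r["date"],  # ISO dates sort correctly as strings
    )
    running = accumulate(r["new_deaths"] for r in selected)
    return [(r["date"], total) for r, total in zip(selected, running)]


if __name__ == "__main__":
    print(cumulative_deaths(rows, "Age Group", "65+"))
```

Filtering on the characteristic type first matters because the dataset mixes several characteristic types; summing across them would count the same deaths more than once.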

This data may not be immediately available for more recent deaths. Data updates as more information becomes available.

To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.

E. CHANGE LOG
9/11/2023 - on this date, we began using an updated definition of a COVID-19 death to align with the California Department of Public Health. This change was applied to COVID-19 deaths retrospectively beginning on 1/1/2023. More information about the recommendation by the Council of State and Territorial Epidemiologists that motivated this change can be found here: https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf
6/6/2023 - data on deaths by transmission type have been removed. See section ARCHIVED DATA for more detail.
5/16/2023 - data on deaths by sexual orientation, comorbidities, homelessness, and single room occupancy have been removed. See section ARCHIVED DATA for more detail.
4/6/2023 - the State implemented system updates to improve the integrity of historical data.
1/31/2023 - column “population_estimate” added.
3/23/2022 - ‘Native American’ changed to ‘American Indian or Alaska Native’ to align with the census.
1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented.
COVID-19 State Profile Report - Florida
healthdata.gov
data.virginia.gov
+4 more
csv, xlsx, xml
Updated Jan 27, 2021
Cite
White House COVID-19 Team, Joint Coordination Cell, Data Strategy and Execution Workgroup (2021). COVID-19 State Profile Report - Florida [Dataset]. https://healthdata.gov/Community/COVID-19-State-Profile-Report-Florida/ht94-9tjc
Available download formats: csv, xml, xlsx
Dataset updated
Jan 27, 2021
Dataset authored and provided by
White House COVID-19 Team, Joint Coordination Cell, Data Strategy and Execution Workgroup
License
https://www.usa.gov/government-works
Area covered
Florida
Description
After over two years of public reporting, the State Profile Report will no longer be produced and distributed after February 2023. The final release was on February 23, 2023. We want to thank everyone who contributed to the design, production, and review of this report and we hope that it provided insight into the data trends throughout the COVID-19 pandemic. Data about COVID-19 will continue to be updated at CDC’s COVID Data Tracker.

The State Profile Report (SPR) is generated by the Data Strategy and Execution Workgroup in the Joint Coordination Cell, in collaboration with the White House. It is managed by an interagency team with representatives from multiple agencies and offices (including the United States Department of Health and Human Services (HHS), the Centers for Disease Control and Prevention, the HHS Assistant Secretary for Preparedness and Response, and the Indian Health Service). The SPR provides easily interpretable information on key indicators for each state, down to the county level.

It is a weekly snapshot in time that:

Focuses on recent outcomes in the last seven days and changes relative to the month prior

Provides additional contextual information at the county level for each state, and includes national level information

Supports rapid visual interpretation of results with color thresholds
Covid-19 USA dataset (21/01/2020 to 25/07/2020)
kaggle.com
zip
Updated Jul 26, 2020
+ more versions
Cite
Soumya S. Acharya (2020). Covid-19 USA dataset (21/01/2020 to 25/07/2020) [Dataset]. https://www.kaggle.com/soumyasacharya/covid19-usa-dataset-21012020-to-25072020
Available download formats: zip (3693723 bytes)
Dataset updated
Jul 26, 2020
Authors
Soumya S. Acharya
License
http://opendatacommons.org/licenses/dbcl/1.0/
Area covered
United States
Description
Data is obtained from COVID-19 Tracking project and NYTimes. Sincere thanks to them for making it available to the public.

Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - World Health Organization

The number of new cases is increasing day by day around the world. This dataset has information from 50 US states and the District of Columbia at the daily level.

LICENSE:

Please refer here

Apache License 2.0

A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code. For counties dataset, please refer here

Content

us_states_covid19_daily.csv

This dataset has the number of tests conducted in each state at the daily level. Column descriptions are:

date - date of observation
state - US state 2 digit code
positive - number of tests with positive results
negative - number of tests with negative results
pending - number of tests with pending results
death - number of deaths
total - total number of tests
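A minimal sketch of reading a file with the columns described above, using only the standard library. The inline sample rows are made up, and the date format shown is an assumption; only the column names come from the description. The function computes the share of completed tests that were positive for each date and state.

```python
import csv
import io

# Made-up sample rows using the column names listed above; not real data.
SAMPLE_CSV = """date,state,positive,negative,pending,death,total
20200401,NY,100,300,10,5,410
20200401,CA,50,450,0,2,500
"""


def positive_share(csv_text):
    """Return {(date, state): positive / (positive + negative)} for each row."""
    shares = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        positive = int(row["positive"])
        negative = int(row["negative"])
        completed = positive + negative  # pending tests have no result yet
        if completed:  # skip rows with no completed tests to avoid dividing by zero
            shares[(row["date"], row["state"])] = positive / completed
    return shares


if __name__ == "__main__":
    for key, share in positive_share(SAMPLE_CSV).items():
        print(key, round(share, 3))
```

To run this against the downloaded file, the same loop could read from `open("us_states_covid19_daily.csv", newline="")` instead of the inline string, assuming the file's header matches the description.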

Acknowledgements

Sincere thanks to the COVID-19 Tracking project, from which the data is obtained.

Sincere thanks to NYTimes for the counties dataset.

There is a nice Tableau Public dashboard on the data. Images for this dataset are obtained from the same. Thank you.

Inspiration

Some of the questions that could be answered are:

How is the spread over time to various states?

Change in number of people tested over time
COVID-19 reporting
mass.gov
Updated Mar 4, 2020
Cite
Executive Office of Health and Human Services (2020). COVID-19 reporting [Dataset]. https://www.mass.gov/info-details/covid-19-reporting
Dataset updated
Mar 4, 2020
Dataset provided by
Department of Public Health
Executive Office of Health and Human Services
Area covered
Massachusetts
Description
The COVID-19 dashboard includes data on city/town COVID-19 activity, confirmed and probable cases of COVID-19, confirmed and probable deaths related to COVID-19, and the demographic characteristics of cases and deaths.
COVID-19 Dataset for California Counties
kaggle.com
zip
Updated Apr 5, 2020
Cite
AdityaVipradas (2020). COVID-19 Dataset for California Counties [Dataset]. https://www.kaggle.com/adityavipradas/covid19-dataset-for-california-counties
Available download formats: zip (32276 bytes)
Dataset updated
Apr 5, 2020
Authors
AdityaVipradas
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
California
Description
Context

COVID-19 is on the rise worldwide. It was first identified in the city of Wuhan in China in 2019 and has now spread into a global pandemic. California is currently the fourth most affected state in the USA. The state's confirmed cases have been on the rise since early March 2020 due to more testing capabilities. In this dire time, it is extremely important to understand the factors affecting the spread of the virus in California, identify the susceptible population, and predict the trajectory of infections and deaths on a daily basis.

Content

Update: 4 April 2020, 7:27 PM Pacific Time (PT)

This data contains information about confirmed cases (13927) and fatalities (321) due to COVID-19 in 58 California counties along with instructions provided by health agencies in all counties. A breakdown of confirmed cases in the cities of California is also provided. The information has been sourced from Los Angeles Times.

As mentioned by LA Times, "The tallies here are mostly limited to residents of California, which is the standard method used to count patients by the state’s health authorities. Those totals do not include people from other states who are quarantined here, such as the passengers and crew of the Grand Princess cruise ship that docked in Oakland."

Acknowledgements

LA Times - https://www.latimes.com/projects/california-coronavirus-cases-tracking-outbreak/

Inspiration

This dataset will be useful in understanding and predicting the trajectory of the infected and dead cases in California in the coming days.

It might also be useful for COVID19 Local US-CA Forecasting (Week 1) competition

The dataset can also highlight any need to update any health agency instructions to take further precautionary measures and save lives.

Please consider upvoting if the data is found useful in any way. If there are any improvement suggestions, do let me know.
COVID-19 Deaths Over Time
healthdata.gov
data.sfgov.org
+2 more
csv, xlsx, xml
Updated Apr 8, 2025
+ more versions
Cite
data.sfgov.org (2025). COVID-19 Deaths Over Time [Dataset]. https://healthdata.gov/dataset/COVID-19-Deaths-Over-Time/mjyp-v9dd
Available download formats: csv, xml, xlsx
Dataset updated
Apr 8, 2025
Dataset provided by
data.sfgov.org
Description
A. SUMMARY This dataset represents San Francisco COVID-19 related deaths by day. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.

B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death data is provided by the California Department of Public Health.

It takes time to process this data. Because of this, death totals may increase or decrease over time.

Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.

C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.

Dataset will not update on the business day following any federal holiday.

D. HOW TO USE THIS DATASET This dataset shows new deaths and cumulative deaths by date of death. New deaths are the count of deaths on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths up to the date listed.

Use the Deaths by Population Characteristics Over Time dataset to see deaths by different subgroups including race/ethnicity, age, and gender.

E. CHANGE LOG
9/11/2023 - on this date, we began using an updated definition of a COVID-19 death to align with the California Department of Public Health. This change was applied to COVID-19 deaths retrospectively beginning on 1/1/2023. More information about the recommendation by the Council of State and Territorial Epidemiologists that motivated this change can be found here: https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf
4/6/2023 - the State implemented system updates to improve the integrity of historical data.
1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented.
COVID 19 Testing Tracking in States in USA
kaggle.com
zip
Updated Jan 11, 2021
Cite
CodeBreaker619 (2021). COVID 19 Testing Tracking in States in USA [Dataset]. https://www.kaggle.com/codebreaker619/cvoid-19-testing-tracking-in-states
Available download formats: zip (918578 bytes)
Dataset updated
Jan 11, 2021
Authors
CodeBreaker619
Area covered
United States
Description
Content

More precise total testing counts. Previously, total testing had been represented by positive tests plus negative tests. As states are beginning to report more specific testing counts, the project is moving toward reporting those numbers directly.

This may make it more difficult to compare your state against others in terms of positivity rate, but the net effect is we now have more precise counts:

Total Test Encounters: Total tests increase by one for every individual that is tested that day. Additional tests for that individual on that day (i.e., multiple swabs taken at the same time) are not included.

Total PCR Specimens: Total tests increase by one for every testing sample retrieved from an individual. Multiple samples from an individual on a single day can be included in the count.

Unique People Tested: Total tests increase by one the first time an individual is tested. The count will not increase in later days if that individual is tested again – even months later

These three totals are not all available for every state. The COVID Tracking Project prioritizes the different count types for each state in this order:

Total Test Encounters

Total PCR Specimens

Unique People Tested

If the state does not provide any of those totals directly, The COVID Tracking Project falls back to the initial calculation of total tests that it has provided up to this point: positive + negative tests.
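
The priority order and fallback described above can be sketched as a small selection function. This is a minimal illustration, not The COVID Tracking Project's code: the dictionary keys and the function name are hypothetical, and a missing count is assumed to be represented as `None` or an absent key.

```python
from typing import Optional, Tuple

# Priority order from the description; the key names are hypothetical.
PRIORITY = (
    "total_test_encounters",
    "total_pcr_specimens",
    "unique_people_tested",
)


def pick_total_tests(row: dict) -> Tuple[Optional[int], str]:
    """Return (total, count_type) for one state's reported counts."""
    for key in PRIORITY:
        value = row.get(key)
        if value is not None:
            return value, key
    # Fallback: the original calculation, positive + negative tests.
    positive, negative = row.get("positive"), row.get("negative")
    if positive is not None and negative is not None:
        return positive + negative, "positive_plus_negative"
    return None, "unavailable"


print(pick_total_tests({"total_pcr_specimens": 500, "unique_people_tested": 300}))
print(pick_total_tests({"positive": 40, "negative": 160}))
```

Returning the count type alongside the total keeps track of which definition a state's number follows, which matters when totals from different states are placed side by side.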

One of the above total counts will be the number present in the cumulative_total_test_results and total_test_results_increase columns.

The positivity rates provided on this site will divide confirmed cases by one of these total_test_results columns.

Due to these changes, we advise comparing positivity rates between states only if the states being compared have the same type of total test count.
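
A positivity-rate comparison that respects this advice might look like the following sketch. The field names (`confirmed_cases`, `total_tests`, `count_type`) and both helper functions are assumptions made for illustration; they are not column names from the dataset.

```python
from typing import Optional


def positivity_rate(confirmed_cases: int, total_tests: int) -> Optional[float]:
    """Confirmed cases divided by the state's total test count."""
    if not total_tests:
        return None  # avoid dividing by zero or by a missing total
    return confirmed_cases / total_tests


def comparable(state_a: dict, state_b: dict) -> bool:
    """True only when both states report the same type of total test count."""
    return state_a["count_type"] == state_b["count_type"]


# Hypothetical records for two states.
a = {"confirmed_cases": 50, "total_tests": 1000, "count_type": "total_pcr_specimens"}
b = {"confirmed_cases": 30, "total_tests": 400, "count_type": "unique_people_tested"}

if comparable(a, b):
    print(positivity_rate(a["confirmed_cases"], a["total_tests"]),
          positivity_rate(b["confirmed_cases"], b["total_tests"]))
else:
    print("Count types differ; positivity rates are not directly comparable.")
```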
COVID-19 Outbreak Data
healthdata.gov
datasets.ai
csv, xlsx, xml
Updated Apr 8, 2025
Cite
chhs.data.ca.gov (2025). COVID-19 Outbreak Data [Dataset]. https://healthdata.gov/d/qvh7-ed62
Explore at:
Available download formats: xlsx, csv, xml
Dataset updated
Apr 8, 2025
Dataset provided by
chhs.data.ca.gov
Description
This dataset contains numbers of COVID-19 outbreaks and associated cases, categorized by setting, reported to CDPH since January 1, 2021.

AB 685 (Chapter 84, Statutes of 2020) and the Cal/OSHA COVID-19 Emergency Temporary Standards (Title 8, Subchapter 7, Sections 3205-3205.4) required non-healthcare employers in California to report workplace COVID-19 outbreaks to their local health department (LHD) between January 1, 2021 – December 31, 2022. Beginning January 1, 2023, non-healthcare employer reporting of COVID-19 outbreaks to local health departments is voluntary, unless a local order is in place. More recent data collected without mandated reporting may therefore be less representative of all outbreaks that have occurred, compared to earlier data collected during mandated reporting. Licensed health facilities continue to be mandated to report outbreaks to LHDs.

LHDs report confirmed outbreaks to the California Department of Public Health (CDPH) via the California Reportable Disease Information Exchange (CalREDIE), the California Connected (CalCONNECT) system, or other established processes. Data are compiled and categorized by setting by CDPH. Settings are categorized by U.S. Census industry codes. Total outbreaks and cases are included for individual industries as well as for broader industrial sectors.

The first dataset includes numbers of outbreaks in each setting by month of onset, for outbreaks reported to CDPH since January 1, 2021. This dataset includes some outbreaks with onset prior to January 1 that were reported to CDPH after January 1; these outbreaks are denoted with month of onset “Before Jan 2021.” The second dataset includes cumulative numbers of COVID-19 outbreaks with onset after January 1, 2021, categorized by setting. Due to reporting delays, the reported numbers may not reflect all outbreaks that have occurred as of the reporting date; additional outbreaks may have occurred that have not yet been reported to CDPH.

While many of these settings are workplaces, cases may have occurred among workers, other community members who visited the setting, or both. Accordingly, these data do not distinguish between outbreaks involving only workers, outbreaks involving only residents or patrons, or outbreaks involving both.

Several additional data limitations should be kept in mind:

Outbreaks are classified as “Insufficient information” when not enough information was available for CDPH to assign an industry code.

Some sectors, particularly congregate residential settings, may have increased testing and therefore increased likelihood of outbreak recognition and reporting. As a result, in congregate residential settings, the number of outbreak-associated cases may be more accurate.

However, in most settings, outbreak and case counts are likely underestimates. For most cases, it is not possible to identify the source of exposure, as many cases have multiple possible exposures.

Because some settings have at times been closed or open with capacity restrictions, numbers of outbreak reports in those settings do not reflect COVID-19 transmission risk.

The number of outbreaks in different settings will depend on the number of different workplaces in each setting. More outbreaks would be expected in settings with many workplaces compared to settings with few workplaces.
Weekly United States COVID-19 Hospitalization Metrics by County (Historical)...
data.cdc.gov
data.virginia.gov
+1more
csv, xlsx, xml
Updated Jan 17, 2025
+ more versions
Cite
CDC Division of Healthcare Quality Promotion (DHQP) Surveillance Branch, National Healthcare Safety Network (NHSN) (2025). Weekly United States COVID-19 Hospitalization Metrics by County (Historical) – ARCHIVED [Dataset]. https://data.cdc.gov/widgets/82ci-krud?mobile_redirect=true
Explore at:
Available download formats: xml, csv, xlsx
Dataset updated
Jan 17, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC Division of Healthcare Quality Promotion (DHQP) Surveillance Branch, National Healthcare Safety Network (NHSN)
License
https://www.usa.gov/government-works
Area covered
United States
Description
Note: After May 3, 2024, this dataset will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, hospital capacity, or occupancy data to HHS through CDC’s National Healthcare Safety Network (NHSN). The related CDC COVID Data Tracker site was revised or retired on May 10, 2023.

Note: May 3, 2024: Due to incomplete or missing hospital data received for the April 21, 2024 through April 27, 2024 reporting period, the COVID-19 Hospital Admissions Level could not be calculated for CNMI and will be reported as “NA” or “Not Available” in the COVID-19 Hospital Admissions Level data released on May 3, 2024.

This dataset represents COVID-19 hospitalization data and metrics aggregated to county or county-equivalent, for all counties or county-equivalents (including territories) in the United States as of the initial date of reporting for each weekly metric. COVID-19 hospitalization data are reported to CDC’s National Healthcare Safety Network, which monitors national and local trends in healthcare system stress, capacity, and community disease levels for approximately 6,000 hospitals in the United States. Data reported by hospitals to NHSN and included in this dataset represent aggregated counts and include metrics capturing information specific to COVID-19 hospital admissions, and inpatient and ICU bed capacity occupancy.

Reporting information:
As of December 15, 2022, COVID-19 hospital data are required to be reported to NHSN, which monitors national and local trends in healthcare system stress, capacity, and community disease levels for approximately 6,000 hospitals in the United States. Data reported by hospitals to NHSN represent aggregated counts and include metrics capturing information specific to hospital capacity, occupancy, hospitalizations, and admissions. Prior to December 15, 2022, hospitals reported data directly to the U.S. Department of Health and Human Services (HHS) or via a state submission for collection in the HHS Unified Hospital Data Surveillance System (UHDSS).
While CDC reviews these data for errors and corrects those found, some reporting errors might still exist within the data. To minimize errors and inconsistencies in data reported, CDC removes outliers before calculating the metrics. CDC and partners work with reporters to correct these errors and update the data in subsequent weeks.
Many hospital subtypes, including acute care and critical access hospitals, as well as Veterans Administration, Defense Health Agency, and Indian Health Service hospitals, are included in the metric calculations provided in this report. Psychiatric, rehabilitation, and religious non-medical hospital types are excluded from calculations.
Data are aggregated and displayed for hospitals with the same Centers for Medicare and Medicaid Services (CMS) Certification Number (CCN), which are assigned by CMS to counties based on the CMS Provider of Services files.

Full details on COVID-19 hospital data reporting guidance can be found here: https://www.hhs.gov/sites/default/files/covid-19-faqs-hospitals-hospital-laboratory-acute-care-facility-data-reporting.pdf
Calculation of county-level hospital metrics:
County-level hospital data are derived using calculations performed at the Health Service Area (HSA) level. An HSA is defined by CDC’s National Center for Health Statistics as a geographic area containing at least one county which is self-contained with respect to the population’s provision of routine hospital care. Every county in the United States is assigned to an HSA, and each HSA must contain at least one hospital. Therefore, use of HSAs in the calculation of local hospital metrics allows for more accurate characterization of the relationship between health care utilization and health status at the local level.
Data presented at the county-level represent admissions, hospital inpatient and ICU bed capacity and occupancy among hospitals within the selected HSA. Therefore, admissions, capacity, and occupancy are not limited to residents of the selected HSA.
For all county-level hospital metrics listed below, the values are calculated first for the entire HSA, and the HSA-level value is then applied to each county within the HSA.
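
The HSA-to-county step can be sketched as a simple lookup. The identifiers and values below are hypothetical placeholders, and the function is an illustration of the idea rather than CDC's implementation.

```python
from typing import Dict, Optional


def county_values_from_hsa(
    county_to_hsa: Dict[str, str],
    hsa_values: Dict[str, float],
) -> Dict[str, Optional[float]]:
    """Assign each county the metric value computed for its HSA."""
    # Counties whose HSA has no computed value get None.
    return {county: hsa_values.get(hsa) for county, hsa in county_to_hsa.items()}


# Hypothetical mapping: two counties share one HSA, a third is in another.
county_to_hsa = {"county_a": "hsa_1", "county_b": "hsa_1", "county_c": "hsa_2"}
hsa_values = {"hsa_1": 12.5, "hsa_2": 3.0}

print(county_values_from_hsa(county_to_hsa, hsa_values))
```

Under this scheme every county in the same HSA shows an identical value, so county rows should not be summed as if they were independent measurements.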
Metric details:
Time period: data for the previous MMWR week (Sunday-Saturday) will update weekly on Mondays as soon as they are reviewed and verified, usually before 8 pm ET. Updates will occur the following day when reporting coincides with a federal holiday. Note: Weekly updates might be delayed due to delays in reporting. All data are provisional. Because these provisional counts are subject to change, including updates to data reported previously, adjustments can occur. Data may be updated since original publication due to delays in reporting (to account for data received after a given Thursday publication) or data quality corrections.
New hospital admissions (count): Total number of admissions of patients with laboratory-confirmed COVID-19 in the previous week (including both adult and pediatric admissions) in the entire jurisdiction.
New Hospital Admissions Rate Value (Admissions per 100k): Total number of new admissions of patients with laboratory-confirmed COVID-19 in the past week (including both adult and pediatric admissions) for the entire jurisdiction, divided by the 2019 intercensal population estimate for that jurisdiction and multiplied by 100,000. (Note: This metric is used to determine each county’s COVID-19 Hospital Admissions Level for a given week).
New COVID-19 Hospital Admissions Rate Level: qualitative value of new COVID-19 hospital admissions rate level [Low, Medium, High, Insufficient Data]
New hospital admissions percent change from prior week: Percent change in the current weekly total new admissions of patients with laboratory-confirmed COVID-19 per 100,000 population compared with the prior week.
New hospital admissions percent change from prior week level: Qualitative value of percent change in hospital admissions rate from prior week [Substantial decrease, Moderate decrease, Stable, Moderate increase, Substantial increase, Insufficient data]
COVID-19 Inpatient Bed Occupancy Value: Percentage of all staffed inpatient beds occupied by patients with laboratory-confirmed COVID-19 (including both adult and pediatric patients) within the entire jurisdiction, calculated as an average of valid daily values within the past week (e.g., if only three valid values, the average of those three is taken). Averages are separately calculated for the daily numerators (patients hospitalized with confirmed COVID-19) and denominators (staffed inpatient beds). The average percentage can then be taken as the ratio of these two values for the entire jurisdiction.
COVID-19 Inpatient Bed Occupancy Level: Qualitative value of inpatient beds occupied by COVID-19 patients level [Minimal, Low, Moderate, Substantial, High, Insufficient data]
COVID-19 Inpatient Bed Occupancy percent change from prior week: The absolute change in the percent of staffed inpatient beds occupied by patients with laboratory-confirmed COVID-19 represents the week-over-week absolute difference between the average occupancy of patients with confirmed COVID-19 in staffed inpatient beds in the past week, compared with the prior week, in the entire jurisdiction.
COVID-19 ICU Bed Occupancy Value: Percentage of all staffed adult ICU beds occupied by adult patients with confirmed COVID-19 within the entire jurisdiction, calculated as an average of valid daily values within the past week (e.g., if only three valid values, the average of those three is taken). Averages are separately calculated for the daily numerators (adult patients hospitalized with confirmed COVID-19) and denominators (staffed adult ICU beds). The average percentage can then be taken as the ratio of these two values for the entire jurisdiction.
COVID-19 ICU Bed Occupancy Level: Qualitative value of ICU beds occupied by COVID-19 patients level [Minimal, Low, Moderate, Substantial, High, Insufficient data]
COVID-19 ICU Bed Occupancy percent change from prior week: The absolute change in the percent of staffed ICU beds occupied by patients with laboratory-confirmed COVID-19 represents the week-over-week absolute difference between the average occupancy of patients with confirmed COVID-19 in staffed adult ICU beds for the past week, compared with the prior week, in the entire jurisdiction.
For all metrics, if there are no data in the specified locality for a given week, the metric value is displayed as “insufficient data”.
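
The rate and occupancy definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not CDC's code: the function names are hypothetical, invalid daily values are assumed to be represented as `None`, and the sample numbers are invented.

```python
from typing import List, Optional


def admissions_per_100k(weekly_admissions: int, population: int) -> float:
    """Weekly admissions divided by population, scaled to 100,000 people."""
    return weekly_admissions * 100_000 / population


def mean_of_valid(values: List[Optional[float]]) -> Optional[float]:
    """Average only the valid (non-None) daily values."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def occupancy_percent(
    daily_covid_patients: List[Optional[float]],
    daily_staffed_beds: List[Optional[float]],
) -> Optional[float]:
    """Ratio of the two separately computed weekly averages, as a percent."""
    numerator = mean_of_valid(daily_covid_patients)
    denominator = mean_of_valid(daily_staffed_beds)
    if numerator is None or not denominator:
        return None  # corresponds to "insufficient data"
    return 100 * numerator / denominator


# Hypothetical week with some missing daily reports.
print(admissions_per_100k(50, 250_000))
print(occupancy_percent([10, None, 20, 30], [200, 200, None, 200]))
```

Averaging the numerator and denominator separately, then dividing, is not the same as averaging daily percentages; the sketch follows the former because that is what the metric description states. The week-over-week occupancy change is then the difference between two such percentages.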

Notes: June 15, 2023: Due to incomplete or missing hospital data received for the June 4, 2023, through June 10, 2023, reporting period, the COVID-19 Hospital Admissions Level could not be calculated for CNMI and AS and will be reported as “NA” or “Not Available” in the COVID-19 Hospital Admissions Level data released on June 15, 2023.

July 10, 2023: Due to incomplete or missing hospital data received for the June 25, 2023, through July 1, 2023, reporting period, the COVID-19 Hospital Admissions Level could not be calculated for CNMI and AS and will be reported as “NA” or “Not Available” in the COVID-19 Hospital Admissions Level data released on July 10, 2023.

July 17, 2023: Due to incomplete or missing hospital data received for the July 2, 2023, through July 8, 2023, reporting

