Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (to
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths
column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
https://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contain informative data related to COVID-19 pandemic. Specially, figure out about the First Case and First Death information for every single country. The datasets mainly focus on two major fields first one is First Case which consists of information of Date of First Case(s), Number of confirm Case(s) at First Day, Age of the patient(s) of First Case, Last Visited Country and the other one First Death information consist of Date of First Death and Age of the Patient who died first for every Country mentioning corresponding Continent. The datasets also contain the Binary Matrix of spread chain among different country and region.
*This is not a country. This is a ship. The name of the Cruise Ship was not given from the government.
"N+": the age is not specified but greater than N
“No Trace”: some data was not found
“Unspecified”: not available from the authority
“N/A”: for “Last Visited Country(s) of Confirmed Case(s)” column, “N/A” indicates that the confirmed case(s) of those countries do not have any travel history in recent past; in “Age of First Death(s)” column “N/A” indicates that those countries do not have may death case till May 16, 2020.
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 tests, cases, and associated deaths that have been reported among Connecticut residents. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Hospitalization data were collected by the Connecticut Hospital Association and reflect the number of patients currently hospitalized with laboratory-confirmed COVID-19. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the daily COVID-19 update. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME.25,26 As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19, but antemortem diagnosis was unable to be made.31 The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC which assigns cause of death codes according to the International Causes of Disease 10th Revision (ICD-10) classification system.25,26 COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics Data are reported daily, with
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update. The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age of among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age which results in higher age-adjusted rates. The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age adjusted rates calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical
https://www.ontario.ca/page/open-government-licence-ontariohttps://www.ontario.ca/page/open-government-licence-ontario
This dataset reports the daily reported number of the 7-day moving average rates of Deaths involving COVID-19 by vaccination status and by age group.
Effective November 14, 2024 this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool
Data includes:
As of June 16, all COVID-19 datasets will be updated weekly on Thursdays by 2pm.
As of January 12, 2024, data from the date of January 1, 2024 onwards reflect updated population estimates. This update specifically impacts data for the 'not fully vaccinated' category.
On November 30, 2023 the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023.
CCM is a dynamic disease reporting system which allows ongoing update to data previously entered. As a result, data extracted from CCM represents a snapshot at the time of extraction and may differ from previous or subsequent results. Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and current totals being different from previously reported cases and deaths. Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.
The data does not include vaccination data for people who did not provide consent for vaccination records to be entered into the provincial COVaxON system. This includes individual records as well as records from some Indigenous communities where those communities have not consented to including vaccination information in COVaxON.
“Not fully vaccinated” category includes people with no vaccine and one dose of double-dose vaccine. “People with one dose of double-dose vaccine” category has a small and constantly changing number. The combination will stabilize the results.
Spikes, negative numbers and other data anomalies: Due to ongoing data entry and data quality assurance activities in Case and Contact Management system (CCM) file, Public Health Units continually clean up COVID-19, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes, negative numbers and current totals being different from previously reported case and death counts.
Public Health Units report cause of death in the CCM based on information available to them at the time of reporting and in accordance with definitions provided by Public Health Ontario. The medical certificate of death is the official record and the cause of death could be different.
Deaths are defined per the outcome field in CCM marked as “Fatal”. Deaths in COVID-19 cases identified as unrelated to COVID-19 are not included in the Deaths involving COVID-19 reported.
Rates for the most recent days are subject to reporting lags
All data reflects totals from 8 p.m. the previous day.
This dataset is subject to change.
Reporting of Aggregate Case and Death Count data was discontinued on May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
The surveillance case definition for COVID-19, a nationally notifiable disease, was first described in a position statement from the Council for State and Territorial Epidemiologists, which was later revised. However, there is some variation in how jurisdictions implemented these case definitions. More information on how CDC collects COVID-19 case surveillance data can be found at FAQ: COVID-19 Data and Surveillance.
Aggregate Data Collection Process Since the beginning of the COVID-19 pandemic, data were reported from state and local health departments through a robust process with the following steps:
This process was collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provided the most up-to-date numbers on cases and deaths by report date. Throughout data collection, CDC retrospectively updated counts to correct known data quality issues.
Description This archived public use dataset focuses on the cumulative and weekly case and death rates per 100,000 persons within various sociodemographic factors across all states and their counties. All resulting data are expressed as rates calculated as the number of cases or deaths per 100,000 persons in counties meeting various classification criteria using the US Census Bureau Population Estimates Program (2019 Vintage).
Each county within jurisdictions is classified into multiple categories for each factor. All rates in this dataset are based on classification of counties by the characteristics of their population, not individual-level factors. This applies to each of the available factors observed in this dataset. Specific factors and their corresponding categories are detailed below.
Population-level factors Each unique population factor is detailed below. Please note that the “Classification” column describes each of the 12 factors in the dataset, including a data dictionary describing what each numeric digit means within each classification. The “Category” column uses numeric digits (2-6, depending on the factor) defined in the “Classification” column.
Metro vs. Non-Metro – “Metro_Rural” Metro vs. Non-Metro classification type is an aggregation of the 6 National Center for Health Statistics (NCHS) Urban-Rural classifications, where “Metro” counties include Large Central Metro, Large Fringe Metro, Medium Metro, and Small Metro areas and “Non-Metro” counties include Micropolitan and Non-Core (Rural) areas. 1 – Metro, including “Large Central Metro, Large Fringe Metro, Medium Metro, and Small Metro” areas 2 – Non-Metro, including “Micropolitan, and Non-Core” areas
Urban/rural - “NCHS_Class” Urban/rural classification type is based on the 2013 National Center for Health Statistics Urban-Rural Classification Scheme for Counties. Levels consist of:
1 Large Central Metro
2 Large Fringe Metro
3 Medium Metro
4 Small Metro
5 Micropolitan
6 Non-Core (Rural)
American Community Survey (ACS) data were used to classify counties based on their age, race/ethnicity, household size, poverty level, and health insurance status distributions. Cut points were generated by using tertiles and categorized as High, Moderate, and Low percentages. The classification “Percent non-Hispanic, Native Hawaiian/Pacific Islander” is only available for “Hawaii” due to low numbers in this category for other available locations. This limitation also applies to other race/ethnicity categories within certain jurisdictions, where 0 counties fall into the certain category. The cut points for each ACS category are further detailed below:
Age 65 - “Age65”
1 Low (0-24.4%) 2 Moderate (>24.4%-28.6%) 3 High (>28.6%)
Non-Hispanic, Asian - “NHAA”
1 Low (<=5.7%) 2 Moderate (>5.7%-17.4%) 3 High (>17.4%)
Non-Hispanic, American Indian/Alaskan Native - “NHIA”
1 Low (<=0.7%) 2 Moderate (>0.7%-30.1%) 3 High (>30.1%)
Non-Hispanic, Black - “NHBA”
1 Low (<=2.5%) 2 Moderate (>2.5%-37%) 3 High (>37%)
Hispanic - “HISP”
1 Low (<=18.3%) 2 Moderate (>18.3%-45.5%) 3 High (>45.5%)
Population in Poverty - “Pov”
1 Low (0-12.3%) 2 Moderate (>12.3%-17.3%) 3 High (>17.3%)
Population Uninsured- “Unins”
1 Low (0-7.1%) 2 Moderate (>7.1%-11.4%) 3 High (>11.4%)
Average Household Size - “HH”
1 Low (1-2.4) 2 Moderate (>2.4-2.6) 3 High (>2.6)
Community Vulnerability Index Value - “CCVI” COVID-19 Community Vulnerability Index (CCVI) scores are from Surgo Ventures, which range from 0 to 1, were generated based on tertiles and categorized as:
1 Low Vulnerability (0.0-0.4) 2 Moderate Vulnerability (0.4-0.6) 3 High Vulnerability (0.6-1.0)
Social Vulnerability Index Value – “SVI" Social Vulnerability Index (SVI) scores (vintage 2020), which also range from 0 to 1, are from CDC/ASTDR’s Geospatial Research, Analysis & Service Program. Cut points for CCVI and SVI scores were generated based on tertiles and categorized as:
1 Low Vulnerability (0-0.333) 2 Moderate Vulnerability (0.334-0.666) 3 High Vulnerability (0.667-1)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Note: Data elements were retired from HERDS on 10/6/23 and this dataset was archived.
This dataset includes the cumulative number and percent of healthcare facility-reported fatalities for patients with lab-confirmed COVID-19 disease by reporting date and age group. This dataset does not include fatalities related to COVID-19 disease that did not occur at a hospital, nursing home, or adult care facility. The primary goal of publishing this dataset is to provide users with information about healthcare facility fatalities among patients with lab-confirmed COVID-19 disease.
The information in this dataset is also updated daily on the NYS COVID-19 Tracker at https://www.ny.gov/covid-19tracker.
The data source for this dataset is the daily COVID-19 survey through the New York State Department of Health (NYSDOH) Health Electronic Response Data System (HERDS). Hospitals, nursing homes, and adult care facilities are required to complete this survey daily. The information from the survey is used for statewide surveillance, planning, resource allocation, and emergency response activities. Hospitals began reporting for the HERDS COVID-19 survey in March 2020, while Nursing Homes and Adult Care Facilities began reporting in April 2020. It is important to note that fatalities related to COVID-19 disease that occurred prior to the first publication dates are also included.
The fatality numbers in this dataset are calculated by assigning age groups to each patient based on the patient age, then summing the patient fatalities within each age group, as of each reporting date. The statewide total fatality numbers are calculated by summing the number of fatalities across all age groups, by reporting date. The fatality percentages are calculated by dividing the number of fatalities in each age group by the statewide total number of fatalities, by reporting date. The fatality numbers represent the cumulative number of fatalities that have been reported as of each reporting date.
This dataset shows daily confirmed and probable cases of COVID-19 in New York City by date of specimen collection. Total cases has been calculated as the sum of daily confirmed and probable cases. Seven-day averages of confirmed, probable, and total cases are also included in the dataset. A person is classified as a confirmed COVID-19 case if they test positive with a nucleic acid amplification test (NAAT, also known as a molecular test; e.g. a PCR test). A probable case is a person who meets the following criteria with no positive molecular test on record: a) test positive with an antigen test, b) have symptoms and an exposure to a confirmed COVID-19 case, or c) died and their cause of death is listed as COVID-19 or similar. As of June 9, 2021, people who meet the definition of a confirmed or probable COVID-19 case >90 days after a previous positive test (date of first positive test) or probable COVID-19 onset date will be counted as a new case. Prior to June 9, 2021, new cases were counted ≥365 days after the first date of specimen collection or clinical diagnosis. Any person with a residence outside of NYC is not included in counts. Data is sourced from electronic laboratory reporting from the New York State Electronic Clinical Laboratory Reporting System to the NYC Health Department. All identifying health information is excluded from the dataset.
These data are used to evaluate the overall number of confirmed and probable cases by day (seven day average) to track the trajectory of the pandemic. Cases are classified by the date that the case occurred. NYC COVID-19 data include people who live in NYC. Any person with a residence outside of NYC is not included.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States recorded 1127152 Coronavirus Deaths since the epidemic began, according to the World Health Organization (WHO). In addition, United States reported 103436829 Coronavirus Cases. This dataset includes a chart with historical data for the United States Coronavirus Deaths.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
https://ichef.bbci.co.uk/news/976/cpsprodpb/11C98/production/_118165827_gettyimages-1232465340.jpg" alt="">
People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.
For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.
India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.
https://s.yimg.com/ny/api/res/1.2/XhVWo4SOloJoXaQLrxxUIQ--/YXBwaWQ9aGlnaGxhbmRlcjt3PTk2MA--/https://s.yimg.com/os/creatr-uploaded-images/2021-04/8aa568f0-a3e0-11eb-8ff6-6b9a188e374a" alt="">
Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus. source
The dataset consists of the tweets made with the #IndiaWantsOxygen hashtag covering the tweets from the past week. The dataset totally consists of 25,440 tweets and will be updated on a daily basis.
The description of the features is given below | No |Columns | Descriptions | | -- | -- | -- | | 1 | user_name | The name of the user, as they’ve defined it. | | 2 | user_location | The user-defined location for this account’s profile. | | 3 | user_description | The user-defined UTF-8 string describing their account. | | 4 | user_created | Time and date, when the account was created. | | 5 | user_followers | The number of followers an account currently has. | | 6 | user_friends | The number of friends an account currently has. | | 7 | user_favourites | The number of favorites an account currently has | | 8 | user_verified | When true, indicates that the user has a verified account | | 9 | date | UTC time and date when the Tweet was created | | 10 | text | The actual UTF-8 text of the Tweet | | 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen | | 12 | source | Utility used to post the Tweet, Tweets from the Twitter website have a source value - web | | 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters
The past few days have been really depressing after seeing these incidents. These tweets are the voice of the indians requesting help and people all over the globe asking their own countries to support India by providing oxygen tanks.
And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Effective June 28, 2023, this dataset will no longer be updated. Similar data are accessible from CDC WONDER (https://wonder.cdc.gov/mcd-icd10-provisional.html).
Deaths involving coronavirus disease 2019 (COVID-19) with a focus on ages 0-18 years in the United States.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
When I was searching for COVID-19 datasets online, I soon realized that there were no comprehensive datasets of the United States on a county level basis which included social, economic, and demographic factors in addition to the general case information that was already available on several sites. To quench my thirst for clean and relevant data, I proceeded to gather information from several various sources to compile the dataset I was looking for.
I started by looking for a reliable dataset that has general information such as confirmed cases, deaths, etc. I found John Hopkin's COVID-19 dataset to be the best one for this purpose as it is well organized and updated daily. Then, I set out looking for economic factors and population data for each county in the United States. I found a collection of such files compiled by the Economic Research Service branch of the USDA on their website. Finally, I had to find a dataset which had racial and demographic information for each county, which I found on the US Census Bureau's website under a page which was dedicated to county population data by several characteristics. Now that I had all the data I was looking for, I proceeded to find which counties were common in all datasets. After several hours of cleaning each dataset and extracting relevant information, I combined all the information into one CSV file with 2959 counties of clean information - exactly what I was looking for.
I hope that the Kaggle community will use this dataset to answer important questions regarding COVID-19 in the United States and the role that external economic, social, and demographic factors play in the shaping of the pandemic. I know that there are several patterns to be discovered and I sincerely hope that this helps our community understand just a little more about the pandemic than we do right now.
https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/approvedresearcherschemehttps://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/approvedresearcherscheme
The Public Health Research Database (PHRD) is a linked asset which currently includes Census 2011 data; Mortality Data; Hospital Episode Statistics (HES); GP Extraction Service (GPES) Data for Pandemic Planning and Research data. Researchers may apply for these datasets individually or any combination of the current 4 datasets.
The purpose of this dataset is to enable analysis of deaths involving COVID-19 by multiple factors such as ethnicity, religion, disability and known comorbidities as well as age, sex, socioeconomic and marital status at subnational levels. 2011 Census data for usual residents of England and Wales, who were not known to have died by 1 January 2020, linked to death registrations for deaths registered between 1 January 2020 and 8 March 2021 on NHS number. The data exclude individuals who entered the UK in the year before the Census took place (due to their high propensity to have left the UK prior to the study period), and those over 100 years of age at the time of the Census, even if their death was not linked. The dataset contains all individuals who died (any cause) during the study period, and a 5% simple random sample of those still alive at the end of the study period. For usual residents of England, the dataset also contains comorbidity flags derived from linked Hospital Episode Statistics data from April 2017 to December 2019 and GP Extraction Service Data from 2015-2019.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To estimate county of residence of Filipinx healthcare workers who died of COVID-19, we retrieved data from the Kanlungan website during the month of December 2020.22 In deciding who to include on the website, the AF3IRM team that established the Kanlungan website set two standards in data collection. First, the team found at least one source explicitly stating that the fallen healthcare worker was of Philippine ancestry; this was mostly media articles or obituaries sharing the life stories of the deceased. In a few cases, the confirmation came directly from the deceased healthcare worker's family member who submitted a tribute. Second, the team required a minimum of two sources to identify and announce fallen healthcare workers. We retrieved 86 US tributes from Kanlungan, but only 81 of them had information on county of residence. In total, 45 US counties with at least one reported tribute to a Filipinx healthcare worker who died of COVID-19 were identified for analysis and will hereafter be referred to as “Kanlungan counties.” Mortality data by county, race, and ethnicity came from the National Center for Health Statistics (NCHS).24 Updated weekly, this dataset is based on vital statistics data for use in conducting public health surveillance in near real time to provide provisional mortality estimates based on data received and processed by a specified cutoff date, before data are finalized and publicly released.25 We used the data released on December 30, 2020, which included provisional COVID-19 death counts from February 1, 2020 to December 26, 2020—during the height of the pandemic and prior to COVID-19 vaccines being available—for counties with at least 100 total COVID-19 deaths. During this time period, 501 counties (15.9% of the total 3,142 counties in all 50 states and Washington DC)26 met this criterion. Data on COVID-19 deaths were available for six major racial/ethnic groups: Non-Hispanic White, Non-Hispanic Black, Non-Hispanic Native Hawaiian or Other Pacific Islander, Non-Hispanic American Indian or Alaska Native, Non-Hispanic Asian (hereafter referred to as Asian American), and Hispanic. People with more than one race, and those with unknown race were included in the “Other” category. NCHS suppressed county-level data by race and ethnicity if death counts are less than 10. In total, 133 US counties reported COVID-19 mortality data for Asian Americans. These data were used to calculate the percentage of all COVID-19 decedents in the county who were Asian American. We used data from the 2018 American Community Survey (ACS) five-year estimates, downloaded from the Integrated Public Use Microdata Series (IPUMS) to create county-level population demographic variables.27 IPUMS is publicly available, and the database integrates samples using ACS data from 2000 to the present using a high degree of precision.27 We applied survey weights to calculate the following variables at the county-level: median age among Asian Americans, average income to poverty ratio among Asian Americans, the percentage of the county population that is Filipinx, and the percentage of healthcare workers in the county who are Filipinx. Healthcare workers encompassed all healthcare practitioners, technical occupations, and healthcare service occupations, including nurse practitioners, physicians, surgeons, dentists, physical therapists, home health aides, personal care aides, and other medical technicians and healthcare support workers. County-level data were available for 107 out of the 133 counties (80.5%) that had NCHS data on the distribution of COVID-19 deaths among Asian Americans, and 96 counties (72.2%) with Asian American healthcare workforce data. The ACS 2018 five-year estimates were also the source of county-level percentage of the Asian American population (alone or in combination) who are Filipinx.8 In addition, the ACS provided county-level population counts26 to calculate population density (people per 1,000 people per square mile), estimated by dividing the total population by the county area, then dividing by 1,000 people. The county area was calculated in ArcGIS 10.7.1 using the county boundary shapefile and projected to Albers equal area conic (for counties in the US contiguous states), Hawai’i Albers Equal Area Conic (for Hawai’i counties), and Alaska Albers Equal Area Conic (for Alaska counties).20
Late in December 2019, the World Health Organisation (WHO) China Country Office obtained information about severe pneumonia of an unknown cause, detected in the city of Wuhan in Hubei province, China. This later turned out to be the novel coronavirus disease (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) of the coronavirus family. The disease causes respiratory illness characterized by primary symptoms like cough, fever, and in more acute cases, difficulty in breathing. WHO later declared COVID-19 as a Pandemic because of its fast rate of spread across the Globe.
The COVID-19 datasets organized by continent contain daily level information about the COVID-19 cases in the different continents of the world. It is a time-series data and the number of cases on any given day is cumulative. The original datasets can be found on this John Hopkins University Github repository. I will be updating the COVID-19 datasets on a regular basis with every update from John Hopkins University. I have also included the World COVID-19 tests data scraped from Worldometer and 2020 world population also scraped from worldometer.
COVID-19 cases
covid19_world.csv
. It contains the cumulative number of COVID-19 cases from around the world since January 22, 2020, as compiled by John Hopkins University.
covid19_asia.csv
, covid19_africa.csv
, covid19_europe.csv
, covid19_northamerica.csv
, covid19.southamerica.csv
, covid19_oceania.csv
, and covid19_others.csv
. These contain the cumulative number of COVID-19 cases organized by the continent.
Field description - ObservationDate: Date of observation in YY/MM/DD - Country_Region: name of Country or Region - Province_State: name of Province or State - Confirmed: the number of COVID-19 confirmed cases - Deaths: the number of deaths from COVID-19 - Recovered: the number of recovered cases - Active: the number of people still infected with COVID-19 Note: Active = Confirmed - (Deaths + Recovered)
COVID-19 tests
covid19_tests.csv
. It contains the cumulative number of COVID tests data from worldometer conducted since the onset of the pandemic. Data available from June 01, 2020.
Field description Date: date in YY/MM/DD Country, Other: Country, Region, or dependency TotalTests: cumulative number of tests up till that date Population: population of Country, Region, or dependency Tests/1M pop: tests per 1 million of the population 1 Testevery X ppl: 1 test for every X number of people
2020 world population
world_population(2020).csv
. It contains the 2020 world population as reported by woldometer.
Field description Country (or dependency): Country or dependency Population (2020): population in 2020 Yearly Change: yearly change in population as a percentage Net Change: the net change in population Density(P/km2): population density Land Area(km2): land area Migrants(net): net number of migrants Fert. Rate: Fertility Rate Med. Age: median age Urban pop: urban population World Share: share of the world population as a percentage
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COVID-19 pandemic, which began in China in late 2019, and subsequently spread across the world during the first several months of 2020, has had a dramatic impact on all facets of life. At the same time, it has not manifested in the same way in every nation. Some countries experienced a large initial spike in cases and deaths, followed by a rapid decline, whereas others had relatively low rates of both outcomes throughout the first half of 2020. The United States experienced a unique pattern of the virus, with a large initial spike, followed by a moderate decline in cases, followed by second and then third spikes. In addition, research has shown that in the United States the severity of the pandemic has been associated with poverty and access to health care services. This study was designed to examine whether the course of the pandemic has been uniform across America, and if not how it differed, particularly with respect to poverty. Results of a random intercept multilevel mixture model revealed that the pandemic followed four distinct paths in the country. The least ethnically diverse (85.1% white population) and most rural (82.8% rural residents) counties had the lowest death rates (0.06/1000) and the weakest link between deaths due to COVID-19 and poverty (b = 0.03). In contrast, counties with the highest proportion of urban residents (100%), greatest ethnic diversity (48.2% nonwhite), and highest population density (751.4 people per square mile) had the highest COVID-19 death rates (0.33/1000), and strongest relationship between the COVID-19 death rate and poverty (b = 46.21). Given these findings, American policy makers need to consider developing responses to future pandemics that account for local characteristics. These responses must take special account of pandemic responses among people of color, who suffered the highest death rates in the nation.
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
This dataset represents preliminary estimates of cumulative U.S. COVID-19 disease burden for the 2024-2025 period, including illnesses, outpatient visits, hospitalizations, and deaths. The weekly COVID-19-associated burden estimates are preliminary and based on continuously collected surveillance data from patients hospitalized with laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. The data come from the Coronavirus Disease 2019 (COVID-19)-Associated Hospitalization Surveillance Network (COVID-NET), a surveillance platform that captures data from hospitals that serve about 10% of the U.S. population. Each week CDC estimates a range (i.e., lower estimate and an upper estimate) of COVID-19 -associated burden that have occurred since October 1, 2024.
Note: Data are preliminary and subject to change as more data become available. Rates for recent COVID-19-associated hospital admissions are subject to reporting delays; as new data are received each week, previous rates are updated accordingly.
References
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (to