Read the associated blogpost for a detailed description of how this dataset was prepared, plus extra code for producing animated maps.
The 2019 Novel Coronavirus (COVID-19) continues to spread in countries around the world. This dataset provides daily updated numbers of reported cases & deaths in Germany at the federal state (Bundesland) and county (Landkreis/Stadtkreis) level. In April 2021 I added a dataset on vaccination progress. In addition, I provide geospatial shape files and general state-level population demographics to aid the analysis.
The dataset consists of three main csv files: covid_de.csv, demographics_de.csv, and covid_de_vaccines.csv. The geospatial shapes are included in the de_state.* files. See the column descriptions below for more detailed information.
covid_de.csv: COVID-19 cases and deaths, updated daily. The original data are collected by Germany's Robert Koch Institute and can be downloaded through the National Platform for Geographic Data (the latter site also hosts an interactive dashboard). I reshaped and translated the data (using R tidyverse tools) to make it more accessible. This blogpost explains how I prepared the data and describes how to produce animated maps.
demographics_de.csv: General Demographic Data about Germany on the federal state level. Those have been downloaded from Germany's Federal Office for Statistics (Statistisches Bundesamt) through their Open Data platform GENESIS. The data reflect the (most recent available) estimates on 2018-12-31. You can find the corresponding table here.
covid_de_vaccines.csv: In April 2021 I added this file that contains the Covid-19 vaccination progress for Germany as a whole. It details daily doses, broken down cumulatively by manufacturer, as well as the cumulative number of people having received their first and full vaccination. The earliest data are from 2020-12-27.
de_state.*: Geospatial shape files for Germany's 16 federal states. Downloaded via Germany's Federal Agency for Cartography and Geodesy. Specifically, the shape file was obtained from this link.
COVID-19 dataset covid_de.csv:
state: Name of the German federal state. Germany has 16 federal states. I removed or converted special characters from the original data.
county: The name of the German Landkreis (LK) or Stadtkreis (SK), which correspond roughly to US counties.
age_group: The COVID-19 data is reported for 6 age groups: 0-4, 5-14, 15-34, 35-59, 60-79, and above 80 years old. As shorthand for the last category I'm using "80-99", but there might well be persons above 99 years old in this dataset. This column has a few NA entries.
gender: Reported as male (M) or female (F). This column has a few NA entries.
date: The calendar date on which a case or death was reported. There might be delays that will be corrected by retroactively assigning cases to earlier dates.
cases: COVID-19 cases that have been confirmed through laboratory work. This and the following 2 columns are counts per day, not cumulative counts.
deaths: COVID-19 related deaths.
recovered: Recovered cases.
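Because cases, deaths and recovered are daily counts rather than cumulative totals, a running total has to be built by the user. The following is a minimal sketch using only the Python standard library. The sample rows, the county names and the column order are made up for illustration; only the column names come from the description above.

```python
import csv
import io
from collections import defaultdict
from itertools import accumulate

# Tiny made-up sample using the covid_de.csv column names (values are illustrative only).
SAMPLE = """state,county,age_group,gender,date,cases,deaths,recovered
Bayern,SK Muenchen,15-34,F,2020-03-01,2,0,2
Bayern,SK Muenchen,35-59,M,2020-03-01,1,0,1
Bayern,LK Rosenheim,35-59,M,2020-03-02,4,1,3
Berlin,SK Berlin Mitte,60-79,F,2020-03-02,3,0,3
"""


def cumulative_cases_by_state(rows):
    """Sum daily case counts per (state, date), then accumulate over time per state."""
    daily = defaultdict(lambda: defaultdict(int))
    for row in rows:
        daily[row["state"]][row["date"]] += int(row["cases"])

    result = {}
    for state, per_day in daily.items():
        dates = sorted(per_day)  # ISO dates sort chronologically as strings
        totals = accumulate(per_day[d] for d in dates)
        result[state] = list(zip(dates, totals))
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cumulative = cumulative_cases_by_state(rows)
print(cumulative["Bayern"])
```

The same function works for deaths or recovered by swapping the column name. Since cases may be retroactively assigned to earlier dates, any running total should be recomputed after each data update.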
Demographic dataset demographics_de.csv:
state, gender, age_group: same as above. The demographic data is available in higher age resolution, but I have binned it here to match the corresponding age groups in the covid_de.csv file.
population: Population counts for the respective categories. These numbers reflect the (most recent available) estimates on 2018-12-31.
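Since both files share the state, gender and age_group columns, case counts can be normalised by population. The sketch below computes cases per 100,000 people for each matching category. All numbers are invented, and the helper name cases_per_100k is hypothetical; rows with NA gender or age_group simply find no matching population entry and are skipped.

```python
from collections import defaultdict

# Made-up rows; real values would come from covid_de.csv and demographics_de.csv.
case_rows = [
    {"state": "Bayern", "gender": "F", "age_group": "15-34", "cases": 30},
    {"state": "Bayern", "gender": "M", "age_group": "15-34", "cases": 20},
    {"state": "Berlin", "gender": "F", "age_group": "15-34", "cases": 12},
    {"state": "Berlin", "gender": None, "age_group": "15-34", "cases": 1},  # NA gender
]
population_rows = [
    {"state": "Bayern", "gender": "F", "age_group": "15-34", "population": 100_000},
    {"state": "Bayern", "gender": "M", "age_group": "15-34", "population": 150_000},
    {"state": "Berlin", "gender": "F", "age_group": "15-34", "population": 60_000},
]


def category(row):
    """Join key shared by both files."""
    return (row["state"], row["gender"], row["age_group"])


def cases_per_100k(case_rows, population_rows):
    """Return cases per 100,000 people for every category present in both inputs."""
    population = {category(r): r["population"] for r in population_rows}
    cases = defaultdict(int)
    for row in case_rows:
        key = category(row)
        if key in population:  # skip rows without a demographic match (e.g. NA entries)
            cases[key] += row["cases"]
    return {key: 100_000 * count / population[key] for key, count in cases.items()}


rates = cases_per_100k(case_rows, population_rows)
print(rates)
```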
Vaccination progress dataset covid_de_vaccines.csv:
date: calendar date of vaccination
doses, doses_first, doses_second: Daily count of administered doses: total, 1st shot, 2nd shot.
pfizer_cumul, moderna_cumul, astrazeneca_cumul: Daily cumulative number of administered vaccinations by manufacturer.
persons_first_cumul, persons_full_cumul: Daily cumulative number of people having received their 1st shot and full vaccination, respectively.
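As one example of using the cumulative manufacturer columns, the share of each manufacturer among all administered doses can be read off the most recent row. This is a sketch under the assumption that the three *_cumul manufacturer columns listed above are the only ones present; the row values and the function name manufacturer_shares are made up.

```python
# Made-up latest row using the covid_de_vaccines.csv column names.
latest = {
    "date": "2021-04-01",
    "pfizer_cumul": 600,
    "moderna_cumul": 100,
    "astrazeneca_cumul": 300,
    "persons_first_cumul": 700,
    "persons_full_cumul": 300,
}

SUFFIX = "_cumul"


def manufacturer_shares(row):
    """Return each manufacturer's fraction of all cumulative doses in one row."""
    columns = [
        name
        for name in row
        if name.endswith(SUFFIX) and not name.startswith("persons_")
    ]
    total = sum(row[name] for name in columns)
    return {name[: -len(SUFFIX)]: row[name] / total for name in columns}


shares = manufacturer_shares(latest)
print(shares)
```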
All the data have been extracted from open data sources which are being gratefully acknowledged:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.
So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.
Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.
The data is now available as csv files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. I am uploading it here for use in Kaggle kernels and to get insights from the broader DS community.
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC
This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.
The data is available from 22 Jan, 2020.
This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.
This is the primary dataset and contains aggregated COVID-19 statistics by location and date.
This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.
This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.
Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.
✅ Use covid_19_data.csv for up-to-date aggregated global trends.
✅ Use the line list datasets for detailed, individual-level case analysis.
If you are interested in knowing country level data, please refer to the following Kaggle datasets:
India - https://www.kaggle.com/sudalairajkumar/covid19-in-india
South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset
Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy
Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil
USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa
Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland
Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases
Johns Hopkins University for making the data available for educational and academic research purposes
MoBS lab - https://www.mobs-lab.org/2019ncov.html
World Health Organization (WHO): https://www.who.int/
DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.
BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/
National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google....
License: https://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The United States recorded 16,306,656 coronavirus recoveries since the epidemic began, according to the World Health Organization (WHO). In addition, the United States reported 797,346 coronavirus deaths. This dataset includes a chart with historical data for United States coronavirus recoveries.
License: http://opendatacommons.org/licenses/dbcl/1.0/
I combined several data sources to build an integrated dataset of country-level COVID-19 confirmed cases, recoveries and fatalities, which can be used to build epidemic models such as SIR or SIR with mortality. I also added population information, which can be used for calculating incidence rate and prevalence rate. One of my applications based on this dataset is published at https://dylansp.shinyapps.io/COVID19_Visualization_Analysis_Tool/.
My approach is to retrieve cumulative confirmed cases, fatalities and recovered cases from 2020-01-22 onwards from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) COVID-19 dataset, merged with the country code as well as the population of each country. For the purpose of building epidemic models, I calculated daily new confirmed cases, recovered cases, and fatalities, together with remaining confirmed cases, which equal cumulative confirmed cases - cumulative recovered cases - cumulative fatalities. I have yet to find credible data sources regarding probable cases in various countries. I'll add them once I find them.
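The two derived quantities described above can be sketched in a few lines of Python. The series below are made-up cumulative counts for a single country, and the function names are hypothetical; the arithmetic follows the stated definition (remaining = confirmed - recovered - fatalities, daily new = day-over-day difference of the cumulative series).

```python
def daily_new(cumulative):
    """Day-over-day differences of a cumulative series; the first day is kept as-is."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def remaining_cases(confirmed, recovered, fatalities):
    """Remaining confirmed cases per day, all inputs being cumulative series."""
    return [c - r - f for c, r, f in zip(confirmed, recovered, fatalities)]


# Made-up cumulative counts for four consecutive days.
confirmed = [10, 25, 45, 60]
recovered = [0, 5, 12, 30]
fatalities = [0, 1, 2, 4]

new_confirmed = daily_new(confirmed)
remaining = remaining_cases(confirmed, recovered, fatalities)
print(new_confirmed, remaining)
```

Retroactive corrections in the source data can make a cumulative series decrease, in which case the daily difference becomes negative; whether to clip or keep such values is a modelling decision.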
License: http://opendatacommons.org/licenses/dbcl/1.0/
Covid-19 Data collected from various sources on the internet. This dataset has daily level information on the number of affected cases, deaths, and recovery from the 2019 novel coronavirus. Please note that this is time-series data and so the number of cases on any given day is the cumulative number.
The dataset includes 28 files scraped from various data sources, mainly the Johns Hopkins GitHub repository, the ministry of health affairs of India, Worldometer, and the Our World in Data website. The details of the files are as follows:
-countries-aggregated.csv
Simple, cleaned data with 5 columns with self-explanatory names.
-covid-19-daily-tests-vs-daily-new-confirmed-cases-per-million.csv
Time-series data of daily tests conducted vs. daily new confirmed cases per million. The Entity column holds the country name, while code holds the ISO code of the country.
-covid-contact-tracing.csv
Data depicting government policies adopted in case of contact tracing. 0 -> No tracing, 1-> limited tracing, 2-> Comprehensive tracing.
-covid-stringency-index.csv
The nine metrics used to calculate the Stringency Index are school closures; workplace closures; cancellation of public events; restrictions on public gatherings; closures of public transport; stay-at-home requirements; public information campaigns; restrictions on internal movements; and international travel controls. The index on any given day is calculated as the mean score of the nine metrics, each taking a value between 0 and 100. A higher score indicates a stricter response (i.e. 100 = strictest response).
-covid-vaccination-doses-per-capita.csv
A total number of vaccination doses administered per 100 people in the total population. This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).
-covid-vaccine-willingness-and-people-vaccinated-by-country.csv
Survey of people who have not received a COVID vaccine, showing who is willing vs. unwilling vs. uncertain about getting a vaccine this week if it was available to them.
-covid_india.csv
India specific data containing the total number of active cases, recovered and deaths statewide.
-cumulative-deaths-and-cases-covid-19.csv
Cumulative data containing deaths and daily confirmed cases in the world.
-current-covid-patients-hospital.csv
Time series data containing a count of covid patients hospitalized in a country
-daily-tests-per-thousand-people-smoothed-7-day.csv
Daily tests conducted per 1000 people, as a running week average.
-face-covering-policies-covid.csv
Countries are grouped into five categories:
1->No policy
2->Recommended
3->Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible
4->Required in all shared/public spaces outside the home with other people present or all situations when social distancing not possible
5->Required outside the home at all times regardless of location or presence of other people
-full-list-cumulative-total-tests-per-thousand-map.csv
Full list of total tests conducted per 1000 people.
-income-support-covid.csv
Income support captures if the government is covering the salaries or providing direct cash payments, universal basic income, or similar, of people who lose their jobs or cannot work. 0->No income support, 1->covers less than 50% of lost salary, 2-> covers more than 50% of the lost salary.
-internal-movement-covid.csv
Showing government policies in restricting internal movements. Ranges from 0 to 2 where 2 represents the strictest.
-international-travel-covid.csv
Showing government policies in restricting international movements. Ranges from 0 to 2 where 2 represents the strictest.
-people-fully-vaccinated-covid.csv
Contains the count of fully vaccinated people in different countries.
-people-vaccinated-covid.csv
Contains the total count of vaccinated people in different countries.
-positive-rate-daily-smoothed.csv
Contains the positivity rate of various countries, as a running week average.
-public-gathering-rules-covid.csv
Restrictions are given based on the size of public gatherings as follows:
0->No restrictions
1 ->Restrictions on very large gatherings (the limit is above 1000 people)
2 -> gatherings between 100-1000 people
3 -> gatherings between 10-100 people
4 -> gatherings of less than 10 people
-school-closures-covid.csv
School closure during Covid.
-share-people-fully-vaccinated-covid.csv
Share of people that are fully vaccinated.
-stay-at-home-covid.csv
Countries are grouped into four categories:
0->No measures
1->Recommended not to leave the house
2->Required to not leave the house with exceptions for daily exercise, grocery shopping, and ‘essent...
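Two calculations mentioned in the file descriptions above, the Stringency Index (mean of nine metric scores, each between 0 and 100) and the running week average used by the smoothed files, can be sketched as follows. The scores and daily values are made up, the function names are hypothetical, and the trailing 7-day window is an assumption: the source does not say whether the published averages are trailing or centred.

```python
from statistics import mean


def stringency_index(scores):
    """Mean of the nine policy metric scores, each expected to lie in [0, 100]."""
    if len(scores) != 9:
        raise ValueError("expected exactly nine metric scores")
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("each score must be between 0 and 100")
    return mean(scores)


def rolling_mean(values, window=7):
    """Trailing rolling mean; yields one value per complete window."""
    return [
        mean(values[end - window : end])
        for end in range(window, len(values) + 1)
    ]


# Made-up scores, one per metric (school closures, workplace closures, ...).
scores = [100, 50, 100, 75, 0, 100, 100, 50, 100]
index = stringency_index(scores)

# Made-up daily values, e.g. tests per 1000 people on eight consecutive days.
daily_values = [7, 7, 7, 7, 7, 7, 7, 14]
smoothed = rolling_mean(daily_values)
print(index, smoothed)
```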
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
June 8, 2023: Daily transmission is no longer available.
Summary of COVID-19 statistics for Connecticut correctional facilities including:
Total # of Staff Positive for COVID-19
Total # of Inmates Pos. for COVID-19
COVID-19 Pos. Inmates Housed at Northern CI Medical Isolation Unit
COVID-19 Pos. Inmates Housed at MacDougall-Walker Medical Isolation Unit
COVID-19 Pos. Staff Returned to Work
Total # of Inmates Medically Cleared
Total # of COVID-19 Pos. Inmate Deaths
More information can be found on the DOC website: https://portal.ct.gov/DOC/Common-Elements/Common-Elements/Health-Information-and-Advisories
Data will be updated every weekday.
Additional notes: The data on 7/15 reflects a decrease in the number of inmates testing positive for COVID-19 and those who have recovered; this decrease was due to an internal data audit that led to the removal of some duplicate information.
The data on 6/2/2020 reflects an increase in the number of inmates who had been medically cleared; this increase was the result of 146 asymptomatic positive inmates who had completed a 14-day isolation period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created as part of the COVID-19 global forecasting challenge. It contains parameters of the SIR model for different locations worldwide. The main value of the dataset, however, is the estimated transmission period (the average time between a single infected individual infecting one susceptible person and the next in a purely susceptible population) per week per location.
The model is defined as ODE system as follows:
[Figure: SIR ODE equations (https://wikimedia.org/api/rest_v1/media/math/render/svg/29728a7d4bebe8197dca7d873d81b9dce954522e)]
In order to reflect the transmission rate changes caused by spread-constraining measures (social distancing, etc.), the Beta parameter is modelled separately as a spline model (one spline node estimate for every week). See paramsWeekly.csv, which holds the Beta parameter values for every week as well as the estimated R0 values (derived from the Beta and Gamma parameters) for every week.
The models are fitted on Johns Hopkins University data (time series) using several runs of the Nelder-Mead simplex optimization method (the best run is taken), starting at different initial locations and using RMSE as the loss.
What parameters are fitted (estimated) per country/province:
* the day when the infection emerged in the country
* the initial infected count on the first day of the infection
* beta (separate value for every week) - an average number of contacts (sufficient to spread the disease) per day each infected individual has
* gamma - fixed fraction of the infected group that will recover during any given day
* R0 - equals beta/gamma
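To illustrate how the fitted parameters drive the model, here is a rough simulation sketch. It assumes the textbook SIR equations (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I) and replaces the dataset's spline for beta with a simpler piecewise-constant weekly value, so it will not reproduce the published fits exactly. The population, parameter values and function name are made up.

```python
def simulate_sir(population, initial_infected, weekly_beta, gamma, steps_per_day=10):
    """Forward-Euler integration of the textbook SIR equations.

    beta is held constant within each week (a simplification of a spline).
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for beta in weekly_beta:
        for _day in range(7):
            for _step in range(steps_per_day):
                new_infections = beta * s * i / population * dt
                new_recoveries = gamma * i * dt
                s -= new_infections
                i += new_infections - new_recoveries
                r += new_recoveries
            history.append((s, i, r))
    return history


# Made-up parameters: beta drops week by week as distancing measures take effect.
weekly_beta = [0.4, 0.4, 0.2, 0.1]
gamma = 0.1
r0_per_week = [beta / gamma for beta in weekly_beta]  # R0 = beta / gamma

history = simulate_sir(1_000_000, 10, weekly_beta, gamma)
print(r0_per_week)
print(history[-1])
```

Weeks with R0 above 1 show a growing infected count, and weeks at or below 1 show it levelling off or shrinking, which is the pattern to look for when checking a fitted curve against the observed points.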
How to read the figures:
* points are real observed data provided by Johns Hopkins University
* curves are model predictions
The dataset contains 3 data portions:
Always do a visual check of the model fit (Figures directory) for quality control before you start using the corresponding parameter values in your analysis, as the dataset is obtained by an automatic fitting procedure without manual quality control.
Thanks a lot Kaggle for organizing data sharing and challenges that make the world better.
Also many thanks to Johns Hopkins University for their hard work of gathering COVID-19 statistics worldwide.
You can try to find correlation between model parameters (e.g. gamma - patient recovery rate) and other properties of the modelled locations worldwide (e.g. weather, population density, level of medical care, etc.)
An application where people can share how they recovered from COVID-19.
License: https://www.usa.gov/government-works
After October 13, 2022, this dataset will no longer be updated as the related CDC COVID Data Tracker site was retired on October 13, 2022.
This dataset contains historical trends in vaccinations and cases by age group, at the US national level. Data is stratified by at least one dose and fully vaccinated. Data also represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.
In 2020, global gross domestic product declined by 6.7 percent as a result of the coronavirus (COVID-19) pandemic outbreak. In Latin America, overall GDP loss amounted to 8.5 percent.
India reported over 44 million confirmed cases of the coronavirus (COVID-19) as of October 20, 2023. The number of people infected with the virus was declining across the South Asian country.
What is the coronavirus?
COVID-19 is part of a large family of coronaviruses (CoV) that are transmitted from animals to people. The name COVID-19 is derived from the words corona, virus, and disease, while the number 19 represents the year that it emerged. Symptoms of COVID-19 resemble those of the common cold, with fever, coughing, and shortness of breath. However, serious infections can lead to pneumonia, multi-organ failure, severe acute respiratory syndrome, and even death, if appropriate medical help is not provided.
COVID-19 in India
India reported its first case of this coronavirus in late January 2020 in the southern state of Kerala. That led to a nation-wide lockdown between March and June that year to curb numbers from rising. After marginal success, the economy opened up leading to some recovery for the rest of 2020. In March 2021, however, the second wave hit the country causing record-breaking numbers of infections and deaths, crushing the healthcare system. The central government has been criticized for not taking action this time around, with "#ResignModi" trending on social media platforms in late April. The government's response was to block this line of content on the basis of fighting misinformation and reducing panic across the country.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Age-standardised mortality rates for deaths involving coronavirus (COVID-19), non-COVID-19 deaths and all deaths by vaccination status, broken down by age group.
The World Bank, in collaboration with the Kenya National Bureau of Statistics and the University of California, Berkeley, is conducting the Kenya COVID-19 Rapid Response Phone Survey to track the socioeconomic impacts of the COVID-19 pandemic, the recovery from it as well as other shocks to provide timely data to inform policy. This dataset contains information from eight waves of the COVID-19 RRPS, which is part of a panel survey that targets Kenyan nationals and started in May 2020. The same households were interviewed every two months for five survey rounds in the first year of data collection and every four months thereafter, with interviews conducted using Computer Assisted Telephone Interviewing (CATI) techniques.
The data set contains information from two samples of Kenyan households. The first sample is a randomly drawn subset of all households that were part of the 2015/16 Kenya Integrated Household Budget Survey (KIHBS) Computer-Assisted Personal Interviewing (CAPI) pilot and provided a phone number. The second was obtained through the Random Digit Dialing method, by which active phone numbers created from the 2020 Numbering Frame produced by the Kenya Communications Authority are randomly selected. The samples cover urban and rural areas and are designed to be representative of the population of Kenya using cell phones. Waves 1-7 of this survey include information on household background, service access, employment, food security, income loss, transfers, health, and COVID-19 knowledge and vaccinations. Wave 8 focused on how households were exposed to shocks, in particular adverse weather shocks and the increase in the price of food and fuel, but also included parts of the previous modules on household background, service access, employment, food security, income loss, and subjective wellbeing.
The data is uploaded in three files. The first is the hh file, which contains household level information. The ‘hhid’ variable uniquely identifies each household. The second is the adult level file, which contains data at the level of adult household members. Each adult in a household is uniquely identified by the ‘adult_id’. The third file is the child level file, available only for waves 3-7, which contains information for every child in the household. Each child in a household is uniquely identified by the ‘child_id’.
The duration of data collection and sample size for each completed wave was:
Wave 1: May 14 to July 7, 2020; 4,061 Kenyan households
Wave 2: July 16 to September 18, 2020; 4,492 Kenyan households
Wave 3: September 28 to December 2, 2020; 4,979 Kenyan households
Wave 4: January 15 to March 25, 2021; 4,892 Kenyan households
Wave 5: March 29 to June 13, 2021; 5,854 Kenyan households
Wave 6: July 14 to November 3, 2021; 5,765 Kenyan households
Wave 7: November 15, 2021, to March 31, 2022; 5,633 Kenyan households
Wave 8: May 31 to July 8, 2022; 4,550 Kenyan households
The same questionnaire is also administered to refugees in Kenya, with the data available in the UNHCR microdata library: https://microdata.unhcr.org/index.php/catalog/296/
National coverage covering rural and urban areas
Household, Individual
The COVID-19 RRPS with Kenyan households has two samples. The first sample consists of households that were part of the 2015/16 KIHBS CAPI pilot and provided a phone number. The 2015/16 KIHBS CAPI pilot is representative at the national level stratified by county and place of residence (urban and rural areas). At least one valid phone number was obtained for 9,007 households and all of them were included in the COVID-19 RRPS sample. The target respondent was the primary male or female household member from the 2015/16 KIHBS CAPI pilot. The second sample consists of households selected using the Random Digit Dialing method. A list of random mobile phone numbers was created using a random number generator from the 2020 Numbering Frame produced by the Kenya Communications Authority. The initial sampling frame therefore consisted of 92,999,970 randomly ordered phone numbers assigned to three networks: Safaricom, Airtel and Telkom. An introductory text message was sent to 5,000 randomly selected numbers to determine if numbers were in operation. Out of these, 4,075 were found to be active and formed the final sampling frame. There was no stratification, and individuals who were called were asked about the households they live in. Until wave 7, sampled households that were not reached in earlier waves were also contacted along with households that were interviewed before. In wave 8, only households that had previously participated in the survey were contacted for interview. The “wave” variable indicates the wave in which each household was interviewed.
Computer Assisted Personal Interview [capi]
The questionnaire was administered in English and is provided as a resource in pdf format. Additionally, questionnaires for each wave are also provided in Excel format coded for SCTO. The same questionnaire is also administered to refugees in Kenya, with the data available in the UNHCR microdata library: https://microdata.unhcr.org/index.php/catalog/296/
Statistics (percentage with, median, mean, aggregates) on COVID-19 benefits (emergency and recovery benefits, enhancements to existing federal programs, provincial and territorial benefits) by 2019 total income decile group, age and gender for Canada, provinces and territories, census divisions and census subdivisions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Co-existing disease of COVID-19 recovered participants.
Various population statistics, including structured demographics data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Physical health status of COVID-19 recovered participants.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Percent distribution of study participants by their knowledge of some disinfection measures post-COVID-19 infection (n = 417).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The monthly excess mortality indicator is based on the exceptional data collection on weekly deaths that Eurostat and the National Statistical Institutes set up in April 2020 in order to support the policy and research efforts related to the COVID-19 pandemic. With that data collection, Eurostat's target was to quickly provide statistics assessing the changing situation of the total number of deaths on a weekly basis, from early 2020 onwards.
The National Statistical Institutes transmit available data on total weekly deaths, classified by sex, 5-year age groups and NUTS3 regions (NUTS2021) over the last 20 years, on a voluntary basis. The resulting online tables, and complementary metadata, are available in the folder Weekly deaths - special data collection (demomwk).
Starting in 2025, the weekly deaths data are collected on a quarterly basis. The database was updated on the 16th of June 2025 (1st quarter) and on the 16th of September 2025 (2nd quarter); the next updates will be in mid-December 2025 (3rd quarter) and mid-February 2026 (4th quarter).
In December 2020, Eurostat released the European Recovery Statistical Dashboard containing also indicators tracking economic and social developments, including health. In this context, “excess mortality” offers elements for monitoring and further analysing direct and indirect effects of the COVID-19 pandemic.
The monthly excess mortality indicator draws attention to the magnitude of the crisis by providing a comprehensive comparison of additional deaths amongst the European countries and allowing for further analysis of its causes. The number of deaths from all causes is compared with the expected number of deaths during a certain period in the past (baseline period, 2016-2019).
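The comparison described above can be sketched numerically. The example assumes that the expected number of deaths for a month is the average of the same month over the 2016-2019 baseline years and that the indicator is expressed as a percentage of that average; both are simplifying assumptions for illustration, and all death counts and names are made up.

```python
from statistics import mean


def excess_mortality_percent(observed_deaths, baseline_deaths):
    """Percentage of additional deaths relative to the baseline average."""
    expected = mean(baseline_deaths)
    return 100.0 * (observed_deaths - expected) / expected


# Made-up July death counts for one country in the baseline years.
baseline_july = {2016: 980, 2017: 1000, 2018: 1020, 2019: 1000}
observed_july = 1150

excess = excess_mortality_percent(observed_july, baseline_july.values())
print(f"{excess:.1f}% excess deaths")
```

A negative result means fewer deaths than the baseline average for that month.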
Excess mortality may vary with different phenomena because the indicator compares the total number of deaths from all causes with the expected number of deaths during a certain period in the past (baseline). While a substantial increase largely coincides with a COVID-19 outbreak in each country, the indicator does not make a distinction between causes of death. Similarly, it does not take into account changes over time and differences between countries in terms of the size and age/sex structure of the population. Statistics on excess deaths provide information about the burden of mortality potentially related to the COVID-19 pandemic, thereby covering not only deaths that are directly attributed to the virus but also those indirectly related to or even due to another reason. For example, in July 2022, several countries recorded unusually high numbers of excess deaths compared to the same month of 2020 and 2021, a situation probably connected not only to COVID-19 but also to the heatwaves that affected parts of Europe during the reference period.
In addition to confirmed deaths, excess mortality captures COVID-19 deaths that were not correctly diagnosed and reported, as well as deaths from other causes that may be attributed to the overall crisis. It also accounts for the partial absence of deaths from other causes like accidents that did not occur due, for example, to the limitations in commuting or travel during the lockdown periods.