This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
56 million people died in 2017. What did they die from?
The Global Burden of Disease is a major global study on the causes of death and disease, published in the medical journal The Lancet. Its estimates of the annual number of deaths are shown here.
The dataset was downloaded as a CSV from the first chart at https://ourworldindata.org/causes-of-death. The raw file was loaded into Tableau Prep to explore the data distribution and to apply some pivoting and cleaning. The output was uploaded to this dataset along with the original raw file.
Please note that the raw file includes some groupings of countries by region, but nothing in the data marks these rows as aggregates, so be careful not to analyze the whole dataset on the assumption that every row is a single country. To be more accurate, I began by analyzing countries using the ISO country code (the column named "Code"). If, like me, you have no clue what country ZAF is, Google is your best friend (South Africa) 😉.
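One way to act on this caution is to keep only the rows that carry a three-letter ISO code. The sketch below assumes that regional groupings have an empty `Code` value and that any other aggregates use a longer, non-ISO code. The sample rows and the `Deaths` column are invented for illustration; only the `Code` column name comes from the description above.

```python
import csv
import io

# Invented sample in the general shape of the raw export.
SAMPLE = """Entity,Code,Year,Deaths
South Africa,ZAF,2017,100
Sub-Saharan Africa,,2017,900
World,OWID_WRL,2017,5000
Brazil,BRA,2017,200
"""

def country_rows(rows):
    """Keep rows whose Code looks like a three-letter ISO country code."""
    kept = []
    for row in rows:
        code = (row.get("Code") or "").strip()
        # Assumption: aggregates have no code or a longer, non-ISO code.
        if len(code) == 3 and code.isalpha():
            kept.append(row)
    return kept

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
countries = country_rows(rows)
print([r["Entity"] for r in countries])
```

It is worth checking the rows that get dropped, since the real file may mark aggregates differently from what this sketch assumes.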
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual data on death registrations by single year of age for the UK (1974 onwards) and England and Wales (1963 onwards).
Note: Starting October 10th, 2025 this dataset is deprecated and is no longer being updated. As of April 27, 2023 updates changed from daily to weekly.
Summary: The cumulative number of confirmed COVID-19 deaths among Maryland residents by age: 0-9; 10-19; 20-29; 30-39; 40-49; 50-59; 60-69; 70-79; 80+; Unknown.
Description: The MD COVID-19 - Confirmed Deaths by Age Distribution data layer is a collection of the statewide confirmed COVID-19 related deaths that have been reported each day by the Vital Statistics Administration by designated age ranges. A death is classified as confirmed if the person had a laboratory-confirmed positive COVID-19 test result. Some data on deaths may be unavailable due to the time lag between the death, typically reported by a hospital or other facility, and the submission of the complete death certificate. Probable deaths are available from the MD COVID-19 - Probable Deaths by Age Distribution data layer.
Terms of Use: The Spatial Data, and the information therein, (collectively the "Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result to changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
Across the world, people are living longer. In 1900, the average life expectancy of a newborn was 32 years. By 2021 this had more than doubled to 71 years. But where, when, how, and why has this dramatic change occurred? To understand it, we can look at data on life expectancy worldwide. The large reduction in child mortality has played an important role in increasing life expectancy. But life expectancy has increased at all ages. Infants, children, adults, and the elderly are all less likely to die than in the past, and death is being delayed. This remarkable shift results from advances in medicine, public health, and living standards. Along with it, many predictions of the ‘limit’ of life expectancy have been broken.
life_expectancy.csv

| variable | class | description |
|---|---|---|
| Entity | character | Country or region entity |
| Code | character | Entity code |
| Year | double | Year |
| LifeExpectancy | double | Period life expectancy at birth - Sex: all - Age: 0 |

life_expectancy_different_ages.csv

| variable | class | description |
|---|---|---|
| Entity | character | Country or region entity |
| Code | character | Entity code |
| Year | double | Year |
| LifeExpectancy0 | double | Period life expectancy at birth - Sex: all - Age: 0 |
| LifeExpectancy10 | double | Period life expectancy - Sex: all - Age: 10 |
| LifeExpectancy25 | double | Period life expectancy - Sex: all - Age: 25 |
| LifeExpectancy45 | double | Period life expectancy - Sex: all - Age: 45 |
| LifeExpectancy65 | double | Period life expectancy - Sex: all - Age: 65 |
| LifeExpectancy80 | double | Period life expectancy - Sex: all - Age: 80 |

life_expectancy_female_male.csv

| variable | class | description |
|---|---|---|
| Entity | character | Country or region entity |
| Code | character | Entity code |
| Year | double | Year |
| LifeExpectancyDiffFM | double | Life expectancy difference (f-m) - Type: period - Sex: both - Age: 0 |
citation(tidytuesday)
This dataset presents the age-adjusted death rates for the 10 leading causes of death in the United States beginning in 1999. Data are based on information from all resident death certificates filed in the 50 states and the District of Columbia using demographic and medical characteristics. Age-adjusted death rates (per 100,000 population) are based on the 2000 U.S. standard population. Populations used for computing death rates after 2010 are postcensal estimates based on the 2010 census, estimated as of July 1, 2010. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for non-census years before 2010 are revised using updated intercensal population estimates and may differ from rates previously published. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD–10) are ranked according to the number of deaths assigned to rankable causes. Cause of death statistics are based on the underlying cause of death.
SOURCES
CDC/NCHS, National Vital Statistics System, mortality data (see http://www.cdc.gov/nchs/deaths.htm); and CDC WONDER (see http://wonder.cdc.gov).
REFERENCES
National Center for Health Statistics. Vital statistics data available. Mortality multiple cause files. Hyattsville, MD: National Center for Health Statistics. Available from: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm.
Murphy SL, Xu JQ, Kochanek KD, Curtin SC, and Arias E. Deaths: Final data for 2015. National vital statistics reports; vol 66. no. 6. Hyattsville, MD: National Center for Health Statistics. 2017. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_06.pdf.
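As background on what "age-adjusted" means, the direct method weights each age-specific death rate by that age group's share of a standard population. The following is a minimal sketch of that calculation. The three age bands, counts, and weights are invented for illustration and are not the 2000 U.S. standard population figures.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must cover the same age groups")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, p, w in zip(deaths, population, standard_weights):
        # Age-specific rate times that group's share of the standard population.
        rate += (d / p) * (w / total_weight)
    return rate * per

# Hypothetical age groups: young, middle, old.
deaths = [10, 50, 400]
population = [100_000, 100_000, 50_000]
weights = [0.5, 0.3, 0.2]  # illustrative weights only

adjusted = age_adjusted_rate(deaths, population, weights)
crude = sum(deaths) / sum(population) * 100_000
print(round(adjusted, 1), round(crude, 1))
```

The adjusted and crude rates differ because the invented population is older than the invented standard. Adjustment removes that kind of difference when comparing years or places.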
Number and percentage of deaths, by place of death (in hospital or non-hospital), 1991 to most recent year.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional deaths registration data for single year of age and average age of death (median and mean) of persons whose death involved coronavirus (COVID-19), England and Wales. Includes deaths due to COVID-19 and breakdowns by sex.
This mapping tool enables you to see how COVID-19 deaths in your area may relate to factors in the local population which research has shown are associated with COVID-19 mortality. It maps COVID-19 death rates for small areas of London (known as MSOAs) and enables you to compare these to a number of other factors, including the Index of Multiple Deprivation, the age and ethnicity of the local population, the extent of pre-existing health conditions in the local population, and occupational data. Research has shown that the mortality risk from COVID-19 is higher for people of older age groups, for men, for people with pre-existing health conditions, and for people from BAME backgrounds. London boroughs had some of the highest mortality rates from COVID-19 up to April 17th 2020, according to data from the Office for National Statistics (ONS). Analysis from the ONS has also shown that mortality is related to socio-economic issues such as occupations classified ‘at risk’ and area deprivation. There is much about COVID-19-related mortality that is still not fully understood, including the intersection between the different factors, for example the relationship between BAME groups and occupation. On their own, none of these individual factors correlate strongly with deaths for these small areas. This is most likely because the most relevant factors will vary from area to area. In some cases it may relate to the age of the population, in others it may relate to the prevalence of underlying health conditions, area deprivation or the proportion of the population working in ‘at risk occupations’, and in some cases a combination of these or none of them. Further descriptive analysis of the factors in this tool can be found here: https://data.london.gov.uk/dataset/covid-19--socio-economic-risk-factors-briefing
Number and percentage of deaths, by month and place of residence, 1991 to most recent year.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The Excess Winter Mortality Index (EWD Index) shows excess winter deaths as a Percentage Ratio of the number of deaths expected in the (eight) warmer months either side of Winter (01 December to 31 March). So the data’s yearly time period is from 01 August to 31 July the following year. In other words, EWD is the ratio of extra deaths from all causes during the winter months compared to average non-winter deaths. The EWD Index is partly dependent on the proportion of Older People in the population, as most excess winter deaths affect Older People. This indicator covers all ages, but there is no standardisation in its calculation by age or any other factor. So figures for an area can be influenced for example by the proportion of Older People. This dataset is updated annually. Source: Office for Health Improvement and Disparities (OHID) Public Health Outcomes Framework (PHOF), indicator 90360 / E14. Age breakouts, confidence intervals and metadata are shown on the PHE (PHOF) site. Note: Please be advised that the ONS currently has this dataset under consultation for review (as of 09/01/2025) so may not be updated annually until the review has concluded. The full notice can be found on the ONS link for the Winter Mortality publication - please see link in the Additional Information Section.
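Read literally, the description gives the index as excess winter deaths divided by average non-winter deaths, expressed as a percentage. The sketch below assumes that "average non-winter deaths" means the mean of the two four-month non-winter periods (August to November and April to July). The counts are invented, and the function name is hypothetical.

```python
def excess_winter_index(winter_deaths, autumn_deaths, spring_summer_deaths):
    """Return (excess winter deaths, index in percent) for one August-to-July year.

    winter_deaths: deaths from 1 December to 31 March
    autumn_deaths: deaths in the preceding August to November
    spring_summer_deaths: deaths in the following April to July
    """
    # Assumed baseline: the average of the two non-winter four-month periods.
    avg_non_winter = (autumn_deaths + spring_summer_deaths) / 2
    excess = winter_deaths - avg_non_winter
    return excess, excess / avg_non_winter * 100

excess, index = excess_winter_index(1200, 1000, 1000)
print(excess, index)
```

Because the calculation is not age-standardised, two areas with the same underlying risk can show different index values if one has a larger share of older residents.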
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was built for estimating the number of deaths due to obesity in 2014 from data covering 1990 to 2013.
There is also a HINT file (hint.csv), which holds data for 2015 and beyond. I have left it out of the train and test data because it has many missing values, but it may be useful for forecasting and for those who are interested in more recent data.
| Variables | Description |
|---|---|
| Country | 205 country names |
| Code | Country code, such as AFG for Afghanistan |
| Year | Year the data were collected |
| Population | Population of the country |
| Percentage-Overweight | Percentage of adults defined as overweight, BMI >= 25 (age-standardized estimate) (%). Sex: both sexes. Age group: 18+ |
| Mean-Daily-Caloric-Supply | Mean daily supply of calories among people with overweight or obesity, BMI >= 25 (age-standardized). Men only |
| Mean-BMI | BMI. Age group: 18+ years. Two columns, one for males and one for females |
| Percentage-Overweighted-Male | Percentage of adults who are overweight (age-standardized). Age group: 18+ years. Two columns, one for males and one for females |
| Prevalence-Hypertension-Male | Prevalence of hypertension among adults aged 30-79 years (age-standardized). Two columns, one for males and one for females |
| Prevalence-Obesity | Prevalence of obesity among adults, BMI >= 30 (age-standardized estimate) (%). Sex: both sexes. Age group: 18+ |
| Death-By-High-BMI | Deaths from all causes attributed to high body-mass index per 100,000 people, both sexes, age-standardized |
THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
Queries published alongside the dataset return basic counts of mass killings and mass shootings by year, either nationwide or for a single state.
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
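The core of the definition above can be written as a simple predicate. This is only an illustrative sketch: the `Victim` record and its field names are hypothetical and are not the database's actual schema. The seven-day spree rule and the restriction to the 50 states and Washington D.C. are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Victim:
    time_of_death: datetime
    is_offender: bool = False
    is_unborn: bool = False
    intentional: bool = True  # False for negligent homicides, e.g. DUI or accidental fires

def is_mass_killing(victims, min_victims=4, window=timedelta(hours=24)):
    """True if at least min_victims qualifying victims were killed within one window."""
    times = sorted(
        v.time_of_death
        for v in victims
        if v.intentional and not v.is_offender and not v.is_unborn
    )
    # In sorted order, the first and last of any min_victims consecutive deaths
    # must fall within the window.
    for i in range(len(times) - min_victims + 1):
        if times[i + min_victims - 1] - times[i] <= window:
            return True
    return False

start = datetime(2019, 8, 3, 10, 0)
four = [Victim(start + timedelta(hours=h)) for h in (0, 1, 2, 3)]
with_offender = four[:3] + [Victim(start, is_offender=True)]
print(is_mass_killing(four), is_mass_killing(with_offender))
```

The second case returns False because the offender's own death does not count toward the threshold of four.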
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the shadows of the Covid-19 pandemic, there is another global health crisis that has gone largely unnoticed. This is the Noncommunicable Disease (NCD) pandemic.
The WHO website describes NCDs as follows:
Noncommunicable diseases (NCDs), also known as chronic diseases, tend to be of long duration and are the result of a combination of genetic, physiological, environmental and behavioural factors.
The main types of NCDs are cardiovascular diseases (like heart attacks and stroke), cancers, chronic respiratory diseases (such as chronic obstructive pulmonary disease and asthma) and diabetes.
NCDs disproportionately affect people in low- and middle-income countries where more than three quarters of global NCD deaths – 32 million – occur.
- Noncommunicable diseases (NCDs) kill 41 million people each year, equivalent to 71% of all deaths globally.
- Each year, 15 million people die from a NCD between the ages of 30 and 69 years; over 85% of these "premature" deaths occur in low- and middle-income countries.
- Cardiovascular diseases account for most NCD deaths, or 17.9 million people annually, followed by cancers (9.0 million), respiratory diseases (3.9 million), and diabetes (1.6 million).
- These 4 groups of diseases account for over 80% of all premature NCD deaths.
- Tobacco use, physical inactivity, the harmful use of alcohol and unhealthy diets all increase the risk of dying from a NCD.
- Detection, screening and treatment of NCDs, as well as palliative care, are key components of the response to NCDs.
This data repository consists of 3 CSV files: WHO-cause-of-death-by-NCD.csv is the main dataset, which provides the percentage of deaths caused by NCDs out of all causes of death, for each nation globally. Metadata_Country.csv and Metadata_Indicator.csv provide additional metadata which is helpful for interpreting the main CSV.
The data collected spans a period from 2000 to 2016. The main CSV has columns for every year from 1960 to 2019. It is advisable to drop all redundant columns where no data was collected.
Furthermore, it is advisable to merge Metadata_Country.csv with the main CSV as it provides valuable additional information, particularly on the economic situation of each nation.
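The two suggestions above, dropping year columns that hold no data and merging in the country metadata, could look roughly like this. The sketch uses only the standard library. The tiny inline samples, their values, and the column names (`Country Code`, `IncomeGroup`) are assumptions for illustration and should be checked against the real files.

```python
import csv
import io

# Invented stand-ins for WHO-cause-of-death-by-NCD.csv and Metadata_Country.csv.
MAIN = """Country Name,Country Code,1999,2000,2016,2019
Kenya,KEN,,27.1,31.9,
Japan,JPN,,78.6,82.0,
"""
META = """Country Code,IncomeGroup
KEN,Lower middle income
JPN,High income
"""

def drop_empty_columns(rows):
    """Remove columns that are blank in every row."""
    keep = [c for c in rows[0] if any((r[c] or "").strip() for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]

def merge_on(rows, meta_rows, key):
    """Left-join metadata onto the main rows using a shared key column."""
    lookup = {m[key]: m for m in meta_rows}
    return [{**r, **lookup.get(r[key], {})} for r in rows]

main = drop_empty_columns(list(csv.DictReader(io.StringIO(MAIN))))
merged = merge_on(main, list(csv.DictReader(io.StringIO(META))), "Country Code")
print(list(merged[0].keys()))
```

A left join keeps every row of the main file even when a code has no metadata entry, which is usually the safer default for aggregate rows.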
This dataset has been extracted from The World Bank 'Cause of death, by non-communicable diseases (% of total)' Dataset, derived based on the data from WHO's Global Health Estimates. It is freely provided under a Creative Commons Attribution 4.0 International License (CC BY 4.0), with the additional terms as stated on the World Bank website: World Bank Terms of Use for Datasets.
I would be interested to see some good data wrangling (dropping redundant columns), as well as kernels interpreting the additional information in the 'SpecialNotes' column in Metadata_Country.csv.
It would also be great to see which factors influence NCDs, most of all the geopolitical factors, and to see some choropleth visualisations that give an idea of which regions are most affected by NCDs.
This dataset contains counts and rates (per 1,000,000 residents) of asthma deaths among Californians statewide and by county. The data are stratified by age group (all ages, 0-17, 18+) and reported for 3-year periods. The data are derived from the California Death Statistical Master Files, which contain information collected from death certificates. All deaths with asthma coded as the underlying cause of death (ICD-10 CM J45 or J46) are included.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
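The hotspot queries above rest on a 7-day rolling average of new cases per capita. The sketch below shows that calculation for a single area, starting from cumulative counts. It is an illustration with invented numbers and a hypothetical function name, not the AP's actual query.

```python
def rolling_new_cases_per_100k(cumulative, population, window=7):
    """Rolling mean of daily new cases per 100,000 residents.

    cumulative: cumulative case counts, one per day, oldest first.
    Returns one value for each day that has a full window of daily changes.
    """
    daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
    out = []
    for end in range(window, len(daily) + 1):
        mean_new = sum(daily[end - window:end]) / window
        out.append(mean_new * 100_000 / population)
    return out

# Invented county: 50,000 residents, eight days of cumulative counts.
cumulative = [100, 110, 120, 130, 140, 150, 160, 170]
rates = rolling_new_cases_per_100k(cumulative, 50_000)
print(rates)
```

As noted above, very small populations make this rate volatile, so it is worth reading it next to the raw counts.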
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
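One visible symptom of such revisions is a cumulative count that goes down from one day to the next, which produces negative "new" values when daily changes are derived. A small check like the following can flag those days. It is a sketch on invented numbers, and the function name is hypothetical.

```python
def find_downward_revisions(cumulative):
    """Return (day_index, change) pairs where a cumulative count decreased."""
    return [
        (i + 1, b - a)
        for i, (a, b) in enumerate(zip(cumulative, cumulative[1:]))
        if b < a
    ]

counts = [500, 520, 515, 540]  # invented; day 2 reflects a correction by the source
revisions = find_downward_revisions(counts)
print(revisions)
```

Upward corrections cannot be detected this way, because they look the same as a genuine rise in cases.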
This data should be credited to Johns Hopkins University COVID-19 tracking project
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Daily updates of Covid-19 Global Excess Deaths from the Economist's GitHub repository: https://github.com/TheEconomist/covid-19-the-economist-global-excess-deaths-model
Interpreting estimates
Estimating excess deaths for every country every day since the pandemic began is a complex and difficult task. Because data are limited, we can often give only a very wide range of plausible values rather than a single confident number. Focusing on central estimates in such cases would be misleading: unless ranges are very narrow, the 95% range should be reported when possible. The ranges assume that the conditions for bootstrap confidence intervals are met. Please see our tracker page and methodology for more information.
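For readers unfamiliar with the term, a percentile bootstrap interval can be sketched as follows. This is a generic illustration on invented numbers and is not The Economist's model or code. The function name, the resample count, and the fixed seed are all choices made for the example.

```python
import random
import statistics

def bootstrap_interval(values, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of a sample."""
    rng = random.Random(seed)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

sample = [12, 15, 9, 22, 30, 18, 11, 25, 14, 20]  # invented values
low, high = bootstrap_interval(sample)
print(low, high)
```

Reporting `low` and `high` together with the central estimate is the practice the paragraph above recommends when the range is not narrow.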
New variants
The Omicron variant, first detected in southern Africa in November 2021, appears to have characteristics that are different to earlier versions of sars-cov-2. Where this variant is now dominant, this change makes estimates uncertain beyond the ranges indicated. Other new variants may do the same. As more data is incorporated from places where new variants are dominant, predictions improve.
Non-reporting countries
Turkmenistan and the Democratic People's Republic of Korea have not reported any covid-19 figures since the start of the pandemic. They also have not published all-cause mortality data. Exports of estimates for the Democratic People's Republic of Korea have been temporarily disabled as it now issues contradictory data: reporting a significant outbreak through its state media, but zero confirmed covid-19 cases/deaths to the WHO.
Acknowledgements
A special thanks to all our sources and to those who have made the data to create these estimates available. We list all our sources in our methodology. Within script 1, the source for each variable is also given as the data is loaded, with the exception of our sources for excess deaths data, which we detail on our free-to-read excess deaths tracker as well as on GitHub. The gradient booster implementation used to fit the models is aGTBoost, detailed here.
Calculating excess deaths for the entire world over multiple years is both complex and imprecise. We welcome any suggestions on how to improve the model, be it data, algorithm, or logic. If you have one, please open an issue.
The Economist would also like to acknowledge the many people who have helped us refine the model so far, be it through discussions, facilitating data access, or offering coding assistance. A special thanks to Ariel Karlinsky, Philip Schellekens, Oliver Watson, Lukas Appelhans, Berent Å. S. Lunde, Gideon Wakefield, Johannes Hunger, Carol D'Souza, Yun Wei, Mehran Hosseini, Samantha Dolan, Mollie Van Gordon, Rahul Arora, Austin Teda Atmaja, Dirk Eddelbuettel and Tom Wenseleers.
All coding and data collection to construct these models (and make them update dynamically) was done by Sondre Ulvund Solstad. Should you have any questions about them after reading the methodology, please open an issue or contact him at sondresolstad@economist.com.
Suggested citation
The Economist and Solstad, S. (corresponding author), 2021. The pandemic’s true death toll. [online] The Economist. Available at: https://www.economist.com/graphic-detail/coronavirus-excess-deaths-estimates [Accessed ---]. First published in the article "Counting the dead", The Economist, issue 20, 2021.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Age-standardised mortality rates for deaths involving coronavirus (COVID-19), non-COVID-19 deaths and all deaths by vaccination status, broken down by age group.
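For readers unfamiliar with the term, an age-standardised rate weights each age group's death rate by that group's share of a fixed standard population, so that populations with different age structures can be compared. The sketch below shows direct standardisation under that assumption. The function name, age bands, counts and standard-population weights are all hypothetical and are not taken from this dataset.

```python
def age_standardised_rate(deaths, population, standard_population, per=100_000):
    """Directly age-standardised rate per `per` people.

    All three arguments are dicts keyed by the same age-group labels.
    """
    if not (deaths.keys() == population.keys() == standard_population.keys()):
        raise ValueError("age groups must match across all inputs")
    total_weight = sum(standard_population.values())
    weighted_sum = 0.0
    for group, weight in standard_population.items():
        age_specific_rate = deaths[group] / population[group]
        weighted_sum += age_specific_rate * weight
    return weighted_sum / total_weight * per


# Hypothetical inputs, for illustration only.
deaths = {"18-39": 10, "40-69": 50, "70+": 200}
population = {"18-39": 100_000, "40-69": 100_000, "70+": 50_000}
standard = {"18-39": 35_000, "40-69": 45_000, "70+": 20_000}

rate = age_standardised_rate(deaths, population, standard)
print(f"{rate:.1f} deaths per 100,000 (age-standardised)")
```

The same function can be applied separately to each vaccination-status group, using one shared standard population, to make the resulting rates comparable.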
This data presents provisional counts for drug overdose deaths based on a current flow of mortality data in the National Vital Statistics System. Counts for the most recent final annual data are provided for comparison. National provisional counts include deaths occurring within the 50 states and the District of Columbia as of the date specified and may not include all deaths that occurred during a given time period. Provisional counts are often incomplete, and causes of death may be pending investigation, resulting in an underestimate relative to final counts. To address this, methods were developed to adjust provisional counts for reporting delays by generating a set of predicted provisional counts. Several data quality metrics, including the percent completeness in overall death reporting, the percentage of deaths with cause of death pending further investigation, and the percentage of drug overdose deaths with specific drugs or drug classes reported, are included to aid in interpretation of provisional data, as these measures are related to the accuracy of provisional counts. Reporting of the specific drugs and drug classes involved in drug overdose deaths varies by jurisdiction, and comparisons of death rates involving specific drugs across selected jurisdictions should not be made. Provisional data presented will be updated on a monthly basis as additional records are received. For more information please visit: https://www.cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm
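To make the direction of the bias concrete, the sketch below computes a pending-investigation percentage and a deliberately naive completeness-scaled count. Both function names and all numbers are hypothetical. The scaling step is an assumption made for illustration only and is not the adjustment method behind the predicted provisional counts, which the description above does not detail.

```python
def percent_pending(pending_records, total_records):
    """Share of death records whose cause is still pending investigation."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * pending_records / total_records


def naive_delay_adjusted(provisional_count, percent_complete):
    """Scale a provisional count up by an assumed overall completeness.

    Illustrative assumption: records still missing resemble records already
    received. Real adjustment methods are more involved than this.
    """
    if not 0 < percent_complete <= 100:
        raise ValueError("percent_complete must be in (0, 100]")
    return provisional_count * 100.0 / percent_complete


# Hypothetical figures, for illustration only.
print(percent_pending(pending_records=30, total_records=1_500))
print(naive_delay_adjusted(provisional_count=900, percent_complete=90))
```

A lower completeness or a higher pending percentage both suggest that the provisional count sits further below the eventual final count.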