https://creativecommons.org/publicdomain/zero/1.0/
Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics System. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.
It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.
Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.
This dataset is a collection of CSV files, each containing one year's worth of data, with paired JSON files containing the code mappings, plus an ICD-10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors, as the text extracted from the PDFs is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.
A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of the data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.
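Because the code mappings drift between years, any lookup needs to be keyed by year as well as by code. The following is a minimal sketch of that idea in R. The `recode_069` list and the `in_recode_069` helper are hypothetical names introduced here for illustration, the mapping is typed in by hand from the example above rather than read from the JSON files, and matching on the three-character ICD-10 category is an assumption.

```r
# Hypothetical in-memory copy of the year-specific mapping for
# 113_cause_recode entry 069 (values taken from the example above;
# the real mappings live in the paired JSON files).
recode_069 <- list(
  "2005" = c("I10", "I12"),
  "2015" = c("I10", "I12", "I15")
)

# TRUE when `icd` falls under entry 069 in the given year.
# Assumption: membership is decided by the three-character category.
in_recode_069 <- function(icd, year) {
  codes <- recode_069[[as.character(year)]]
  if (is.null(codes)) stop("no mapping loaded for year ", year)
  substr(icd, 1, 3) %in% codes
}

in_recode_069("I150", 2005)  # FALSE: I15 is not part of entry 069 in 2005
in_recode_069("I150", 2015)  # TRUE:  I15 is covered by 2015
```

Stopping on an unknown year, rather than silently falling back to another year's mapping, keeps a trend analysis from mixing code definitions without noticing.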
All data comes from the CDC’s National Vital Statistics System, with the exception of the Icd10Code file, which is sourced from the World Health Organization.
This dataset of U.S. mortality trends since 1900 highlights trends in age-adjusted death rates for five selected major causes of death. Age-adjusted death rates (deaths per 100,000) after 1998 are calculated based on the 2000 U.S. standard population. Populations used for computing death rates for 2011–2017 are postcensal estimates based on the 2010 census, estimated as of July 1, 2010. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for noncensus years between 2000 and 2010 are revised using updated intercensal population estimates and may differ from rates previously published. Data on age-adjusted death rates prior to 1999 are taken from historical data (see References below). Revisions to the International Classification of Diseases (ICD) over time may result in discontinuities in cause-of-death trends.
SOURCES CDC/NCHS, National Vital Statistics System, historical data, 1900-1998 (see https://www.cdc.gov/nchs/nvss/mortality_historical_data.htm); CDC/NCHS, National Vital Statistics System, mortality data (see http://www.cdc.gov/nchs/deaths.htm); and CDC WONDER (see http://wonder.cdc.gov).
REFERENCES
National Center for Health Statistics, Data Warehouse. Comparability of cause-of-death between ICD revisions. 2008. Available from: http://www.cdc.gov/nchs/nvss/mortality/comparability_icd.htm.
National Center for Health Statistics. Vital statistics data available. Mortality multiple cause files. Hyattsville, MD: National Center for Health Statistics. Available from: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm.
Kochanek KD, Murphy SL, Xu JQ, Arias E. Deaths: Final data for 2017. National Vital Statistics Reports; vol 68 no 9. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_09-508.pdf.
Arias E, Xu JQ. United States life tables, 2017. National Vital Statistics Reports; vol 68 no 7. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_07-508.pdf.
National Center for Health Statistics. Historical Data, 1900-1998. 2009. Available from: https://www.cdc.gov/nchs/nvss/mortality_historical_data.htm.
Data for deaths by leading cause of death categories are now available in the death profiles dataset for each geographic granularity.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Cause of death categories for years 1999 and later are based on the tenth revision of the International Classification of Diseases (ICD-10) codes. Comparable categories are provided for years 1979 through 1998 based on the ninth revision (ICD-9) codes. For more information on the comparability of cause of death classification between ICD revisions, see Comparability of Cause-of-death Between ICD Revisions.
This dataset presents the age-adjusted death rates for the 10 leading causes of death in the United States beginning in 1999.
Data are based on information from all resident death certificates filed in the 50 states and the District of Columbia using demographic and medical characteristics. Age-adjusted death rates (per 100,000 population) are based on the 2000 U.S. standard population. Populations used for computing death rates after 2010 are postcensal estimates based on the 2010 census, estimated as of July 1, 2010. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for non-census years before 2010 are revised using updated intercensal population estimates and may differ from rates previously published.
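For readers unfamiliar with age adjustment, the conventional direct method can be written as a weighted sum of age-specific rates. The notation below is introduced here for illustration and is not taken from the NCHS documentation: $d_i$ is the number of deaths in age group $i$, $n_i$ is the population of that age group, and $P_i^{\text{std}}$ is the 2000 U.S. standard population in the same age group.

```latex
\[
R_{\text{adj}} = \sum_{i} w_i \, r_i ,
\qquad
r_i = \frac{d_i}{n_i} \times 100{,}000 ,
\qquad
w_i = \frac{P_i^{\text{std}}}{\sum_{j} P_j^{\text{std}}}
\]
```

Because the weights $w_i$ are the same for every year and every area, differences in $R_{\text{adj}}$ reflect differences in the age-specific rates rather than in age structure.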
Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD–10) are ranked according to the number of deaths assigned to rankable causes. Cause of death statistics are based on the underlying cause of death.
SOURCES CDC/NCHS, National Vital Statistics System, mortality data (see http://www.cdc.gov/nchs/deaths.htm); and CDC WONDER (see http://wonder.cdc.gov).
REFERENCES
National Center for Health Statistics. Vital statistics data available. Mortality multiple cause files. Hyattsville, MD: National Center for Health Statistics. Available from: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm.
Murphy SL, Xu JQ, Kochanek KD, Curtin SC, and Arias E. Deaths: Final data for 2015. National vital statistics reports; vol 66. no. 6. Hyattsville, MD: National Center for Health Statistics. 2017. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_06.pdf.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
https://healthdata.gov/d/2sz9-6c59
The Mortality - Multiple Cause of Death data on CDC WONDER are county-level national mortality and population data spanning the years 1999-2006. These data are available in two separate data sets: one data set for years 1999-2004 with 3 race groups, and another data set for years 2005-2006 with 4 race groups and 3 Hispanic origin categories. Data are based on death certificates for U.S. residents. Each death certificate contains a single underlying cause of death, up to twenty additional multiple causes, and demographic data. The number of deaths, crude death rates, age-adjusted death rates, standard errors and 95% confidence intervals for death rates can be obtained by place of residence (total U.S., state, and county), age group (including infants), race, Hispanic ethnicity (years 2005-2006 only), sex, year of death, and cause-of-death (4-digit ICD-10 code or group of codes). The data are produced by the National Center for Health Statistics.
Death statistics:
(i) Number of Deaths for Different Sexes and Crude Death Rate for the Period from 1981 to 2023
(ii) Age-standardised Death Rate (Overall and by Sex) for the Period from 1981 to 2023
(iii) Age-specific Death Rate for Year 2013 and 2023
(iv) Death Rates by Leading Causes of Death for the Period from 2001 to 2023
(v) Number of Deaths by Leading Causes of Death for the Period from 2001 to 2023
(vi) Age-standardised Death Rates by Leading Causes of Death for the Period from 2001 to 2023
(vii) Late Foetal Mortality Rate for the Period from 1981 to 2023
(viii) Perinatal Mortality Rate for the Period from 1981 to 2023
(ix) Neonatal Mortality Rate for the Period from 1981 to 2023
(x) Infant Mortality Rate for the Period from 1981 to 2023
(xi) Number of Maternal Deaths for the Period from 1981 to 2023
(xii) Maternal Mortality Ratio for the Period from 1981 to 2023
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data on causes of death (COD) provide information on mortality patterns and form a major element of public health information.
The COD data refer to the underlying cause which - according to the World Health Organisation (WHO) - is "the disease or injury which initiated the train of morbid events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury".
The data are derived from the medical certificate of death, which is obligatory in the Member States. The information recorded on the death certificate follows the rules specified by the WHO.
Data published in Eurostat's dissemination database are broken down by sex, 5-year age groups, cause of death and by residency and country of occurrence. For stillbirths and neonatal deaths additional breakdowns might include age of mother and parity.
Data are available for Member States, Iceland, Norway, Liechtenstein, Switzerland, United Kingdom, Serbia, Turkey, North Macedonia and Albania. Regional data (NUTS level 2) are available for all of the countries having NUTS2 regions except Albania.
Annual national data are available in Eurostat's dissemination database as absolute numbers, crude death rates and standardised death rates. At regional level the same measures are provided in the form of 3-year averages (the average of year, year -1 and year -2). Annual crude and standardised death rates are also available at NUTS2 level. Monthly national data are available for 21 EU Member States from reference year 2019 and for 24 Member States from reference year 2022, as absolute numbers and standardised death rates.
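The 3-year averaging described above (year, year -1 and year -2) is a trailing window, so the earliest two years of any series have no complete window. The following is a small sketch in R of that calculation for absolute numbers. The counts are made-up illustrative values and `three_year_avg` is a hypothetical helper, not part of any Eurostat tooling.

```r
# Hypothetical annual death counts for one region (illustrative numbers only).
deaths <- c("2018" = 120, "2019" = 126, "2020" = 150, "2021" = 141)

# Trailing 3-year mean: for year t, average the values for t, t-1 and t-2.
# Years without two preceding observations are left as NA.
three_year_avg <- function(x) {
  out <- rep(NA_real_, length(x))
  names(out) <- names(x)
  for (t in seq_along(x)) {
    if (t >= 3) out[t] <- mean(x[(t - 2):t])
  }
  out
}

three_year_avg(deaths)
# 2018 and 2019 are NA; 2020 is 132; 2021 is 139
```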
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This data shows premature deaths (Age under 75), numbers and rates by gender, as 3-year moving-averages. All-Cause Mortality rates are a summary indicator of population health status. All-cause mortality is related to Life Expectancy, and both may be influenced by health inequalities. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates. A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death. Data source: Office for Health Improvement and Disparities (OHID), Public Health Outcomes Framework (PHOF) indicator ID 108. This data is updated annually.
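To make the DASR calculation concrete, here is a small worked sketch in R. Every number is invented for illustration: the three age bands and the `std_pop` vector are hypothetical and are not the European Standard Population, and the variable names are introduced here only for the example.

```r
# Illustrative numbers only: three broad age bands under 75.
deaths     <- c(10, 40, 150)          # deaths in each age band
population <- c(50000, 30000, 20000)  # local population in each band
std_pop    <- c(40000, 35000, 25000)  # hypothetical standard population

# Age-specific rates per 100,000
age_rates <- deaths / population * 1e5

# The crude rate ignores age structure entirely
crude_rate <- sum(deaths) / sum(population) * 1e5

# Directly standardised rate: weight each age-specific rate by the
# standard population's share in that band
dasr <- sum(age_rates * std_pop / sum(std_pop))

crude_rate  # 200
dasr        # about 242.17
```

In this made-up area the standardised rate is higher than the crude rate because the local population is younger than the standard one, which is exactly the age-structure effect the standardisation is meant to remove.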
Number of deaths and age-specific mortality rates for selected grouped causes, by age group and sex, 2000 to most recent year.
The Detailed Mortality - Underlying Cause of Death data on CDC WONDER are county-level national mortality and population data spanning the years 1999-2009. Data are based on death certificates for U.S. residents. Each death certificate contains a single underlying cause of death and demographic data. The number of deaths, crude death rates, age-adjusted death rates, standard errors and 95% confidence intervals for death rates can be obtained by place of residence (total U.S., region, state, and county), age group (including infants and single-year-of-age cohorts), race (4 groups), Hispanic ethnicity, sex, cause of death (4-digit ICD-10 code or group of codes, injury intent and mechanism categories, or drug and alcohol related causes), year, month and weekday of death, place of death, and whether an autopsy was performed. The data are produced by the National Center for Health Statistics.
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a database (parquet format) containing publicly available multiple cause mortality data from the US (CDC/NCHS) for 2014-2022. Not all variables are included in this export. Please see below for restrictions on the use of these data imposed by NCHS. You can use the arrow package in R to open the file. For an example analysis, see https://github.com/DanWeinberger/pneumococcal_mortality/blob/main/analysis_nongeo.Rmd. For instance, save this file in a folder called "parquet3":
library(arrow)
library(dplyr)
pneumo.deaths.in <- open_dataset("R:/parquet3", format = "parquet") %>% # open the dataset
  filter(grepl("J13|A39|J181|A403|B953|G001", all_icd)) %>% # keep records that contain any of the selected ICD codes
  collect() # load the result into memory; do as many operations as you can before calling collect() to limit memory use
The included variables are (see the full dictionary: https://www.cdc.gov/nchs/nvss/mortality_public_use_data.htm):
year: Calendar year of death
month: Calendar month of death
age_detail_number: number indicating year or part of year; cannot be interpreted on its own here. See the agey variable instead.
sex: M/F
place_of_death: Place of Death and Decedent’s Status
1 ... Hospital, Clinic or Medical Center - Inpatient
2 ... Hospital, Clinic or Medical Center - Outpatient or admitted to Emergency Room
3 ... Hospital, Clinic or Medical Center - Dead on Arrival
4 ... Decedent’s home
5 ... Hospice facility
6 ... Nursing home/long term care
7 ... Other
9 ... Place of death unknown
all_icd: Cause of death coded as ICD10 codes. ICD1-ICD21 pasted into a single string, with separation of codes by an underscore
hisp_recode: 0=Non-Hispanic; 1=Hispanic; 999= Not specified
race_recode: race coding prior to 2018 (reconciled in race_recode_new)
race_recode_alt: race coding after 2018 (reconciled in race_recode_new)
race_recode_new:
1 = 'White'
2 = 'Black'
3 = 'Hispanic'
4 = 'American Indian'
5 = 'Asian/Pacific Islanders'
agey: age in years (or partial years for infants under 12 months)
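Since all_icd stores ICD1-ICD21 as one underscore-separated string, it can help to split it back into individual codes once the data are in memory. The following is a minimal base R sketch of that step. The three records are invented, the `deaths`, `targets` and `has_target` names are hypothetical, and the target list simply reuses the codes from the filter example above.

```r
# Hypothetical records; all_icd joins the ICD fields with underscores.
deaths <- data.frame(
  id = 1:3,
  all_icd = c("J13_I10", "C349_J449", "A419_G001_J189"),
  stringsAsFactors = FALSE
)

targets <- c("J13", "A39", "J181", "A403", "B953", "G001")

# Split each string into its codes, then flag records where any code
# starts with any of the target prefixes.
code_lists <- strsplit(deaths$all_icd, "_", fixed = TRUE)
has_target <- vapply(
  code_lists,
  function(codes) {
    any(vapply(targets, function(p) any(startsWith(codes, p)), logical(1)))
  },
  logical(1)
)

deaths$id[has_target]  # records 1 and 3
```

Keeping `code_lists` around makes per-code follow-up steps straightforward, for example counting how many codes each record carries with `lengths(code_lists)`.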
https://www.cdc.gov/nchs/data_access/restrictions.htm
Please Read Carefully Before Using NCHS Public Use Survey Data
The National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), conducts statistical and epidemiological activities under the authority granted by the Public Health Service Act (42 U.S.C. § 242k). NCHS survey data are protected by Federal confidentiality laws including Section 308(d) Public Health Service Act [42 U.S.C. 242m(d)] and the Confidential Information Protection and Statistical Efficiency Act or CIPSEA [Pub. L. No. 115-435, 132 Stat. 5529 § 302]. These confidentiality laws state the data collected by NCHS may be used only for statistical reporting and analysis. Any effort to determine the identity of individuals and establishments violates the assurances of confidentiality provided by federal law.
Terms and Conditions
NCHS does all it can to assure that the identity of individuals and establishments cannot be disclosed. All direct identifiers, as well as any characteristics that might lead to identification, are omitted from the dataset. Any intentional identification or disclosure of an individual or establishment violates the assurances of confidentiality given to the providers of the information. Therefore, users will:
By using these data you signify your agreement to comply with the above-stated statutorily based requirements.
Sanctions for Violating NCHS Data Use Agreement
Willfully disclosing any information that could identify a person or establishment in any manner to a person or agency not entitled to receive it, shall be guilty of a class E felony and imprisoned for not more than 5 years, or fined not more than $250,000, or both.
TABLE III. Deaths in 122 U.S. cities – 2016. 122 Cities Mortality Reporting System — Each week, the vital statistics offices of 122 cities across the United States report the total number of death certificates processed and the number of those for which pneumonia or influenza was listed as the underlying or contributing cause of death by age group (Under 28 days, 28 days –1 year, 1-14 years, 15-24 years, 25-44 years, 45-64 years, 65-74 years, 75-84 years, and ≥ 85 years).
FOOTNOTE: U: Unavailable. —: No reported cases. * Mortality data in this table are voluntarily reported from 122 cities in the United States, most of which have populations of 100,000 or more. A death is reported by the place of its occurrence and by the week that the death certificate was filed. Fetal deaths are not included.
† Pneumonia and influenza.
§ Total includes unknown ages.
This cumulative dataset contains statistics on mortality and causes of death in South Africa covering the period 1997-2017. The mortality and causes of death dataset is part of a regular series published by Stats SA, based on data collected through the civil registration system. This dataset is the most recent cumulative round in the series which began with the separately available dataset Recorded Deaths 1996.
The main objective of this dataset is to outline emerging trends and differentials in mortality by selected socio-demographic and geographic characteristics for deaths that occurred in the registered year and over time. Reliable mortality statistics are the cornerstone of national health information systems, and are necessary for population health assessment, health policy and service planning, and programme evaluation. They are essential for studying the occurrence and distribution of health-related events, their determinants and the management of related health problems. These data are particularly critical for monitoring the Sustainable Development Goals (SDGs) and Agenda 2063, which share the same goal of a high standard of living and quality of life, sound health and well-being for all and at all ages. Mortality statistics are also required for assessing the impact of non-communicable diseases (NCDs), emerging infectious diseases, injuries and natural disasters.
National coverage
Individuals
This dataset is based on information on mortality and causes of death from the South African civil registration system. It covers all death notification forms from the Department of Home Affairs for deaths that occurred in 1997-2017, that reached Stats SA during the 2018/2019 processing phase.
Administrative records data [adm]
Other [oth]
The registration of deaths is captured using two instruments: form BI-1663 and form DHA-1663 (Notification/Register of death/stillbirth).
This cumulative dataset is part of a regular series published by Stats SA and includes all previous rounds in the series (excluding Recorded Deaths 1996). Stats SA only includes one variable to classify the occupation group of the deceased (OccupationGrp) in the current round (1997-2017). Prior to 2016, Stats SA included both occupation group (OccupationGrp) and industry classification (Industry) in all previous rounds. Therefore, DataFirst has made the 1997-2015 cumulative round available as a separately downloadable dataset which includes both occupation group and industry classification of the deceased spanning the years 1997-2015.
Note: This dataset is historical only and there are no corresponding datasets for more recent time periods. For more recent information, please visit the Chicago Health Atlas at https://chicagohealthatlas.org.
This dataset contains the cumulative number of deaths, average number of deaths annually, average annual crude and adjusted death rates with corresponding 95% confidence intervals, and average annual years of potential life lost per 100,000 residents aged 75 and younger due to selected causes of death, by Chicago community area, for the years 2006 – 2010. A ranking for each measure is also provided, with the highest value indicated with a ranking of 1. See the full description at: https://data.cityofchicago.org/api/views/6vw3-8p6f/files/CqPqfHSv8UUAoXCBjn4_tLqcQHhb36Ih4-meM-4zNzs?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\MORTALITY\Dataset_Description_06_10_PORTAL_ONLY.pdf
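As background, years of potential life lost (YPLL) with a reference age of 75 is conventionally defined as below. This is the common textbook definition and is offered as an assumption: the dataset's own description (linked above) is the authority on the exact method used here. In this notation, introduced only for illustration, $a_k$ is the age at death of decedent $k$ and $N_{<75}$ is the resident population under the reference age.

```latex
\[
\mathrm{YPLL} = \sum_{k \,:\, a_k < 75} \left( 75 - a_k \right),
\qquad
\mathrm{YPLL\ rate} = \frac{\mathrm{YPLL}}{N_{<75}} \times 100{,}000
\]
```

Under this definition a death at age 40 contributes 35 years and a death at age 74 contributes 1 year, so the measure gives more weight to deaths at younger ages than a simple death count does.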
This is historical data. The update frequency has been set to "Static Data" and is here for historic value. Updated 8/14/2024.
Rate of deaths per 100,000 population by selected underlying causes of death among Maryland residents (1992-2017).
https://www.arcgis.com/sharing/rest/content/items/89679671cfa64832ac2399a0ef52e414/data
Mortality Rates for Lake County, Illinois. Explanation of field attributes:
Average Age of Death – The average age at which people in the given zip code die.
Cancer Deaths – Cancer deaths refers to individuals who have died of cancer as the underlying cause. This is a rate per 100,000.
Heart Disease Related Deaths – Heart Disease Related Deaths refers to individuals who have died of heart disease as the underlying cause. This is a rate per 100,000.
COPD Related Deaths – COPD Related Deaths refers to individuals who have died of chronic obstructive pulmonary disease (COPD) as the underlying cause. This is a rate per 100,000.
By Health [source]
This dataset looks at the leading causes of death in the United States from 1980-2009, broken down by sex, race, and Hispanic origin. The data sheds light on how mortality in the US has changed over time among these categories. Covering everything from heart disease to cancer to suicide, it can be used by health researchers and policy makers to gain a better understanding of disparities in healthcare and deaths across different groups. Whether studying questions related to public health or more targeted population issues such as gender biases in death rates, this dataset provides an important resource for anyone interested in examining mortality across demographic lines.
This dataset can be used to explore some of the leading causes of death in the United States from 1980 to 2009, broken down by sex, race, and Hispanic origin. This data can be used to better understand mortality trends and risk factors associated with different populations in America.
By using this dataset you can compare and contrast mortality rates across different gender, racial, and ethnic groups during this time period. You can also compare different causes of death within these demographic categories to see if there are any patterns over time or notable differences between groups.
You could even use this data to track changes across population groups as a whole or look at details for specific years or types of causes of death in particular groups. With this information one may gain insight into health disparities across population segments in America— aiding advocates for social change & public policy shifts toward improved health outcomes for all Americans!
- Analyzing regional or state-level differences in mortality rates over time.
- Examining the behavioral factors or risk factors associated with each cause of death for different genders and populations.
- Examining the prevalence of each cause of death as a proportion of an overall population trend in different socio-economic categories such as race or income level.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.
File: Selected_Trend_Table_from_Health_United_States_2011._Leading_causes_of_death_and_numbers_of_deaths_by_sex_race_and_Hispanic_origin_United_States_1980_and_2009.csv

| Column name | Description |
|:---------------|:---------------------------------------------------------------------------------------------------------|
| Group | The group of people the cause of death applies to (e.g. men, women, whites, blacks, hispanics). (String) |
| Year | The year the cause of death was recorded. (Integer) |
| Cause of death | The cause of death. (String) |
| Flag | A flag indicating whether the cause of death is considered a leading cause. (Boolean) |
| Deaths | The number of deaths attributed to the cause of death. (Integer) |
If you use this dataset in your research, please credit Health.