https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
I am developing my data science skills in areas outside of my previous work. An interesting problem for me was to identify which factors influence life expectancy on a national level. There is an existing Kaggle data set that explored this, but that information was corrupted. Part of the problem solving process is to step back periodically and ask "does this make sense?" Without reasonable data, it is harder to notice mistakes in my analysis code (as opposed to unusual behavior due to the data itself). I wanted to make a similar data set, but with reliable information.
This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. A number of potentially interesting features (like air pollution) were left off due to limited year or country coverage. Since the data was collected from more than one server, some features are present more than once, to explore the differences.
A goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies, and provided better health and well-being. They provide public data collected from many sources to identify and monitor factors that are important to reach this goal. This set was primarily made using GHO (Global Health Observatory) and UNESCO (United Nations Educational Scientific and Culture Organization) information. The set covers the years 2000-2016 for 183 countries, in a single CSV file. Missing data is left in place, for the user to decide how to deal with it.
Three notebooks are provided for my cursory analysis, a comparison with the other Kaggle set, and a template for creating this data set.
There is a lot to explore, if the user is interested. The GHO server alone has over 2000 "indicators". - How are the GHO and UNESCO life expectancies calculated, and what is causing the difference? That could also be asked for Gross National Income (GNI) and mortality features. - How does the life expectancy after age 60 compare to the life expectancy at birth? Is the relationship with the features in this data set different for those two targets? - What other indicators on the servers might be interesting to use? Some of the GHO indicators are different studies with different coverage. Can they be combined to make a more useful and robust data feature? - Unraveling the correlations between the features would take significant work.
Note: This dataset is historical only and there are not corresponding datasets for more recent time periods. For that more-recent information, please visit the Chicago Health Atlas at https://chicagohealthatlas.org.
This dataset gives the average life expectancy and corresponding confidence intervals for each Chicago community area for the years 1990, 2000 and 2010. See the full description at: https://data.cityofchicago.org/api/views/qjr3-bm53/files/AAu4x8SCRz_bnQb8SVUyAXdd913TMObSYj6V40cR6p8?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\Life Expectancy\Dataset description - LE by community area.pdf
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 2394 series, with data for years 1991 - 1991 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 items: Canada ...), Population group (19 items: Entire cohort; Income adequacy quintile 1 (lowest);Income adequacy quintile 2;Income adequacy quintile 3 ...), Age (14 items: At 25 years; At 30 years; At 40 years; At 35 years ...), Sex (3 items: Both sexes; Females; Males ...), Characteristics (3 items: Life expectancy; High 95% confidence interval; life expectancy; Low 95% confidence interval; life expectancy ...).
This dataset of U.S. mortality trends since 1900 highlights the differences in age-adjusted death rates and life expectancy at birth by race and sex. Age-adjusted death rates (deaths per 100,000) after 1998 are calculated based on the 2000 U.S. standard population. Populations used for computing death rates for 2011–2017 are postcensal estimates based on the 2010 census, estimated as of July 1, 2010. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for noncensus years between 2000 and 2010 are revised using updated intercensal population estimates and may differ from rates previously published. Data on age-adjusted death rates prior to 1999 are taken from historical data (see References below). Life expectancy data are available up to 2017. Due to changes in categories of race used in publications, data are not available for the black population consistently before 1968, and not at all before 1960. More information on historical data on age-adjusted death rates is available at https://www.cdc.gov/nchs/nvss/mortality/hist293.htm. SOURCES CDC/NCHS, National Vital Statistics System, historical data, 1900-1998 (see https://www.cdc.gov/nchs/nvss/mortality_historical_data.htm); CDC/NCHS, National Vital Statistics System, mortality data (see http://www.cdc.gov/nchs/deaths.htm); and CDC WONDER (see http://wonder.cdc.gov). REFERENCES National Center for Health Statistics, Data Warehouse. Comparability of cause-of-death between ICD revisions. 2008. Available from: http://www.cdc.gov/nchs/nvss/mortality/comparability_icd.htm. National Center for Health Statistics. Vital statistics data available. Mortality multiple cause files. Hyattsville, MD: National Center for Health Statistics. Available from: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm. Kochanek KD, Murphy SL, Xu JQ, Arias E. Deaths: Final data for 2017. National Vital Statistics Reports; vol 68 no 9. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_09-508.pdf. Arias E, Xu JQ. United States life tables, 2017. National Vital Statistics Reports; vol 68 no 7. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_07-508.pdf. National Center for Health Statistics. Historical Data, 1900-1998. 2009. Available from: https://www.cdc.gov/nchs/nvss/mortality_historical_data.htm.
VITAL SIGNS INDICATOR Life Expectancy (EQ6)
FULL MEASURE NAME Life Expectancy
LAST UPDATED April 2017
DESCRIPTION Life expectancy refers to the average number of years a newborn is expected to live if mortality patterns remain the same. The measure reflects the mortality rate across a population for a point in time.
DATA SOURCE State of California, Department of Health: Death Records (1990-2013) No link
California Department of Finance: Population Estimates Annual Intercensal Population Estimates (1990-2010) Table P-2: County Population by Age (2010-2013) http://www.dof.ca.gov/Forecasting/Demographics/Estimates/
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) Life expectancy is commonly used as a measure of the health of a population. Life expectancy does not reflect how long any given individual is expected to live; rather, it is an artificial measure that captures an aspect of the mortality rates across a population. Vital Signs measures life expectancy at birth (as opposed to cohort life expectancy). A statistical model was used to estimate life expectancy for Bay Area counties and Zip codes based on current life tables which require both age and mortality data. A life table is a table which shows, for each age, the survivorship of a people from a certain population.
Current life tables were created using death records and population estimates by age. The California Department of Public Health provided death records based on the California death certificate information. Records include age at death and residential Zip code. Single-year age population estimates at the regional- and county-level comes from the California Department of Finance population estimates and projections for ages 0-100+. Population estimates for ages 100 and over are aggregated to a single age interval. Using this data, death rates in a population within age groups for a given year are computed to form unabridged life tables (as opposed to abridged life tables). To calculate life expectancy, the probability of dying between the jth and (j+1)st birthday is assumed uniform after age 1. Special consideration is taken to account for infant mortality. For the Zip code-level life expectancy calculation, it is assumed that postal Zip codes share the same boundaries as Zip Code Census Tabulation Areas (ZCTAs). More information on the relationship between Zip codes and ZCTAs can be found at https://www.census.gov/geo/reference/zctas.html. Zip code-level data uses three years of mortality data to make robust estimates due to small sample size. Year 2013 Zip code life expectancy estimates reflects death records from 2011 through 2013. 2013 is the last year with available mortality data. Death records for Zip codes with zero population (like those associated with P.O. Boxes) were assigned to the nearest Zip code with population. Zip code population for 2000 estimates comes from the Decennial Census. Zip code population for 2013 estimates are from the American Community Survey (5-Year Average). The ACS provides Zip code population by age in five-year age intervals. Single-year age population estimates were calculated by distributing population within an age interval to single-year ages using the county distribution. Counties were assigned to Zip codes based on majority land-area.
Zip codes in the Bay Area vary in population from over 10,000 residents to less than 20 residents. Traditional life expectancy estimation (like the one used for the regional- and county-level Vital Signs estimates) cannot be used because they are highly inaccurate for small populations and may result in over/underestimation of life expectancy. To avoid inaccurate estimates, Zip codes with populations of less than 5,000 were aggregated with neighboring Zip codes until the merged areas had a population of more than 5,000. In this way, the original 305 Bay Area Zip codes were reduced to 218 Zip code areas for 2013 estimates. Next, a form of Bayesian random-effects analysis was used which established a prior distribution of the probability of death at each age using the regional distribution. This prior is used to shore up the life expectancy calculations where data were sparse.
Life expectancy at birth and at age 65, by sex, on a three-year average basis.
Life expectancy at birth is defined as how long, on average, a newborn can expect to live, if current death rates do not change. This dataset can help you gain insights regarding the life expectancy and mortality rate.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Health expectancies for both sexes at birth by upper tier local authority with confidence intervals and proportions of life with and without disability.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Pivot table for healthy life expectancy by sex and area type, divided by three-year intervals starting from 2011 to 2013.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Life expectancy in years, at birth and for age groups. Breakdowns are also given for deprivation (SIMD) and Urban Rural classification. Life expectancy refers to the number of years that a person could expect to survive if the current mortality rates for each age group, sex and geographic area remain constant throughout their life. This is referred to as ‘period life expectancy’ and does not usually reflect the actual number of years that a person will survive. This is because it does not take into account changes in health care and other social factors that may occur through someone’s lifetime. However, life expectancy is a useful statistic as it provides a snapshot of the health of a population and allows the identification of inequalities between populations. Further details are available on the NRS website
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
This dataset comprises 204 entries and 38 attributes, providing a comprehensive analysis of key economic and social indicators across various countries. It includes a diverse range of metrics, allowing for in-depth exploration of global trends related to GDP, education, health, and environmental factors.
Key Features:
Applications and Uses:
Research and Analysis: Ideal for researchers studying the correlation between economic performance and social indicators. This dataset can help identify trends and patterns relevant to global development.
Policy Development: Policymakers can utilize this data to inform decisions on education, healthcare, and environmental policies, aiming to improve national outcomes.
Machine Learning and Data Science: Data scientists can apply machine learning techniques to predict economic trends, analyze social impacts, or classify countries based on various indicators.
Educational Purposes: Suitable for students and educators in fields like economics, sociology, and environmental science for practical data analysis exercises.
Visualization Projects: Perfect for creating compelling visualizations that illustrate relationships between different metrics, aiding in public understanding and engagement.
By leveraging this dataset, users can uncover insights into how different factors influence a country's development, making it a valuable resource for diverse applications across various fields.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
I am developing my data science skills in areas outside of my previous work. An interesting problem for me was to identify which factors influence life expectancy on a national level. There is an existing Kaggle data set that explored this, but that information was corrupted. Part of the problem solving process is to step back periodically and ask "does this make sense?" Without reasonable data, it is harder to notice mistakes in my analysis code (as opposed to unusual behavior due to the data itself). I wanted to make a similar data set, but with reliable information.
This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. A number of potentially interesting features (like air pollution) were left off due to limited year or country coverage. Since the data was collected from more than one server, some features are present more than once, to explore the differences.
A goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies, and provided better health and well-being. They provide public data collected from many sources to identify and monitor factors that are important to reach this goal. This set was primarily made using GHO (Global Health Observatory) and UNESCO (United Nations Educational Scientific and Culture Organization) information. The set covers the years 2000-2016 for 183 countries, in a single CSV file. Missing data is left in place, for the user to decide how to deal with it.
Three notebooks are provided for my cursory analysis, a comparison with the other Kaggle set, and a template for creating this data set.
There is a lot to explore, if the user is interested. The GHO server alone has over 2000 "indicators". - How are the GHO and UNESCO life expectancies calculated, and what is causing the difference? That could also be asked for Gross National Income (GNI) and mortality features. - How does the life expectancy after age 60 compare to the life expectancy at birth? Is the relationship with the features in this data set different for those two targets? - What other indicators on the servers might be interesting to use? Some of the GHO indicators are different studies with different coverage. Can they be combined to make a more useful and robust data feature? - Unraveling the correlations between the features would take significant work.