97 datasets found

County Cancer Death Rates
kaggle.com
Updated Dec 3, 2023
Cite
The Devastator (2023). County Cancer Death Rates [Dataset]. https://www.kaggle.com/datasets/thedevastator/county-cancer-death-rates
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 3, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
County Cancer Death Rates

County-level cancer death rates with related variables

By Noah Rippner [source]

About this dataset

This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.

Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.

The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.

To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.

Please note that this description also introduces a linear regression walkthrough in Python that uses this dataset. The walkthrough shows how to source and import the data before moving into data preparation steps such as exploratory analysis, and it then covers model selection and important model diagnostics.

It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.

Important columns in this dataset include County names and their corresponding FIPS codes, the standardized county identifiers defined by the Federal Information Processing Standards (FIPS). In addition, the Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted objective of a death rate of 45.5.

Overall, this dataset aims to offer insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with information for analysis and decision-making.

How to use the dataset

Familiarize Yourself with the Columns:

County: The name of the county.

FIPS: The Federal Information Processing Standards code for the county.

Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).

Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.

Average Deaths per Year: The average number of deaths per year due to cancer in the county.

Recent Trend (2): The recent trend in cancer death rates/incidence in the county.

Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.

Average Annual Count: The average annual count of cancer deaths/incidence in the county.

Determine Counties Meeting Objective: Use this dataset to identify counties that have met or not met the objective death rate of 45.5 (a rate, not a percentage). Look for entries where Met Objective of 45.5? (1) is marked as True or False.

Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.

Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.

Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.

Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...
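
The objective check described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset author's method: the sample rows are made up, the `*` placeholder for a missing rate is an assumption, and "met" is assumed to mean a rate at or below 45.5. Only the column names come from the description.

```python
import csv
import io

# Hypothetical rows for illustration only; the real death.csv may encode values differently.
SAMPLE = """County,FIPS,Age-Adjusted Death Rate,Average Deaths per Year
Example County A,00001,38.2,12
Example County B,00002,61.7,48
Example County C,00003,*,3
"""

OBJECTIVE = 45.5  # target death rate named in the dataset description


def parse_rate(text):
    """Return the rate as a float, or None when the cell is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def split_by_objective(rows, objective=OBJECTIVE):
    """Group county names by whether their rate is at or below the objective (assumed meaning of 'met')."""
    met, not_met, unknown = [], [], []
    for row in rows:
        rate = parse_rate(row["Age-Adjusted Death Rate"])
        if rate is None:
            unknown.append(row["County"])
        elif rate <= objective:
            met.append(row["County"])
        else:
            not_met.append(row["County"])
    return met, not_met, unknown


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
met, not_met, unknown = split_by_objective(rows)
print("met:", met)
print("not met:", not_met)
print("unknown:", unknown)
```

Deriving the flag from the rate avoids guessing how the Met Objective of 45.5? (1) column is encoded in the file.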
NCHS - Drug Poisoning Mortality by State: United States
data.amerigeoss.org
healthdata.gov
+8 more
csv, json, rdf, xml
Updated Mar 30, 2022
Cite
United States (2022). NCHS - Drug Poisoning Mortality by State: United States [Dataset]. https://data.amerigeoss.org/tl/dataset/nchs-drug-poisoning-mortality-by-state-united-states-7375f
Explore at:
rdf, xml, json, csv (available download formats)
Dataset updated
Mar 30, 2022
Dataset provided by
United States
License
https://www.usa.gov/government-works
Area covered
United States
Description
This dataset describes drug poisoning deaths at the U.S. and state level by selected demographic characteristics, and includes age-adjusted death rates for drug poisoning.

Deaths are classified using the International Classification of Diseases, Tenth Revision (ICD–10). Drug-poisoning deaths are defined as having ICD–10 underlying cause-of-death codes X40–X44 (unintentional), X60–X64 (suicide), X85 (homicide), or Y10–Y14 (undetermined intent).

Estimates are based on the National Vital Statistics System multiple cause-of-death mortality files (1). Age-adjusted death rates (deaths per 100,000 U.S. standard population for 2000) are calculated using the direct method. Populations used for computing death rates for 2011–2016 are postcensal estimates based on the 2010 U.S. census. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for noncensus years before 2010 are revised using updated intercensal population estimates and may differ from rates previously published.
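
As a rough sketch of the direct method mentioned above: each age-specific death rate is weighted by that age group's share of a standard population, and the weighted sum is scaled to 100,000. The age groups and all counts below are invented for illustration and are not the 2000 U.S. standard population.

```python
def age_adjusted_rate(deaths, population, standard_population, per=100_000):
    """Direct standardization: weight age-specific rates by standard population shares."""
    total_standard = sum(standard_population.values())
    rate = 0.0
    for age_group, std_pop in standard_population.items():
        age_specific = deaths[age_group] / population[age_group]
        rate += age_specific * (std_pop / total_standard)
    return rate * per


def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, with no age weighting."""
    return sum(deaths.values()) / sum(population.values()) * per


# Hypothetical inputs; the age groups and counts are made up.
deaths = {"0-24": 10, "25-64": 120, "65+": 70}
population = {"0-24": 400_000, "25-64": 500_000, "65+": 100_000}
standard = {"0-24": 350_000, "25-64": 520_000, "65+": 130_000}

adjusted = age_adjusted_rate(deaths, population, standard)
crude = crude_rate(deaths, population)
print(f"crude: {crude:.3f}, age-adjusted: {adjusted:.3f}")
```

The two figures differ whenever the observed age structure differs from the standard one, which is why age-adjusted rates are preferred for comparisons across states and years.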

Death rates for some states and years may be low due to a high number of unresolved pending cases or misclassification of ICD–10 codes for unintentional poisoning as R99, “Other ill-defined and unspecified causes of mortality” (2). For example, this issue is known to affect New Jersey in 2009 and West Virginia in 2005 and 2009 but also may affect other years and other states. Drug poisoning death rates may be underestimated in those instances.

REFERENCES

1. National Center for Health Statistics. National Vital Statistics System: Mortality data. Available from: http://www.cdc.gov/nchs/deaths.htm.

2. CDC. CDC Wonder: Underlying cause of death 1999–2016. Available from: http://wonder.cdc.gov/wonder/help/ucd.html.
National Longitudinal Mortality Study
neuinfo.org
rrid.site
+2 more
Updated May 13, 2025
Cite
(2025). National Longitudinal Mortality Study [Dataset]. http://identifiers.org/RRID:SCR_008946
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008946
Dataset updated
May 13, 2025
Description
A database based on a random sample of the noninstitutionalized population of the United States, developed for the purpose of studying the effects of demographic and socio-economic characteristics on differentials in mortality rates. It consists of data from 26 U.S. Current Population Surveys (CPS) cohorts, annual Social and Economic Supplements, and the 1980 Census cohort, combined with death certificate information to identify mortality status and cause of death covering the time interval, 1979 to 1998. The Current Population Surveys are March Supplements selected from the time period from March 1973 to March 1998. The NLMS routinely links geographical and demographic information from Census Bureau surveys and censuses to the NLMS database, and other available sources upon request. The Census Bureau and CMS have approved the linkage protocol and data acquisition is currently underway. The plan for the NLMS is to link information on mortality to the NLMS every two years from 1998 through 2006 with research on the resulting database to continue, at least, through 2009. The NLMS will continue to incorporate data from the yearly Annual Social and Economic Supplement into the study as the data become available. Based on the expected size of the Annual Social and Economic Supplements to be conducted, the expected number of deaths to be added to the NLMS through the updating process will increase the mortality content of the study to nearly 500,000 cases out of a total number of approximately 3.3 million records. This effort would also include expanding the NLMS population base by incorporating new March Supplement Current Population Survey data into the study as they become available. Linkages to the SEER and CMS datasets are also available. Data Availability: Due to the confidential nature of the data used in the NLMS, the public use dataset consists of a reduced number of CPS cohorts with a fixed follow-up period of five years. 
NIA does not make the data available directly. Research access to the entire NLMS database can be obtained through the NIA program contact listed. Interested investigators should email the NIA contact and send in a one-page prospectus of the proposed project. NIA will approve projects based on their relevance to NIA/BSR's areas of emphasis. Approved projects are then assigned to NLMS statisticians at the Census Bureau who work directly with the researcher to interface with the database. A modified version of the public use data files is also available through the Census restricted Data Centers. However, since the database is quite complex, many investigators have found that the most efficient way to access it is through the Census programmers.

* Dates of Study: 1973-2009
* Study Features: Longitudinal
* Sample Size: ~3.3 Million

Link:
* ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/00134
VSRR Provisional Maternal Death Counts and Rates
healthdata.gov
data.virginia.gov
+3 more
application/rdfxml +5
Updated Mar 17, 2023
Cite
data.cdc.gov (2023). VSRR Provisional Maternal Death Counts and Rates [Dataset]. https://healthdata.gov/w/ehys-jtzp/default?cur=gU0o09p1SY-
Explore at:
tsv, csv, application/rssxml, application/rdfxml, xml, json (available download formats)
Dataset updated
Mar 17, 2023
Dataset provided by
data.cdc.gov
Description
These data present national-level provisional maternal mortality rates based on a current flow of mortality and natality data in the National Vital Statistics System. Provisional rates, which are an early estimate of the number of maternal deaths per 100,000 live births, are shown as of the date specified and may not include all deaths and births that occurred during a given time period (see Technical Notes).

A maternal death is the death of a woman while pregnant or within 42 days of termination of pregnancy irrespective of the duration and the site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management, but not from accidental or incidental causes. In this data visualization, maternal deaths are those deaths with an underlying cause of death assigned to International Statistical Classification of Diseases, 10th Revision (ICD-10) code numbers A34, O00–O95, and O98–O99.

The provisional data include reported 12-month ending provisional maternal mortality rates overall, by age, and by race and Hispanic origin. Provisional maternal mortality rates presented in this data visualization are for “12-month ending periods,” defined as the number of maternal deaths per 100,000 live births occurring in the 12-month period ending in the month indicated. For example, the 12-month ending period in June 2020 would include deaths and births occurring from July 1, 2019, through June 30, 2020. Evaluation of trends over time should compare estimates from year to year (June 2020 and June 2021), rather than month to month, to avoid overlapping time periods. In the visualization and in the accompanying data file, rates based on fewer than 20 deaths are suppressed in accordance with current NCHS standards of reliability for rates. Death counts from 1 to 9 in the data file are suppressed in accordance with National Center for Health Statistics (NCHS) confidentiality standards.
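
The 12-month ending window and the rate suppression rule can be illustrated with a short Python sketch. The function names are hypothetical and the counts in the usage lines are invented; only the window definition, the per-100,000-live-births scale, and the fewer-than-20 threshold come from the description above.

```python
import calendar
from datetime import date


def twelve_month_ending_window(year, month):
    """Return (start, end) dates of the 12-month period ending in the given month."""
    end = date(year, month, calendar.monthrange(year, month)[1])
    if month == 12:
        start = date(year, 1, 1)
    else:
        start = date(year - 1, month + 1, 1)
    return start, end


def maternal_mortality_rate(deaths, live_births, min_deaths=20):
    """Deaths per 100,000 live births, or None when the rate would be suppressed."""
    if deaths < min_deaths:
        return None  # rates based on fewer than 20 deaths are not shown
    return deaths / live_births * 100_000


start, end = twelve_month_ending_window(2020, 6)
print(start, end)

# Made-up counts, for illustration only.
print(maternal_mortality_rate(700, 3_500_000))
print(maternal_mortality_rate(12, 150_000))
```

Consecutive monthly windows share eleven months of data, which is why the description recommends comparing the same month across years.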

Provisional data presented on this page will be updated on a quarterly basis as additional records are received. Previously released estimates are revised to include data and record updates received since the previous release. As a result, the reliability of estimates for a 12-month period ending with a specific month will improve with each quarterly release and estimates for previous time periods may change as new data and updates are received.
Deaths and age-specific mortality rates, by selected grouped causes
www150.statcan.gc.ca
open.canada.ca
+2 more
Updated Feb 19, 2025
Cite
Government of Canada, Statistics Canada (2025). Deaths and age-specific mortality rates, by selected grouped causes [Dataset]. http://doi.org/10.25318/1310039201-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310039201-eng
Dataset updated
Feb 19, 2025
Dataset provided by
Statistics Canada: https://statcan.gc.ca/en
Area covered
Canada
Description
Number of deaths and age-specific mortality rates for selected grouped causes, by age group and sex, 2000 to most recent year.
Historic Mortality and Population Data, 1901-1992 - Dataset - B2FIND
b2find.eudat.eu
Updated Oct 31, 2023
Cite
(2023). Historic Mortality and Population Data, 1901-1992 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/b589fb1f-2aa5-59e5-b889-d153426dfd27
Explore at:
Dataset updated
Oct 31, 2023
Description
Abstract copyright UK Data Service and data collection copyright owner. In the analysis of any particular set of mortality data, a pivotal role is frequently played by national death rates by age, sex and cause. For example, the analysis of cause specific time trends and their correlates generally draws upon data of this sort. At a broader level, international comparisons utilise the rates of several nations in order to make meaningful inferences about possible causal associations. By contrast, local mortality studies, including sub-sets and sub-divisions of the national population, call upon national rates to provide a reference set of background mortality levels against which local experience can be measured. However, the extent to which this can be done is dependent upon the availability of national rates on computer. In recognition of this, OPCS has constructed a database comprising the basic building bricks for constructing any aggregate database. In this instance the basic components of the database comprise number of deaths, held to the lowest level to which cause was routinely coded. The calculation of rates is made possible with this set of data by the provision of a comparable tape of estimates of population at risk. The data comprise two files, the deaths file and the population file. Each count held on the deaths file is stored in a separate record, referenced by cause, sex, age and year to which it refers. The population data are held in an identical format to that used for the death file with the exception of the cause variable, which is set to zero.
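
To show how a deaths file and a population file of this shape could be combined into rates, here is a minimal Python sketch. The `Record` layout, field names, and all values are hypothetical; the only details taken from the description are that records are keyed by cause, sex, age and year, and that population records carry a cause of zero.

```python
from collections import namedtuple

# Hypothetical record layout; the real files may use different field names and codes.
Record = namedtuple("Record", "cause sex age year count")


def death_rates(death_records, population_records, per=100_000):
    """Match each death count to the population at risk with the same sex, age and year."""
    population = {
        (r.sex, r.age, r.year): r.count
        for r in population_records
        if r.cause == 0  # population records have the cause variable set to zero
    }
    rates = {}
    for r in death_records:
        at_risk = population.get((r.sex, r.age, r.year))
        if at_risk:  # skip cells with no matching or zero population
            rates[(r.cause, r.sex, r.age, r.year)] = r.count / at_risk * per
    return rates


# Made-up example values.
deaths_file = [Record(cause=410, sex=1, age=60, year=1950, count=50)]
population_file = [Record(cause=0, sex=1, age=60, year=1950, count=200_000)]

rates = death_rates(deaths_file, population_file)
print(rates)
```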
COVID-19-Associated Deaths by Date of Death - ARCHIVE
datasets.ai
data.ct.gov
+1 more
23, 40, 55, 8
Updated Aug 27, 2024
Cite
State of Connecticut (2024). COVID-19-Associated Deaths by Date of Death - ARCHIVE [Dataset]. https://datasets.ai/datasets/covid-19-associated-deaths-by-date-of-death
Explore at:
8, 55, 40, 23 (available download formats)
Dataset updated
Aug 27, 2024
Dataset authored and provided by
State of Connecticut
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

Count of COVID-19-associated deaths by date of death. Deaths reported to either the OCME or DPH are included in the COVID-19 data. COVID-19-associated deaths include persons who tested positive for COVID-19 around the time of death and persons who were not tested for COVID-19 whose death certificate lists COVID-19 disease as a cause of death or a significant condition contributing to death.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME [25,26]. As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made [31]. The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10) classification system [25,26]. COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
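
The case definition above (ICD-10 code U07.1 as the underlying or a contributing cause) and the tally by date of death can be sketched as follows. The certificate layout is a hypothetical stand-in, not the registry's actual schema, and the sample records are invented.

```python
from collections import Counter
from datetime import date

COVID_CODE = "U07.1"  # ICD-10 code named in the description


def is_covid_death(underlying_cause, contributing_causes):
    """True when U07.1 appears as the underlying or a contributing cause."""
    return underlying_cause == COVID_CODE or COVID_CODE in contributing_causes


def count_by_date_of_death(certificates):
    """Tally qualifying deaths by date of death, not by date reported."""
    counts = Counter()
    for cert in certificates:
        if is_covid_death(cert["underlying"], cert["contributing"]):
            counts[cert["date_of_death"]] += 1
    return counts


# Hypothetical certificates; field names and the non-COVID codes are illustrative.
certificates = [
    {"date_of_death": date(2020, 4, 10), "underlying": "U07.1", "contributing": []},
    {"date_of_death": date(2020, 4, 10), "underlying": "I21.9", "contributing": ["U07.1"]},
    {"date_of_death": date(2020, 4, 11), "underlying": "I21.9", "contributing": ["E11.9"]},
]

counts = count_by_date_of_death(certificates)
print(dict(counts))
```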

Note that the counts in this dataset may vary from the death counts in the other COVID-19-related datasets published on data.ct.gov, where deaths are counted on the date reported rather than the date of death.
Local Geographic Area (LGA) Age-Standardized Mortality Rates (per 100,000...
ouvert.canada.ca
open.alberta.ca
+1 more
html, xlsx
Updated Jul 24, 2024
Cite
Government of Alberta (2024). Local Geographic Area (LGA) Age-Standardized Mortality Rates (per 100,000 population) by Three Year Period, 2012/2014 - 2019/2021 [Dataset]. https://ouvert.canada.ca/data/dataset/df5eecbc-8981-4d66-a851-f6d60b01e36a
Explore at:
xlsx, html (available download formats)
Dataset updated
Jul 24, 2024
Dataset provided by
Government of Alberta
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Jan 1, 2012 - Dec 31, 2021
Description
Figure 7.1 provides the age-standardized mortality rates per 100,000 population for the three selected causes of death and all causes combined. The three selected causes of death are Circulatory System, Neoplasms and External Causes (Injury). Age standardization is a technique applied to make rates comparable across groups with different age distributions. A simple rate is defined as the number of people with a particular condition divided by the whole population. An age-standardized rate starts from age-specific rates (the number of people with a condition divided by the population within each age group) and combines them using the age distribution of a standard population. Standardizing (adjusting) the rate across age groups allows a more accurate comparison between populations that have different age structures. Age standardization is typically done when comparing rates across time periods, different geographic areas, and/or population sub-groups (e.g. ethnic group). This indicator dataset contains information at both Local Geographic Area (for example, Lacombe, Red Deer - North, Calgary - West Bow, etc.) and Alberta levels. Local geographic area refers to 132 geographic areas created by Alberta Health (AH) and Alberta Health Services (AHS) based on census boundaries. This table is part of the "Alberta Health Primary Health Care - Community Profiles" report published August 2022.
Monthly COVID-19 Death Rates per 100,000 Population by Age Group, Race and...
splitgraph.com
data.virginia.gov
+5 more
Updated Jun 28, 2024
Cite
cdc-gov (2024). Monthly COVID-19 Death Rates per 100,000 Population by Age Group, Race and Ethnicity, Sex, and Region [Dataset]. https://www.splitgraph.com/cdc-gov/monthly-covid19-death-rates-per-100000-population-89qs-mr7i/
Explore at:
json, application/openapi+json, application/vnd.splitgraph.image (available download formats)
Dataset updated
Jun 28, 2024
Authors
cdc-gov
Description
Monthly COVID-19 death rates per 100,000 population stratified by age group, race/ethnicity, sex, and region

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

See the Splitgraph documentation for more information.
Distribution of death rate by comorbidities.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Dec 14, 2023
Cite
Luwei Ye; Mei Feng; Qingran Lin; Fang Li; Jun Lyu (2023). Distribution of death rate by comorbidities. [Dataset]. http://doi.org/10.1371/journal.pone.0287254.t004
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0287254.t004
Dataset updated
Dec 14, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Luwei Ye; Mei Feng; Qingran Lin; Fang Li; Jun Lyu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: The Surviving Sepsis Campaign (SSC) believed that early identification of septic shock, aggressive fluid resuscitation and maintenance of effective perfusion pressure should be carried out. However, some of the current research focused on a single death factor for sepsis patients, based on a limited sample, and the research results of the relationship between comorbidities and sepsis related death also have some controversies.

Method: Therefore, our study used data from a large sample of 9,544 sepsis patients aged 18–85 obtained from the MIMIC-IV database, to explore the risk factors of death in patients with sepsis. We used the general clinical information, organ dysfunction scores, and comorbidities to analyze the independent risk factors for death of these patients.

Results: The death group had significantly higher organ dysfunction scores, lower BMI, lower body temperature, faster heart rate and lower urine-output. Among the comorbidities, patients suffering from congestive heart failure and liver disease had a higher mortality rate.

Conclusion: This study helps to identify sepsis early, based on a comprehensive evaluation of a patient’s basic information, organ dysfunction scores and comorbidities, and this methodology could be used for actual clinical diagnosis in hospitals.
Age-Adjusted Death Rates by Selected Causes of Death among Maryland...
splitgraph.com
healthdata.gov
+2 more
Updated Aug 14, 2024
Cite
Vital Statistics Administration (2024). Age-Adjusted Death Rates by Selected Causes of Death among Maryland Residents [Dataset]. https://www.splitgraph.com/opendata-maryland-gov/ageadjusted-death-rates-by-selected-causes-of-i4x2-3kc7/
Explore at:
json, application/vnd.splitgraph.image, application/openapi+json (available download formats)
Dataset updated
Aug 14, 2024
Dataset authored and provided by
Vital Statistics Administration
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Area covered
Maryland
Description
This is historical data. The update frequency has been set to "Static Data" and is here for historic value. Updated 8/14/2024.

Rate of deaths per 100,000 population by selected underlying causes of death among Maryland residents (1992-2017).

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

See the Splitgraph documentation for more information.
Model goodness of fit by level of observed death registration completeness...
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Tim Adair; Alan D. Lopez (2023). Model goodness of fit by level of observed death registration completeness (%), full sample and country-year and country level out-of-sample validation, Models 1 and 2, both sexes. [Dataset]. http://doi.org/10.1371/journal.pone.0197047.t004
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0197047.t004
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Tim Adair; Alan D. Lopez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Model goodness of fit by level of observed death registration completeness (%), full sample and country-year and country level out-of-sample validation, Models 1 and 2, both sexes.
Homicide death rate among 20-34 year old persons (per 100,000), New Jersey,...
splitgraph.com
healthdata.nj.gov
Updated Sep 9, 2020
Cite
New Jersey Department of Health (2020). Homicide death rate among 20-34 year old persons (per 100,000), New Jersey, by data year: Beginning 2009-2011, [Dataset]. https://www.splitgraph.com/healthdata-nj-gov/homicide-death-rate-among-2034-year-old-persons-8im6-5hsc
Explore at:
application/openapi+json, application/vnd.splitgraph.image, json (available download formats)
Dataset updated
Sep 9, 2020
Dataset authored and provided by
New Jersey Department of Health: https://www.nj.gov/health/
Area covered
New Jersey
Description
Rate: Homicide deaths per 100,000 persons aged 20-34

Definition: Deaths where homicide is indicated as the underlying cause of death. Homicide is defined as death resulting from the intentional use of force or power, threatened or actual, against another person, group, or community. ICD-10 Codes: X85-Y09, Y87.1 (homicide)

Data Source:

1) Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health

2) Population Estimates, State Data Center, New Jersey Department of Labor and Workforce Development

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

See the Splitgraph documentation for more information.
Early Indicators of Later Work Levels Disease and Death (EI) - Union Army...
rrid.site
scicrunch.org
+2 more
Updated Jul 27, 2025
Cite
(2025). Early Indicators of Later Work Levels Disease and Death (EI) - Union Army Samples Public Health and Ecological Datasets [Dataset]. http://identifiers.org/RRID:SCR_008921
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008921
Dataset updated
Jul 27, 2025
Description
A dataset to advance the study of life-cycle interactions of biomedical and socioeconomic factors in the aging process. The EI project has assembled a variety of large datasets covering the life histories of approximately 39,616 white male volunteers (drawn from a random sample of 331 companies) who served in the Union Army (UA), and of about 6,000 African-American veterans from 51 randomly selected United States Colored Troops companies (USCT). Their military records were linked to pension and medical records that detailed the soldiers' health status and socioeconomic and family characteristics. Each soldier was searched for in the US decennial census for the years in which they were most likely to be found alive (1850, 1860, 1880, 1900, 1910). In addition, a sample consisting of 70,000 men examined for service in the Union Army between September 1864 and April 1865 has been assembled and linked only to census records. These records will be useful for life-cycle comparisons of those accepted and rejected for service.

Military Data: The military service and wartime medical histories of the UA and USCT men were collected from the Union Army and United States Colored Troops military service records, carded medical records, and other wartime documents.

Pension Data: Wherever possible, the UA and USCT samples have been linked to pension records, including surgeon's certificates. About 70% of men in the Union Army sample have a pension. These records provide the bulk of the socioeconomic and demographic information on these men from the late 1800s through the early 1900s, including family structure and employment information. In addition, the surgeon's certificates provide rich medical histories, with an average of 5 examinations per linked recruit for the UA, and about 2.5 exams per USCT recruit.

Census Data: Both early and late-age familial and socioeconomic information is collected from the manuscript schedules of the federal censuses of 1850, 1860, 1870 (incomplete), 1880, 1900, and 1910.

Data Availability: All of the datasets (Military Union Army; linked Census; Surgeon's Certificates; Examination Records, and supporting ecological and environmental variables) are publicly available from ICPSR. In addition, copies on CD-ROM may be obtained from the CPE, which also maintains an interactive Internet Data Archive and Documentation Library, which can be accessed on the Project Website.

* Dates of Study: 1850-1910
* Study Features: Longitudinal, Minority Oversamples
* Sample Size:
** Union Army: 35,747
** Colored Troops: 6,187
** Examination Sample: 70,800

ICPSR Link: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06836
Human Mortality Database
dknet.org
neuinfo.org
+2more
Updated Jan 29, 2022
(2022). Human Mortality Database [Dataset]. http://identifiers.org/RRID:SCR_002370
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002370
Dataset updated
Jan 29, 2022
Description
A database providing detailed mortality and population data to those interested in the history of human longevity. For each country, the database includes calculated death rates and life tables by age, time, and sex, along with all of the raw data (vital statistics, census counts, population estimates) used in computing these quantities. Data are presented in a variety of formats with regard to age groups and time periods. The main goal of the database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. New data series are continually added to this collection. However, the database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included are relatively wealthy and for the most part highly industrialized. The database replaces an earlier NIA-funded project, known as the Berkeley Mortality Database.

* Dates of Study: 1751-present
* Study Features: Longitudinal, International
* Sample Size: 37 countries or areas
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
catalog.data.gov
data.ct.gov
Updated Aug 12, 2023
data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
Explore at:
Dataset updated
Aug 12, 2023
Dataset provided by
data.ct.gov
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj .

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
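The direct standardization for age adjustment described in this entry can be sketched in Python. This is a minimal illustration, not the DPH calculation: the three age bands, death counts, populations, and standard-population weights below are all invented for the example, whereas the entry itself describes 19 age groups and the 2000 US standard population.

```python
def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, scaled to `per` people."""
    return sum(deaths) / sum(population) * per


def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardized rate: weight each age-specific rate by the
    share of the standard population that falls in that age group."""
    total_weight = sum(standard_weights)
    adjusted = 0.0
    for d, p, w in zip(deaths, population, standard_weights):
        adjusted += (d / p) * (w / total_weight)
    return adjusted * per


# Hypothetical numbers for three age bands (young, middle, old).
deaths = [3, 9, 160]
population = [30_000, 30_000, 40_000]
standard_weights = [0.5, 0.3, 0.2]  # made-up standard population shares

crude = crude_rate(deaths, population)
adjusted = age_adjusted_rate(deaths, population, standard_weights)
print(f"crude: {crude:.1f}, age-adjusted: {adjusted:.1f} per 100,000")
```

Because the invented population is older than the invented standard, the adjusted rate comes out below the crude rate, which is the same direction of effect the entry reports for a group whose deaths are concentrated at older ages.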
Predict Mortality/Death Rate.
kaggle.com
zip
Updated Aug 8, 2017
Rajanand Ilangovan (2017). Predict Mortality/Death Rate. [Dataset]. https://www.kaggle.com/rajanand/mortality
Explore at:
Available download formats: zip (59991550 bytes)
Dataset updated
Aug 8, 2017
Authors
Rajanand Ilangovan
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description

Context:
-----------

**Annual Health Survey: Mortality Schedule**

This unit-level dataset contains details of deaths that occurred among usual residents of sample households during the reference period. It includes information on sex of the deceased, date of death, age at death, registration of death, and source of medical attention received before death. For infant deaths, data on symptoms preceding death are also provided. The Mortality Schedule also includes information on various determinants of maternal mortality, viz. cases of death associated with pregnancy, factors leading or contributing to death, symptoms preceding death, time between onset of complications and death, etc.

There are a total of 770k observations and 121 variables in this dataset.

**[Survey:](http://www.who.int/bulletin/volumes/94/4/BLT-15-158493-table-T1.html)**

- Baseline survey - 2010-11 (4.14 million households in the sample)
- 1st update - 2011-12 (4.28 million households in the sample)
- 2nd update - 2012-13 (4.32 million households in the sample)

The survey was conducted in the 9 states below.

A. Empowered Action Group [(EAG)](http://pib.nic.in/newsite/mbErel.aspx?relid=85350) States

1. Uttarakhand (05)
2. Rajasthan (08)
3. Uttar Pradesh (09)
4. Bihar (10)
5. Jharkhand (20)
6. Odisha (21)
7. Chhattisgarh (22)
8. Madhya Pradesh (23)

B. Assam (18)

These nine states, which account for about 48 percent of the total population, 59 percent of Births, 70 percent of Infant Deaths, 75 percent of Under 5 Deaths and 62 percent of Maternal Deaths in the country, are the high focus States in view of their relatively higher fertility and mortality.

Content:
-----------

The files contain the columns below.

**Variable Names:**

1. id
2. m_id
3. client_m_id
4. hl_id
5. house_no
6. house_hold_no
7. state
8. district
9. rural
10. stratum_code
11. psu_id
12. m_serial_no
13. deceased_sex
14. date_of_death
15. month_of_death
16. year_of_death
17. age_of_death_below_one_month
18. age_of_death_below_eleven_month
19. age_of_death_above_one_year
20. treatment_source
21. place_of_death
22. is_death_reg
23. is_death_certificate_received
24. serial_num_of_infant_mother
25. order_of_birth
26. death_symptoms
27. is_death_associated_with_pregnan
28. death_period
29. months_of_pregnancy
30. factors_contributing_death
31. factors_contributing_death_2
32. symptoms_of_death
33. time_between_onset_of_complicati
34. nearest_medical_facility
35. m_expall_status
36. field38
37. hh_id
38. client_hh_id
39. currently_dead_or_out_migrated
40. hh_serial_no
41. sex
42. usual_residance
43. relation_to_head
44. member_identity
45. father_serial_no
46. mother_serial_no
47. date_of_birth
48. month_of_birth
49. year_of_birth
50. age
51. religion
52. social_group_code
53. marital_status
54. date_of_marriage
55. month_of_marriage
56. year_of_marriage
57. currently_attending_school
58. reason_for_not_attending_school
59. highest_qualification
60. occupation_status
61. disability_status
62. injury_treatment_type
63. illness_type
64. symptoms_pertaining_illness
65. sought_medical_care
66. diagnosed_for
67. diagnosis_source
68. regular_treatment
69. regular_treatment_source
70. chew
71. smoke
72. alcohol
73. status
74. hh_expall_status
75. client_hl_id
76. serial_no
77. building_no
78. house_status
79. house_structure
80. owner_status
81. drinking_water_source
82. is_water_filter
83. water_filteration
84. toilet_used
85. is_toilet_shared
86. household_have_electricity
87. lighting_source
88. cooking_fuel
89. no_of_dwelling_rooms
90. kitchen_availability
91. is_radio
92. is_television
93. is_computer
94. is_telephone
95. is_washing_machine
96. is_refrigerator
97. is_sewing_machine
98. is_bicycle
99. is_scooter
100. is_car
101. is_tractor
102. is_water_pump
103. cart
104. land_possessed
105. hl_expall_status
106. fid
107. isdeadmigrated
108. residancial_status
109. iscoveredbyhealthscheme
110. healthscheme_1
111. healthscheme_2
112. housestatus
113. householdstatus
114. isheadchanged
115. fidh
116. fidx
117. as
118. wt
119. x
120. schedule_id
121. year

**File content:**

Mortality_data_dictionary.xlsx: This [**data dictionary**](https://www.kaggle.com/rajanand/mortality/downloads/Mortality_data_dictionary.xlsx) Excel workbook has detailed information about each column and the codes used in the data.

Acknowledgements
----------------

[Department of Health and Family Welfare](https://nrhm-mis.nic.in/hmisreports/AHSReports.aspx), Govt. of India has published this [dataset](https://data.gov.in/catalog/annual-health-survey-mortality-schedule) in the Open Govt Data Platform India portal under the [Govt. Open Data License - India](https://data.gov.in/government-open-data-license-india).
Death in the United States
kaggle.com
zip
Updated Aug 3, 2017
Centers for Disease Control and Prevention (2017). Death in the United States [Dataset]. https://www.kaggle.com/cdc/mortality
Explore at:
Available download formats: zip (766333584 bytes)
Dataset updated
Aug 3, 2017
Dataset authored and provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics Systems. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.

It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.

Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.

Overview

This dataset is a collection of CSV files each containing one year's worth of data and paired JSON files containing the code mappings, plus an ICD 10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors as the text extracted from the pdf is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.

A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of the data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.

All data comes from the CDC’s National Vital Statistics Systems, with the exception of the Icd10Code set, which is sourced from the World Health Organization.
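Since each year's CSV is paired with a JSON file of code mappings, decoding a record might look roughly like the sketch below. The field name `sex`, its codes, and the assumed JSON layout (field name mapped to a code-to-label dictionary) are hypothetical stand-ins, not taken from the actual files.

```python
import csv
import io
import json

# Hypothetical stand-ins for one year's CSV and its paired JSON code file.
# The real field names and the JSON layout may differ.
csv_text = "sex,age\nM,67\nF,82\nM,45\n"
codes_json = '{"sex": {"M": "Male", "F": "Female"}}'

code_maps = json.loads(codes_json)


def decode_row(row, code_maps):
    """Replace coded values with labels wherever a mapping exists."""
    decoded = {}
    for field, value in row.items():
        mapping = code_maps.get(field, {})
        decoded[field] = mapping.get(value, value)  # keep unmapped values as-is
    return decoded


rows = [decode_row(r, code_maps) for r in csv.DictReader(io.StringIO(csv_text))]
print(rows[0])
```

Because some codes change meaning between years, a mapping should only be applied to the CSV from the same year.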

Project ideas

The CDC's mortality data was the basis of a widely publicized paper, by Anne Case and Nobel prize winner Angus Deaton, arguing that middle-aged whites are dying at elevated rates. One of the criticisms against the paper is that it failed to properly account for the exact ages within the broad bins available through the CDC's WONDER tool. What do these results look like with exact/not-binned age data?

Similarly, how sensitive are the mortality trends being discussed in the news to the choice of bin-widths?
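One way to start on the bin-width question is to count the same ages at death under different bin widths and compare the resulting distributions. The ages below are invented for illustration and the helper name `bin_counts` is not part of the dataset.

```python
from collections import Counter


def bin_counts(ages, width):
    """Count deaths per age bin of the given width.

    Each bin is labelled by its lower bound, so width 5 gives 45, 50, 55, ...
    """
    return Counter((age // width) * width for age in ages)


# Invented ages at death, only to show how the picture changes with bin width.
ages_at_death = [45, 46, 49, 50, 51, 54, 55, 59, 60, 64]

by_five = bin_counts(ages_at_death, 5)
by_ten = bin_counts(ages_at_death, 10)
print("width 5:", sorted(by_five.items()))
print("width 10:", sorted(by_ten.items()))
```

With exact ages available, the same function can be rerun at any width, which is not possible when only pre-binned counts are published.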

As noted above, the data preparation process could have introduced errors. Can you find any discrepancies compared to the aggregate metrics on WONDER? If so, please let me know in the forums!

WONDER is cited in numerous economics, sociology, and public health research papers. Can you find any papers whose conclusions would be altered if they used the exact data available here rather than binned data from Wonder?

Differences from the first version of the dataset

This version of the dataset was prepared in a completely different manner. This has allowed us to provide a much larger volume of data and ensure that codes are available for every field.

We've replaced the batch of SQL files with a single JSON per year. Kaggle's platform currently offers better support for JSON files, and this keeps the number of files manageable.

A tutorial kernel providing a quick introduction to the new format is available here.

Lastly, I apologize if the transition has interrupted anyone's work! If need be, you can still download v1.
census-bureau-international
kaggle.com
zip
Updated May 6, 2020
+ more versions
Google BigQuery (2020). census-bureau-international [Dataset]. https://www.kaggle.com/bigquery/census-bureau-international
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
May 6, 2020
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
Description
Context

The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
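As a rough sketch of what querying with the Python client can look like: the table and column names below come from this listing, while the helper names `build_life_expectancy_query` and `run_query` are made up for the example. Running the query assumes the `google-cloud-bigquery` package is installed and credentials are configured, so that call is left commented out.

```python
def build_life_expectancy_query(year, limit=10):
    """Return standard SQL for the longest-lived countries in `year`."""
    year, limit = int(year), int(limit)  # coerce so only numbers reach the SQL
    return (
        "SELECT country_name, life_expectancy "
        "FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy` "
        f"WHERE year = {year} "
        "ORDER BY life_expectancy DESC "
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Run `sql` through the BigQuery client and return rows as dictionaries."""
    # Imported lazily so the query builder works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


sql = build_life_expectancy_query(2016)
print(sql)
# rows = run_query(sql)  # uncomment in an environment with BigQuery access
```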

Sample Query 1

What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!

#standardSQL
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT country_name, life_expectancy
  FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE year = 2016) age
INNER JOIN (
  SELECT country_name, country_area
  FROM `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE country_area > 25000) size
ON age.country_name = size.country_name
ORDER BY 2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT 10

Sample Query 2

Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.

#standardSQL
SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT country_name, population, country_code
  FROM `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE year = 2017 AND age < 25) age
INNER JOIN (
  SELECT midyear_population, country_code
  FROM `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE year = 2017) pop
ON age.country_code = pop.country_code
GROUP BY 1, 3
ORDER BY 4 DESC
/* Remove limit for visualization */
LIMIT 10

Sample Query 3

The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.

SELECT growth.country_name, growth.net_migration, CAST(area.country_area AS INT64) AS country_area FROM ( SELECT country_name, net_migration, country_code FROM bigquery-public-data.census_bureau_international.birth_death_growth_rates WHERE year = 2017) growth INNER JOIN ( SELECT country_area, country_code FROM bigquery-public-data.census_bureau_international.country_names_area

Update frequency

Historic (none)

Dataset source

United States Census Bureau

Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
Vital Signs: Life Expectancy – Bay Area
data.bayareametro.gov
csv, xlsx, xml
Updated Apr 7, 2017
State of California, Department of Health: Death Records (2017). Vital Signs: Life Expectancy – Bay Area [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Life-Expectancy-Bay-Area/emjt-svg9
Explore at:
Available download formats: xlsx, xml, csv
Dataset updated
Apr 7, 2017
Dataset provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Authors
State of California, Department of Health: Death Records
Area covered
San Francisco Bay Area
Description
VITAL SIGNS INDICATOR Life Expectancy (EQ6)

FULL MEASURE NAME Life Expectancy

LAST UPDATED April 2017

DESCRIPTION Life expectancy refers to the average number of years a newborn is expected to live if mortality patterns remain the same. The measure reflects the mortality rate across a population for a point in time.

DATA SOURCE State of California, Department of Health: Death Records (1990-2013) No link

California Department of Finance: Population Estimates Annual Intercensal Population Estimates (1990-2010) Table P-2: County Population by Age (2010-2013) http://www.dof.ca.gov/Forecasting/Demographics/Estimates/

CONTACT INFORMATION vitalsigns.info@mtc.ca.gov

METHODOLOGY NOTES (across all datasets for this indicator) Life expectancy is commonly used as a measure of the health of a population. Life expectancy does not reflect how long any given individual is expected to live; rather, it is an artificial measure that captures an aspect of the mortality rates across a population. Vital Signs measures life expectancy at birth (as opposed to cohort life expectancy). A statistical model was used to estimate life expectancy for Bay Area counties and Zip codes based on current life tables, which require both age and mortality data. A life table shows, for each age, the survivorship of people from a given population.

Current life tables were created using death records and population estimates by age. The California Department of Public Health provided death records based on the California death certificate information. Records include age at death and residential Zip code. Single-year age population estimates at the regional and county level come from the California Department of Finance population estimates and projections for ages 0-100+. Population estimates for ages 100 and over are aggregated to a single age interval. Using this data, death rates in a population within age groups for a given year are computed to form unabridged life tables (as opposed to abridged life tables). To calculate life expectancy, the probability of dying between the jth and (j+1)st birthday is assumed uniform after age 1. Special consideration is taken to account for infant mortality.

For the Zip code-level life expectancy calculation, it is assumed that postal Zip codes share the same boundaries as Zip Code Census Tabulation Areas (ZCTAs). More information on the relationship between Zip codes and ZCTAs can be found at https://www.census.gov/geo/reference/zctas.html. Zip code-level data uses three years of mortality data to make robust estimates due to small sample size. Year 2013 Zip code life expectancy estimates reflect death records from 2011 through 2013. 2013 is the last year with available mortality data. Death records for Zip codes with zero population (like those associated with P.O. Boxes) were assigned to the nearest Zip code with population. Zip code population for the 2000 estimates comes from the Decennial Census. Zip code population for the 2013 estimates is from the American Community Survey (5-Year Average). The ACS provides Zip code population by age in five-year age intervals. Single-year age population estimates were calculated by distributing population within an age interval to single-year ages using the county distribution. Counties were assigned to Zip codes based on majority land-area.
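To make the life table idea concrete, here is a minimal sketch of period life expectancy at birth computed from single-year age-specific death rates. It is not the Vital Signs model: it assumes deaths fall on average at mid-year for every age, including infants (the methodology above treats infant mortality separately), and it treats the last rate as an open-ended age group such as 100+.

```python
def life_expectancy_at_birth(death_rates):
    """Period life expectancy from single-year age-specific death rates m_x.

    The last entry is treated as an open-ended age group (e.g. 100+).
    """
    survivors = 1.0      # l_x: share of the cohort alive at exact age x
    person_years = 0.0   # running sum of person-years lived, L_x
    for m in death_rates[:-1]:
        q = m / (1.0 + 0.5 * m)  # probability of dying within the year
        deaths = survivors * q
        person_years += survivors - 0.5 * deaths  # deaths at mid-year on average
        survivors -= deaths
    # Open-ended final interval: everyone remaining dies at the last rate.
    person_years += survivors / death_rates[-1]
    return person_years


# Invented flat schedule: the same death rate at every age from 0 to 100+.
flat_rates = [0.02] * 101
e0 = life_expectancy_at_birth(flat_rates)
print(f"life expectancy at birth: {e0:.2f} years")
```

A flat rate is unrealistic but makes a handy sanity check: under these assumptions a constant rate m yields a life expectancy of 1/m.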

Zip codes in the Bay Area vary in population from over 10,000 residents to fewer than 20 residents. Traditional life expectancy estimation (like that used for the regional- and county-level Vital Signs estimates) cannot be used because it is highly inaccurate for small populations and may over- or underestimate life expectancy. To avoid inaccurate estimates, Zip codes with populations of less than 5,000 were aggregated with neighboring Zip codes until the merged areas had a population of more than 5,000. In this way, the original 305 Bay Area Zip codes were reduced to 218 Zip code areas for 2013 estimates. Next, a form of Bayesian random-effects analysis was used which established a prior distribution of the probability of death at each age using the regional distribution. This prior is used to shore up the life expectancy calculations where data were sparse.
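The effect of a regional prior on sparse areas can be illustrated with a much simpler stand-in than the Bayesian random-effects model described above: a Beta prior centred on the regional death rate, whose posterior mean pulls small areas toward the region. The `prior_strength` value, the regional rate, and the area counts are all invented for this sketch.

```python
def shrunken_rate(deaths, population, regional_rate, prior_strength=5_000):
    """Posterior mean death rate under a Beta prior centred on the regional rate.

    `prior_strength` acts like a pseudo-population: the larger it is,
    the more strongly small areas are pulled toward the regional rate.
    """
    return (deaths + prior_strength * regional_rate) / (population + prior_strength)


regional_rate = 0.008  # hypothetical regional death rate

# A very small area and a large one, both with made-up counts.
small = shrunken_rate(deaths=1, population=20, regional_rate=regional_rate)
large = shrunken_rate(deaths=120, population=10_000, regional_rate=regional_rate)

print(f"small area: raw {1 / 20:.4f} -> shrunk {small:.4f}")
print(f"large area: raw {120 / 10_000:.4f} -> shrunk {large:.4f}")
```

The small area's raw rate of 1 in 20 is almost entirely replaced by the regional rate, while the large area keeps most of its own signal, which is the behaviour a prior is meant to provide where data are sparse.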


The Devastator (2023). County Cancer Death Rates [Dataset]. https://www.kaggle.com/datasets/thedevastator/county-cancer-death-rates

County Cancer Death Rates

County-level cancer death rates with related variables


2 scholarly articles cite this dataset (View in Google Scholar)


Dataset updated

Dec 3, 2023

Dataset provided by

Kaggle

Authors

The Devastator

Description


By Noah Rippner [source]

About this dataset

This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.

Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.

The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.

To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.

Please note that this particular description provides an overview for a linear regression walkthrough using this dataset based on Python programming language. It highlights how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough further covers model selection and important model diagnostics measures.

It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.

Important columns in this Kaggle dataset include County names along with their corresponding FIPS codes, a standardized coding system defined by the Federal Information Processing Standards (FIPS). The Met Objective of 45.5? (1) column denotes whether a county achieved the targeted death rate of 45.5.

Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes.

How to use the dataset

Familiarize Yourself with the Columns:

County: The name of the county.

FIPS: The Federal Information Processing Standards code for the county.

Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).

Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.

Average Deaths per Year: The average number of deaths per year due to cancer in the county.

Recent Trend (2): The recent trend in cancer death rates/incidence in the county.

Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.

Average Annual Count: The average annual count of cancer deaths/incidence in the county.

Determine Counties Meeting Objective: Use this dataset to identify counties that have or have not met the objective death rate of 45.5. Look for entries where Met Objective of 45.5? (1) is marked as True or False.

Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.

Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.

Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.

Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...
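The first steps above can be sketched with the standard library alone. The sample rows are invented, the column names are taken from the list above, and two assumptions are made that the description does not confirm: that a county meets the objective when its rate is at or below 45.5, and that some cells may hold non-numeric markers.

```python
import csv
import io

# Invented sample rows using column names from the description above;
# the real death.csv may format values differently.
sample = """County,FIPS,Age-Adjusted Death Rate
County A,01001,41.2
County B,01003,58.9
County C,01005,*
"""

OBJECTIVE = 45.5  # assumption: "met" means a rate at or below this value


def parse_rate(text):
    """Return the rate as a float, or None when the cell is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


met, not_met, unknown = [], [], []
for row in csv.DictReader(io.StringIO(sample)):
    rate = parse_rate(row["Age-Adjusted Death Rate"])
    if rate is None:
        unknown.append(row["County"])
    elif rate <= OBJECTIVE:
        met.append(row["County"])
    else:
        not_met.append(row["County"])

print("met:", met, "not met:", not_met, "unknown:", unknown)
```

Keeping the unparseable rows in their own list, rather than dropping them silently, makes it easier to see how many counties are excluded from a comparison.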
