By Noah Rippner [source]
This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.
Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.
The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.
To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.
Please note that this particular description provides an overview for a linear regression walkthrough using this dataset based on Python programming language. It highlights how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough further covers model selection and important model diagnostics measures.
It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.
Important columns found within this extensively documented Kaggle dataset include County names along with their corresponding FIPS codes—a standardized coding system by Federal Information Processing Standards (FIPS). Moreover,Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted objective of a death rate of 45.5 or not.
Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes
Familiarize Yourself with the Columns:
- County: The name of the county.
- FIPS: The Federal Information Processing Standards code for the county.
- Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
- Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
- Average Deaths per Year: The average number of deaths per year due to cancer in the county.
- Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
- Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
- Average Annual Count: The average annual count of cancer deaths/incidence in the county.
Determine Counties Meeting Objective: Use this dataset to identify counties that have met or not met an objective death rate threshold of 45.5%. Look for entries where Met Objective of 45.5? (1) is marked as True or False.
Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.
Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.
Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.
Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...
The United States Cancer Statistics (USCS) online databases in WONDER provide cancer incidence and mortality data for the United States for the years since 1999, by year, state and metropolitan areas (MSA), age group, race, ethnicity, sex, childhood cancer classifications and cancer site. Report case counts, deaths, crude and age-adjusted incidence and death rates, and 95% confidence intervals for rates. The USCS data are the official federal statistics on cancer incidence from registries having high-quality data and cancer mortality statistics for 50 states and the District of Columbia. USCS are produced by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), in collaboration with the North American Association of Central Cancer Registries (NAACCR). Mortality data are provided by the Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Vital Statistics System (NVSS).
By Data Exercises [source]
This dataset is a comprehensive collection of data from county-level cancer mortality and incidence rates in the United States between 2000-2014. This data provides an unprecedented level of detail into cancer cases, deaths, and trends at a local level. The included columns include County, FIPS, age-adjusted death rate, average death rate per year, recent trend (2) in death rates, recent 5-year trend (2) in death rates and average annual count for each county. This dataset can be used to provide deep insight into the patterns and effects of cancer on communities as well as help inform policy decisions related to mitigating risk factors or increasing preventive measures such as screenings. With this comprehensive set of records from across the United States over 15 years, you will be able to make informed decisions regarding individual patient care or policy development within your own community!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset provides comprehensive US county-level cancer mortality and incidence rates from 2000 to 2014. It includes the mortality and incidence rate for each county, as well as whether the county met the objective of 45.5 deaths per 100,000 people. It also provides information on recent trends in death rates and average annual counts of cases over the five year period studied.
This dataset can be extremely useful to researchers looking to study trends in cancer death rates across counties. By using this data, researchers will be able to gain valuable insight into how different counties are performing in terms of providing treatment and prevention services for cancer patients and whether preventative measures and healthcare access are having an effect on reducing cancer mortality rates over time. This data can also be used to inform policy makers about counties needing more target prevention efforts or additional resources for providing better healthcare access within at risk communities.
When using this dataset, it is important to pay close attention to any qualitative columns such as “Recent Trend” or “Recent 5-Year Trend (2)” that may provide insights into long term changes that may not be readily apparent when using quantitative variables such as age-adjusted death rate or average deaths per year over shorter periods of time like one year or five years respectively. Additionally, when studying differences between different counties it is important to take note of any standard FIPS code differences that may indicate that data was collected by a different source with a difference methodology than what was used in other areas studied
- Using this dataset, we can identify patterns in cancer mortality and incidence rates that are statistically significant to create treatment regimens or preventive measures specifically targeting those areas.
- This data can be useful for policymakers to target areas with elevated cancer mortality and incidence rates so they can allocate financial resources to these areas more efficiently.
- This dataset can be used to investigate which factors (such as pollution levels, access to medical care, genetic make up) may have an influence on the cancer mortality and incidence rates in different US counties
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.
File: death .csv | Column name | Description | |:-------------------------------------------|:-------------------------------------------------------------------...
Decrease the cancer death rate from 185.7 per 100,000 in 2013 to 180.3 per 100,000 by 2019.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This data shows premature deaths (Age under 75) from all Cancers, numbers and rates by gender, as 3-year moving-averages.
Cancers are a major cause of premature deaths. Inequalities exist in cancer rates between the most deprived areas and the most affluent areas.
Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates.
A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death.
Data source: NHS Health and Social Care Information Centre (NHS-HSCIC) (Dataset unique identifier P00399). This data is updated annually.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lung Cancer Deaths reports the number, crude rate, and age-adjusted mortality rate (AAMR) of deaths due to lung cancer.
Provisional death counts of malignant neoplasms (cancer) by month and year, and other selected demographics, for 2020-2021. Data are based on death certificates for U.S. residents.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This data shows premature deaths (Age under 75) from all Cancers, numbers and rates by gender, as 3-year moving-averages. Cancers are a major cause of premature deaths. Inequalities exist in cancer rates between the most deprived areas and the most affluent areas. Directly Age-Standardised Rates (DASR) are shown in the data (where numbers are sufficient) so that death rates can be directly compared between areas. The DASR calculation applies Age-specific rates to a Standard (European) population to cancel out possible effects on crude rates due to different age structures among populations, thus enabling direct comparisons of rates. A limitation on using mortalities as a proxy for prevalence of health conditions is that mortalities may give an incomplete view of health conditions in an area, as ill-health might not lead to premature death. Data source: Office for Health Improvement and Disparities (OHID), indicator ID 40501, E05a. This data is updated annually.
Number and rate of new cancer cases diagnosed annually from 1992 to the most recent diagnosis year available. Included are all invasive cancers and in situ bladder cancer with cases defined using the Surveillance, Epidemiology and End Results (SEER) Groups for Primary Site based on the World Health Organization International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Random rounding of case counts to the nearest multiple of 5 is used to prevent inappropriate disclosure of health-related information.
Rate: Number of deaths due to all kinds of Cancer per 100,000 Population.
Definition: Number of deaths per 100,000 with malignant neoplasm (cancer) as the underlying cause (ICD-10 codes: C00-C97).
Data Sources:
(1) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed Mortality File. CDC WONDER On-line Database accessed at http://wonder.cdc.gov/cmf-icd10.html
(2) Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health
(3) Population Estimates, State Data Center, New Jersey Department of Labor and Workforce Development
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Deaths from prostate cancer - Directly age-Standardised Rates (DSR) per 100,000 population Source: Office for National Statistics (ONS) Publisher: Information Centre (IC) - Clinical and Health Outcomes Knowledge Base Geographies: Local Authority District (LAD), Government Office Region (GOR), National, Primary Care Trust (PCT), Strategic Health Authority (SHA) Geographic coverage: England Time coverage: 2005-07, 2007 Type of data: Administrative data
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Age-standardised rate of mortality from oral cancer (ICD-10 codes C00-C14) in persons of all ages and sexes per 100,000 population.RationaleOver the last decade in the UK (between 2003-2005 and 2012-2014), oral cancer mortality rates have increased by 20% for males and 19% for females1Five year survival rates are 56%. Most oral cancers are triggered by tobacco and alcohol, which together account for 75% of cases2. Cigarette smoking is associated with an increased risk of the more common forms of oral cancer. The risk among cigarette smokers is estimated to be 10 times that for non-smokers. More intense use of tobacco increases the risk, while ceasing to smoke for 10 years or more reduces it to almost the same as that of non-smokers3. Oral cancer mortality rates can be used in conjunction with registration data to inform service planning as well as comparing survival rates across areas of England to assess the impact of public health prevention policies such as smoking cessation.References:(1) Cancer Research Campaign. Cancer Statistics: Oral – UK. London: CRC, 2000.(2) Blot WJ, McLaughlin JK, Winn DM et al. Smoking and drinking in relation to oral and pharyngeal cancer. Cancer Res 1988; 48: 3282-7. (3) La Vecchia C, Tavani A, Franceschi S et al. Epidemiology and prevention of oral cancer. Oral Oncology 1997; 33: 302-12.Definition of numeratorAll cancer mortality for lip, oral cavity and pharynx (ICD-10 C00-C14) in the respective calendar years aggregated into quinary age bands (0-4, 5-9,…, 85-89, 90+). This does not include secondary cancers or recurrences. Data are reported according to the calendar year in which the cancer was diagnosed.Counts of deaths for years up to and including 2019 have been adjusted where needed to take account of the MUSE ICD-10 coding change introduced in 2020. Detailed guidance on the MUSE implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/causeofdeathcodinginmortalitystatisticssoftwarechanges/january2020Counts of deaths for years up to and including 2013 have been double adjusted by applying comparability ratios from both the IRIS coding change and the MUSE coding change where needed to take account of both the MUSE ICD-10 coding change and the IRIS ICD-10 coding change introduced in 2014. The detailed guidance on the IRIS implementation is available at: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/impactoftheimplementationofirissoftwareforicd10causeofdeathcodingonmortalitystatisticsenglandandwales/2014-08-08Counts of deaths for years up to and including 2010 have been triple adjusted by applying comparability ratios from the 2011 coding change, the IRIS coding change and the MUSE coding change where needed to take account of the MUSE ICD-10 coding change, the IRIS ICD-10 coding change and the ICD-10 coding change introduced in 2011. The detailed guidance on the 2011 implementation is available at https://webarchive.nationalarchives.gov.uk/ukgwa/20160108084125/http://www.ons.gov.uk/ons/guide-method/classifications/international-standard-classifications/icd-10-for-mortality/comparability-ratios/index.htmlDefinition of denominatorPopulation-years (aggregated populations for the three years) for people of all ages, aggregated into quinary age bands (0-4, 5-9, …, 85-89, 90+)
Rate: Number of deaths due to prostate cancer per 100,000 male population.
Definition: Number of deaths per 100,000 males with malignant neoplasm (cancer) of the prostate as the underlying cause of death (ICD-10 code: C61).
Data Sources:
(1) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed Mortality File. CDC WONDER On-line Database accessed at http://wonder.cdc.gov/cmf-icd10.html
(2) Death Certificate Database, Office of Vital Statistics and Registry, New Jersey Department of Health
(3) Population Estimates, State Data Center, New Jersey Department of Labor and Workforce Development
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two datasets that explore causes of death due to cancer in South Africa, drawing on data from the Revised Burden of Disease estimates for the Comparative Risk Factor Assessment for South Africa, 2000. The number and percentage of deaths due to cancer by cause are ranked for persons, males and females in the tables below. Lung cancer is the leading cause of cancer in SA accounting for 17% of all cancer deaths. This is followed by oesophagus Ca which accounts for 13%, cervix cancer accounting for 8%, breast cancer accounting for 8% and liver cancer which accounts for 6% of all cancers. Many more males suffer from lung and oesophagus cancer than females.
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Annual percent change and average annual percent change in age-standardized cancer mortality rates since 1984 to the most recent data year. The table includes a selection of commonly diagnosed invasive cancers and causes of death are defined based on the World Health Organization International Classification of Diseases, ninth revision (ICD-9) from 1984 to 1999 and on its tenth revision (ICD-10) from 2000 to the most recent year.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can access data about cancer statistics in the United States including but not limited to searches by type of cancer and race, sex, ethnicity, age at diagnosis, and age at death. Background Surveillance Epidemiology and End Results (SEER) database’s mission is to provide information on cancer statistics to help reduce the burden of disease in the U.S. population. The SEER database is a project to the National Cancer Institute. The SEER database collects information on incidence, prevalence, and survival from specific geographic areas representing 28 percent of the United States population. User functionality Users can access a variety of reso urces. Cancer Stat Fact Sheets allow users to look at summaries of statistics by major cancer type. Cancer Statistic Reviews are available from 1975-2008 in table format. Users are also able to build their own tables and graphs using Fast Stats. The Cancer Query system provides more flexibility and a larger set of cancer statistics than F ast Stats but requires more input from the user. State Cancer Profiles include dynamic maps and graphs enabling the investigation of cancer trends at the county, state, and national levels. SEER research data files and SEER*Stat software are available to download through your Internet connection (SEER*Stat’s client-server mode) or via discs shipped directly to you. A signed data agreement form is required to access the SEER data Data Notes Data is available in different formats depending on which type of data is accessed. Some data is available in table, PDF, and html formats. Detailed information about the data is available under “Data Documentation and Variable Recodes”.
Number of Cancer New Cases and Registered Deaths by Ten Leading Cancer Disease Group by Sex 2022
Medical Service Study Areas (MSSAs)As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.Age-Adjusted Incidence Rate (AAIR)Age-adjustment is a statistical method that allows comparisons of incidence rates to be made between populations with different age distributions. This is important since the incidence of most cancers increases with age. An age-adjusted cancer incidence (or death) rate is defined as the number of new cancers (or deaths) per 100,000 population that would occur in a certain period of time if that population had a 'standard' age distribution. In the California Health Maps, incidence rates are age-adjusted using the U.S. 2000 Standard Population.Cancer incidence ratesIncidence rates were calculated using case counts from the California Cancer Registry. Population data from 2010 Census and SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators. Yearly SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators for 5-year incidence rates (2013-2017)According to California Department of Public Health guidelines, cancer incidence rates cannot be reported if based on <15 cancer cases and/or a population <10,000 to ensure confidentiality and stable statistical rates.Spatial extent: CaliforniaSpatial Unit: MSSACreated: n/aUpdated: n/aSource: California Health MapsContact Email: gbacr@ucsf.eduSource Link: https://www.californiahealthmaps.org/?areatype=mssa&address=&sex=Both&site=AllSite&race=&year=05yr&overlays=none&choropleth=Obesity
Mortality Rates for Lake County, Illinois. Explanation of field attributes: Average Age of Death – The average age at which a people in the given zip code die. Cancer Deaths – Cancer deaths refers to individuals who have died of cancer as the underlying cause. This is a rate per 100,000. Heart Disease Related Deaths – Heart Disease Related Deaths refers to individuals who have died of heart disease as the underlying cause. This is a rate per 100,000. COPD Related Deaths – COPD Related Deaths refers to individuals who have died of chronic obstructive pulmonary disease (COPD) as the underlying cause. This is a rate per 100,000.
Medical Service Study Areas (MSSAs)As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.Age-Adjusted Incidence Rate (AAIR)Age-adjustment is a statistical method that allows comparisons of incidence rates to be made between populations with different age distributions. This is important since the incidence of most cancers increases with age. An age-adjusted cancer incidence (or death) rate is defined as the number of new cancers (or deaths) per 100,000 population that would occur in a certain period of time if that population had a 'standard' age distribution. In the California Health Maps, incidence rates are age-adjusted using the U.S. 2000 Standard Population.
By Noah Rippner [source]
This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.
Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.
The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.
To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.
Please note that this particular description provides an overview for a linear regression walkthrough using this dataset based on Python programming language. It highlights how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough further covers model selection and important model diagnostics measures.
It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.
Important columns found within this extensively documented Kaggle dataset include County names along with their corresponding FIPS codes—a standardized coding system by Federal Information Processing Standards (FIPS). Moreover,Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted objective of a death rate of 45.5 or not.
Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes
Familiarize Yourself with the Columns:
- County: The name of the county.
- FIPS: The Federal Information Processing Standards code for the county.
- Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
- Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
- Average Deaths per Year: The average number of deaths per year due to cancer in the county.
- Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
- Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
- Average Annual Count: The average annual count of cancer deaths/incidence in the county.
Determine Counties Meeting Objective: Use this dataset to identify counties that have met or not met an objective death rate threshold of 45.5%. Look for entries where Met Objective of 45.5? (1) is marked as True or False.
Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.
Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.
Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.
Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...