Population-based cancer incidence rates were abstracted from the National Cancer Institute's State Cancer Profiles for all U.S. counties for which data were available. State Cancer Profiles is a national county-level database of cancer data collected by state public health surveillance systems. All-site cancer is defined as any type of cancer captured in the state registry data, excluding non-melanoma skin cancer. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for 2006–2010 were available for 2687 of 3142 (85.5%) U.S. counties. Rates are suppressed for counties with fewer than 16 reported cases in a specific area-sex-race category to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because state legislation and regulations prohibit the release of county-level data to outside entities. Data from Michigan do not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data are not available for three states: Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 because it follows the period covered by the EQI exposure data, which were constructed to represent 2000–2005. We also gathered data for the three leading cancers for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as the exposure metric, an indicator of cumulative environmental exposures at the county level for the period 2000 to 2005. A complete description of the datasets used in the EQI is provided in Lobdell et al., and the methods used for index construction are described by Messer et al.
The EQI was developed for the period 2000–2005 because it was the time period for which the most recent data were available when index construction was initiated. The EQI includes variables representing each of five environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic domain includes 12 variables representing socioeconomics and crime. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. Human health data are not publicly available. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can access data about cancer statistics in the United States, including but not limited to searches by type of cancer and by race, sex, ethnicity, age at diagnosis, and age at death.
Background: The mission of the Surveillance, Epidemiology, and End Results (SEER) database is to provide information on cancer statistics to help reduce the burden of disease in the U.S. population. The SEER database is a project of the National Cancer Institute. It collects information on incidence, prevalence, and survival from specific geographic areas representing 28 percent of the United States population.
User functionality: Users can access a variety of resources. Cancer Stat Fact Sheets allow users to look at summaries of statistics by major cancer type. Cancer Statistic Reviews are available from 1975-2008 in table format. Users are also able to build their own tables and graphs using Fast Stats. The Cancer Query system provides more flexibility and a larger set of cancer statistics than Fast Stats but requires more input from the user. State Cancer Profiles include dynamic maps and graphs enabling the investigation of cancer trends at the county, state, and national levels. SEER research data files and SEER*Stat software are available to download through your Internet connection (SEER*Stat’s client-server mode) or via discs shipped directly to you. A signed data agreement form is required to access the SEER data.
Data Notes: Data are available in different formats depending on which type of data is accessed. Some data are available in table, PDF, and html formats. Detailed information about the data is available under “Data Documentation and Variable Recodes”.
SEER Limited-Use cancer incidence data with associated population data. Geographic areas available are county and SEER registry. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute collects and distributes high quality, comprehensive cancer data from a number of population-based cancer registries. Data include patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and survival rates within each stage.
MIT License: https://opensource.org/licenses/MIT
This dataset contains Cancer Incidence data for Breast Cancer (Late Stage^) including: Age-Adjusted Rate, Confidence Interval, Average Annual Count, and Trend field information for US States for the average 5 year span from 2016 to 2020.
Data are for females segmented by age (All Ages, Ages Under 50, Ages 50 & Over, Ages Under 65, and Ages 65 & Over), with field names and aliases describing the sex and age group tabulated.
For more information, visit statecancerprofiles.cancer.gov
Data Notations
State Cancer Registries may provide more current or more local data.
Trend
Rising when 95% confidence interval of average annual percent change is above 0.
Stable when 95% confidence interval of average annual percent change includes 0.
Falling when 95% confidence interval of average annual percent change is below 0.
† Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Rates calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used for SEER and NPCR incidence rates.
‡ Incidence Trend data come from different sources. Due to different years of data availability, most of the trends are AAPCs based on APCs but some are APCs calculated in SEER*Stat. Please refer to the source for each area for additional information.
Rates and trends are computed using different standards for malignancy. For more information see malignant.
^ Late Stage is defined as cases determined to be regional or distant. Due to changes in stage coding, Combined Summary Stage (2004+) is used for data from Surveillance, Epidemiology, and End Results (SEER) databases and Merged Summary Stage is used for data from National Program of Cancer Registries databases. Due to the increased complexity with staging, other staging variables may be used if necessary.
Data Source Field Key
(1) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
(5) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
(6) Source: National Program of Cancer Registries SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention (based on the 2022 submission).
(7) Source: SEER November 2022 submission.
(8) Source: Incidence data provided by the SEER Program. AAPCs are calculated by the Joinpoint Regression Program and are based on APCs. Data are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used with SEER November 2022 data.
Some data are not available, see Data Not Available for combinations of geography, cancer site, age, and race/ethnicity.
Data for the United States does not include data from Nevada.
Data for the United States does not include Puerto Rico.
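The Trend labels defined under Data Notations reduce to comparing the bounds of the 95% confidence interval with zero. The following is a minimal Python sketch of that rule, assuming the lower and upper bounds of the interval for the average annual percent change are already available as numbers. The function name `classify_trend` and its parameters are illustrative and are not part of State Cancer Profiles.

```python
def classify_trend(ci_lower: float, ci_upper: float) -> str:
    """Label a trend from the 95% CI of the average annual percent change.

    Rising  -> the whole interval is above 0
    Falling -> the whole interval is below 0
    Stable  -> the interval includes 0
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > 0:
        return "Rising"
    if ci_upper < 0:
        return "Falling"
    return "Stable"


# Hypothetical intervals, for illustration only.
labels = [classify_trend(lo, hi) for lo, hi in [(0.4, 1.9), (-0.8, 0.3), (-2.1, -0.5)]]
```

An interval whose bound is exactly 0 is treated as including 0, so it is labeled Stable.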
MIT License: https://opensource.org/licenses/MIT
The dataset contains two .csv files. The first file contains various demographic and health-related data for different regions. Here's a brief description of each column:
avganncount: Average number of cancer cases diagnosed annually.
avgdeathsperyear: Average number of deaths due to cancer per year.
target_deathrate: Target death rate due to cancer.
incidencerate: Incidence rate of cancer.
medincome: Median income in the region.
popest2015: Estimated population in 2015.
povertypercent: Percentage of population below the poverty line.
studypercap: Per capita number of cancer-related clinical trials conducted.
binnedinc: Binned median income.
medianage: Median age in the region.
pctprivatecoveragealone: Percentage of population covered by private health insurance alone.
pctempprivcoverage: Percentage of population covered by employee-provided private health insurance.
pctpubliccoverage: Percentage of population covered by public health insurance.
pctpubliccoveragealone: Percentage of population covered by public health insurance only.
pctwhite: Percentage of White population.
pctblack: Percentage of Black population.
pctasian: Percentage of Asian population.
pctotherrace: Percentage of population belonging to other races.
pctmarriedhouseholds: Percentage of married households.
birthrate: Birth rate in the region.
The second file contains demographic information about different regions, including details about household size and geographical location. Here's a description of each column:
statefips: The FIPS code representing the state.
countyfips: The FIPS code representing the county or census area within the state.
avghouseholdsize: The average household size in the region.
geography: The geographical location, typically represented as the county or census area name followed by the state name.
Each row in the file represents a specific region, providing details about household size and geographical location. This information can be used for various demographic analyses and studies.
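When this file is joined to other county-level sources, the two FIPS columns are commonly combined into a single five-digit county identifier (two digits for the state, three for the county). The sketch below shows one way to do that. It assumes `statefips` and `countyfips` are stored as integers and that `countyfips` holds only the three-digit county part; neither assumption is confirmed by the description above, and the helper `county_geoid` is hypothetical.

```python
def county_geoid(statefips: int, countyfips: int) -> str:
    """Build a zero-padded 5-digit county FIPS code (SSCCC).

    Assumes countyfips is the 3-digit county part, not a full 5-digit code.
    """
    if not 1 <= statefips <= 99:
        raise ValueError("statefips must be between 1 and 99")
    if not 1 <= countyfips <= 999:
        raise ValueError("countyfips must be between 1 and 999")
    return f"{statefips:02d}{countyfips:03d}"


# State 6 (California), county 37 (Los Angeles County).
example = county_geoid(6, 37)
```

Keeping the identifier as a string preserves leading zeros, which are lost if the code is stored as a number.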
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This project develops a high-resolution, multi-scale cancer dataset in the U.S. by using a multi-constraint Monte Carlo simulation method to estimate suppressed county-level cancer data and further downscale them to ZIP Code Tabulation Areas (ZCTAs). This method integrates population subgroup structures and macro-level incidence rates as constraints, ensuring consistency and reliability across spatial scales. The resulting dataset spans multiple geographic units, from state and county levels to ZCTAs, enabling comprehensive analyses of cancer burden, facilitating in-depth spatial analyses, and designing precision public health interventions across multiple scales.
This is a linked dataset between drinking water data and cancer data. Drinking Water Data: County-level concentrations of arsenic from community water systems (CWSs) between 2000 and 2010 were collected from the Centers for Disease Control and Prevention’s (CDC) National Environmental Public Health Tracking Network (NEPHTN) (Centers for Disease Control and Prevention, 2018a). Annual mean drinking water arsenic concentrations from 2000 to 2010 were available for a total of 87,662 samples from 75,453 CWSs from 26 states, representing 1,425 counties. For samples identified as non-detects, the most frequently reported values were 0.5 ppb and 1 ppb, with a range of 0 ppb to 10 ppb. For non-detect samples reported as zero, the value was substituted with a constant of 0.25 ppb (Almberg et al., 2017; Bulka et al., 2016). Of the samples that were reported as non-detects, 10.87% were reported as zeros. Cancer Data: County-level cancer counts and incidence rates for bladder, colorectal, and kidney cancers were acquired from the National Cancer Institute (NCI) and CDC’s State Cancer Profiles for 2011 through 2015 for adults (age ≥ 50) to match the counties with exposure data (National Cancer Institute and Centers for Disease Control and Prevention, 2018a). We used the time period 2011-2015 to provide a lag following the exposure period of 2000-2010. The State Cancer Profiles provide age-adjusted county-level cancer incidence, prevalence, and mortality rates and average annual counts for 20 different types of cancers and select demographics (National Cancer Institute and Centers for Disease Control and Prevention, 2018b). Counties with fewer than 16 reported cases in a specific county, sex, and/or race category were suppressed to ensure confidentiality and stability of rate estimates (National Cancer Institute and Centers for Disease Control and Prevention, 2018a). This dataset is associated with the following publication: Krajewski, A., M. Jimenez, K. Rappazzo, D. Lobdell, and J. Jagai. 
Aggregated Cumulative County Arsenic in Drinking Water and Associations with Bladder, Colorectal, and Kidney Cancers, Accounting for Population Served. Journal of Exposure Science and Environmental Epidemiology. Nature Publishing Group, London, UK, 31(6): 979-989, (2021).
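The non-detect handling described for the drinking water data (zero-valued non-detects replaced with a constant of 0.25 ppb) can be sketched as follows. This is an illustrative Python example only: the input layout (a list of concentration and non-detect-flag pairs), the function name, and the sample values are assumptions and do not come from the NEPHTN data or the associated publication.

```python
from statistics import mean
from typing import List, Tuple

ZERO_SUBSTITUTE_PPB = 0.25  # constant stated in the dataset description


def substitute_zero_nondetects(samples: List[Tuple[float, bool]]) -> List[float]:
    """Replace non-detect samples reported as 0 ppb with 0.25 ppb.

    Each sample is (concentration_ppb, is_nondetect). Non-detects reported
    with a nonzero value, and all detected values, are left unchanged.
    """
    return [
        ZERO_SUBSTITUTE_PPB if is_nondetect and value == 0 else value
        for value, is_nondetect in samples
    ]


# Hypothetical samples for one system-year: (ppb, non-detect flag).
raw = [(0.0, True), (0.5, True), (3.0, False)]
cleaned = substitute_zero_nondetects(raw)
annual_mean_ppb = mean(cleaned)
```

Only the substitution step is taken from the description; how samples are averaged within a system or county is not specified there, so the simple mean here is a placeholder.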
This is historical data. The update frequency has been set to "Static Data" and is here for historic value. Updated 8/14/2024.
Definition of "All Cancer Sites": ICD-O-3 Topography (Site) Codes C00.0 – C80.9 with histology codes including all invasive cancers of all sites except basal and squamous cell skin cancers, and in situ cancer cases of the urinary bladder. Rates are per 100,000 population and are age-adjusted to 2000 U.S. standard population. Rates based on case counts of 1-15 are suppressed per DHMH/MCR Data Use Policy and Procedures.
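The suppression rule above (rates based on case counts of 1-15 are not shown) can be expressed as a small guard around the rate calculation. The sketch below is an assumption-laden illustration: it computes a crude rate per 100,000 rather than the age-adjusted rate the dataset reports, and the function name and the use of `None` to mark a suppressed value are choices made here, not part of the DHMH/MCR policy.

```python
from typing import Optional

SUPPRESSION_MAX_COUNT = 15  # counts of 1-15 are suppressed, per the text above


def crude_rate_per_100k(cases: int, population: int) -> Optional[float]:
    """Return cases per 100,000 population, or None when suppressed."""
    if cases < 0:
        raise ValueError("cases must be non-negative")
    if population <= 0:
        raise ValueError("population must be positive")
    if 1 <= cases <= SUPPRESSION_MAX_COUNT:
        return None  # too few cases: rate withheld
    return cases * 100_000 / population


# Hypothetical counts for illustration.
shown = crude_rate_per_100k(16, 50_000)
hidden = crude_rate_per_100k(12, 50_000)
```

A count of zero is not in the 1-15 range, so this sketch reports a rate of 0.0 for it rather than suppressing it.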
This is historical data. The update frequency has been set to "Static Data" and is here for historic value. Updated on 8/14/2024
Cancer Mortality Rate - This indicator shows the age-adjusted mortality rate from cancer (per 100,000 population). Maryland’s age-adjusted cancer mortality rate is higher than the US cancer mortality rate. Cancer impacts people across all population groups; however, wide racial disparities exist. Link to Data Details: https://health.maryland.gov/pophealth/Documents/SHIP/SHIP%20Lite%20Data%20Details/Cancer%20Mortality%20Rate.pdf
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background: Colorectal cancer (CRC) incidence rates have increased in younger individuals worldwide. We examined the most recent early- and late-onset CRC rates for the US.
Methods: Age-standardized incidence rates (ASIR, per 100,000) of CRC were calculated using the US Cancer Statistics Database’s high-quality population-based cancer registry data from the entire US population. Results were cross-classified by age (20-49 [early-onset] and 50-74 years [late-onset]), race/ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic, American Indian/Alaskan Native, Asian/Pacific Islander), sex, anatomic location (proximal, distal, rectal), and histology (adenocarcinoma, neuroendocrine).
Results: During 2001 through 2018, early-onset CRC rates significantly increased among American Indians/Alaskan Natives, Hispanics, and Whites. Compared to Whites, early-onset CRC rates are now 21% higher in American Indians/Alaskan Natives and 6% higher in Blacks. Rates of early-onset colorectal neuroendocrine tumors have increased in Whites, Blacks, and Hispanics; early-onset colorectal neuroendocrine tumor rates are 2-times higher in Blacks compared to Whites. Late-onset colorectal adenocarcinoma rates are decreasing, while late-onset colorectal neuroendocrine tumor rates are increasing, in all racial/ethnic groups. Late-onset CRC rates remain 29% higher in Blacks and 15% higher in American Indians/Alaskan Natives compared to Whites. Overall, CRC incidence was higher in men than women, but incidence of early-onset distal colon cancer was higher in women.
Conclusions: The early-onset CRC disparity between Blacks and Whites has decreased, due to increasing rates in Whites; rates in Blacks have remained stable. However, rates of colorectal neuroendocrine tumors are increasing in Blacks. Blacks and American Indians/Alaskan Natives have the highest rates of both early- and late-onset CRC.
Impact: Ongoing prevention efforts must ensure access to and uptake of CRC screening for Blacks and American Indians/Alaskan Natives.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of ‘SHIP Cancer Mortality Rate 2009-2017’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/5aa0af7e-0ab6-492e-a6b2-65fd7ff23709 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
Cancer Mortality Rate - This indicator shows the age-adjusted mortality rate from cancer (per 100,000 population). Maryland’s age-adjusted cancer mortality rate is higher than the US cancer mortality rate. Cancer impacts people across all population groups; however, wide racial disparities exist.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background: The incidence of early-onset colorectal cancer (EOCRC) is rising at an alarming rate and is attracting growing attention. Causes of death in the EOCRC population remain unclear.
Methods: Data on EOCRC patients (1975–2018) were extracted from the Surveillance, Epidemiology, and End Results database. The distribution of deaths was calculated, and the death risk for each cause was compared with that of the general population by calculating standardized mortality ratios (SMRs) at different follow-up times. Univariate and multivariate Cox regression models were used to identify independent prognostic factors for overall survival (OS).
Results: The study included 36,013 patients, among whom 9,998 (27.7%) died of colorectal cancer (CRC) and 6,305 (17.5%) died of non-CRC causes. CRC death accounted for a high proportion (74.8%–90.7%) of deaths within 10 years, while non-CRC death (especially cardiocerebrovascular disease death) was the major cause of death after 10 years. Non-cancer death had the highest SMR in the EOCRC population within the first year after cancer diagnosis. Kidney disease [SMR = 2.10; 95% confidence interval (CI), 1.65–2.64] and infection (SMR = 1.92; 95% CI, 1.48–2.46) were two high-risk causes of death. Age at diagnosis, race, sex, year of diagnosis, grade, SEER stage, and surgery were independent prognostic factors for OS.
Conclusion: Most EOCRC patients died of CRC within 10 years of follow-up, while most patients died of non-CRC causes after 10 years. Within the first year after cancer diagnosis, patients had a high risk of non-CRC death compared with the general population. Our findings help to guide risk monitoring and management for US EOCRC patients.
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update. 
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age which results in higher age-adjusted rates. The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. 
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.
Lung cancer is a leading cause of cancer-related death in the US. People who smoke have the greatest risk of lung cancer, though lung cancer can also occur in people who have never smoked. Most cases are due to long-term tobacco smoking or exposure to secondhand tobacco smoke. Cities and communities can take an active role in curbing tobacco use and reducing lung cancer by adopting policies to regulate tobacco retail; reducing exposure to secondhand smoke in outdoor public spaces, such as parks, restaurants, or in multi-unit housing; and improving access to tobacco cessation programs and other preventive services.
For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
MIT License: https://opensource.org/licenses/MIT
This dataset contains Cancer Incidence data for Prostate Cancer (All Stages^) including: Age-Adjusted Rate, Confidence Interval, Average Annual Count, and Trend field information for US States for the average 5 year span from 2016 to 2020.
Data are for males segmented by age (All Ages, Ages Under 50, Ages 50 & Over, Ages Under 65, and Ages 65 & Over), with field names and aliases describing the sex and age group tabulated.
For more information, visit statecancerprofiles.cancer.gov
Data Notations
State Cancer Registries may provide more current or more local data.
Trend
Rising when 95% confidence interval of average annual percent change is above 0.
Stable when 95% confidence interval of average annual percent change includes 0.
Falling when 95% confidence interval of average annual percent change is below 0.
† Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Rates calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used for SEER and NPCR incidence rates.
‡ Incidence Trend data come from different sources. Due to different years of data availability, most of the trends are AAPCs based on APCs but some are APCs calculated in SEER*Stat. Please refer to the source for each area for additional information.
Rates and trends are computed using different standards for malignancy. For more information see malignant.
^ All Stages refers to any stage in the Surveillance, Epidemiology, and End Results (SEER) summary stage.
Data Source Field Key
(1) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
(5) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
(6) Source: National Program of Cancer Registries SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention (based on the 2022 submission).
(7) Source: SEER November 2022 submission.
(8) Source: Incidence data provided by the SEER Program. AAPCs are calculated by the Joinpoint Regression Program and are based on APCs. Data are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used with SEER November 2022 data.
Some data are not available, see Data Not Available for combinations of geography, cancer site, age, and race/ethnicity.
Data for the United States does not include data from Nevada.
Data for the United States does not include Puerto Rico.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of ‘COVID-19 Cases and Deaths by Race/Ethnicity’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/3fdc6593-c708-4a6a-8073-5ca862caa279 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age which results in higher age-adjusted rates.
The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.
Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf
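Direct standardization, as referenced above, weights each age-specific rate by that age group's share of a standard population and sums the results. The sketch below illustrates the calculation in Python under stated assumptions: it uses three made-up age groups and made-up weights instead of the 19 groups of the 2000 US standard population, and the function name `age_adjusted_rate` and all input values are hypothetical.

```python
from typing import Sequence


def age_adjusted_rate(
    events: Sequence[float],
    populations: Sequence[float],
    standard_weights: Sequence[float],
    per: int = 100_000,
) -> float:
    """Directly standardized rate: weighted sum of age-specific rates.

    Each sequence holds one entry per age group, in the same order.
    Weights are normalized, so counts or shares may be passed.
    """
    if not (len(events) == len(populations) == len(standard_weights)):
        raise ValueError("all inputs need one entry per age group")
    if any(p <= 0 for p in populations):
        raise ValueError("populations must be positive")
    total_weight = sum(standard_weights)
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")
    adjusted = sum(
        (d / n) * (w / total_weight)
        for d, n, w in zip(events, populations, standard_weights)
    )
    return adjusted * per


# Made-up inputs: a young-skewed population compared with an older standard.
deaths = [2, 10, 60]
population = [40_000, 30_000, 10_000]
weights = [0.5, 0.3, 0.2]  # illustrative shares, not the 2000 US standard

crude = sum(deaths) * 100_000 / sum(population)
adjusted = age_adjusted_rate(deaths, population, weights)
```

In this made-up case the adjusted rate exceeds the crude rate because the standard gives more weight to the oldest group than the observed population does, which mirrors the direction described for younger populations in the text.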
Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.
Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME [25,26]. As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain whether COVID-19 caused or contributed to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made [31]. The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10) [25,26]. COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More infor
Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts. Obesity can increase an individual’s lifetime risk of breast cancer. Promoting healthy food retail and physical activity and improving access to preventive care services are important measures that cities and communities can take to prevent breast cancer. For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains Cancer Incidence data for Lung Cancer (All Stages^) including: Age-Adjusted Rate, Confidence Interval, Average Annual Count, and Trend field information for US States for the average 5 year span from 2016 to 2020. Data are segmented by sex (Both Sexes, Male, and Female) and age (All Ages, Ages Under 50, Ages 50 & Over, Ages Under 65, and Ages 65 & Over), with field names and aliases describing the sex and age group tabulated. For more information, visit statecancerprofiles.cancer.gov
Data Notations
State Cancer Registries may provide more current or more local data.
Trend
Rising when 95% confidence interval of average annual percent change is above 0.
Stable when 95% confidence interval of average annual percent change includes 0.
Falling when 95% confidence interval of average annual percent change is below 0.
† Incidence rates (cases per 100,000 population per year) are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Rates calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used for SEER and NPCR incidence rates.
‡ Incidence Trend data come from different sources. Due to different years of data availability, most of the trends are AAPCs based on APCs but some are APCs calculated in SEER*Stat. Please refer to the source for each area for additional information.
Rates and trends are computed using different standards for malignancy. For more information see malignant.
^ All Stages refers to any stage in the Surveillance, Epidemiology, and End Results (SEER) summary stage.
Data Source Field Key
(1) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
(5) Source: National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Based on the 2022 submission.
(6) Source: National Program of Cancer Registries SEER*Stat Database - United States Department of Health and Human Services, Centers for Disease Control and Prevention (based on the 2022 submission).
(7) Source: SEER November 2022 submission.
(8) Source: Incidence data provided by the SEER Program. AAPCs are calculated by the Joinpoint Regression Program and are based on APCs. Data are age-adjusted to the 2000 US standard population (19 age groups: <1, 1-4, 5-9, ... , 80-84, 85+). Rates are for invasive cancer only (except for bladder cancer which is invasive and in situ) or unless otherwise specified. Population counts for denominators are based on Census populations as modified by NCI. The US Population Data File is used with SEER November 2022 data.
Some data are not available, see Data Not Available for combinations of geography, cancer site, age, and race/ethnicity.
Data for the United States does not include data from Nevada.
Data for the United States does not include Puerto Rico.
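The Trend rule in the data notations (Rising, Stable, Falling) depends only on where the 95% confidence interval of the average annual percent change sits relative to 0. A minimal sketch of that rule follows; the function name `classify_trend` and the interval values are hypothetical and are not field names from the dataset.

```python
def classify_trend(ci_lower: float, ci_upper: float) -> str:
    """Label a trend from the 95% CI of the average annual percent change.

    Rising  : the whole interval is above 0
    Falling : the whole interval is below 0
    Stable  : the interval includes 0
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > 0:
        return "Rising"
    if ci_upper < 0:
        return "Falling"
    return "Stable"


# Hypothetical intervals, for illustration only.
for lower, upper in [(0.4, 1.9), (-2.1, -0.3), (-0.5, 0.8)]:
    print(lower, upper, classify_trend(lower, upper))
```

An interval with an endpoint exactly at 0 is treated here as including 0 and therefore as Stable; that boundary handling is an assumption of this sketch.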
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: A more accurate preoperative prediction of lymph node involvement (LNI) in prostate cancer (PCa) would improve clinical treatment and follow-up strategies of this disease. We developed a predictive model based on machine learning (ML) combined with big data to achieve this.
Methods: Clinicopathological characteristics of 2,884 PCa patients who underwent extended pelvic lymph node dissection (ePLND) were collected from the U.S. National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database from 2010 to 2015. Eight variables were included to establish an ML model. Model performance was evaluated by the receiver operating characteristic (ROC) curves and calibration plots for predictive accuracy. Decision curve analysis (DCA) and cutoff values were obtained to estimate its clinical utility.
Results: Three hundred and forty-four (11.9%) patients were identified with LNI. The five most important factors were the Gleason score, T stage of disease, percentage of positive cores, tumor size, and prostate-specific antigen levels with 158, 137, 128, 113, and 88 points, respectively. The XGBoost (XGB) model showed the best predictive performance and had the highest net benefit when compared with the other algorithms, achieving an area under the curve of 0.883. With a cutoff value of 5% to 20%, the XGB model performed best in reducing omissions and avoiding overtreatment of patients when dealing with LNI. This model also had a lower false-negative rate and a higher percentage of ePLND was avoided. In addition, DCA showed it has the highest net benefit across the whole range of threshold probabilities.
Conclusions: We established an ML model based on big data for predicting LNI in PCa, and it could lead to a reduction of approximately 50% of ePLND cases. In addition, only ≤3% of patients were misdiagnosed with a cutoff value ranging from 5% to 20%. This promising study warrants further validation by using a larger prospective dataset.
https://www.icpsr.umich.edu/web/ICPSR/studies/36144/terms
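To make the cutoff-based quantities in the abstract concrete, the sketch below computes, for one probability cutoff, the share of patients who would be spared ePLND, the share of all patients with LNI who fall below the cutoff, and the net benefit in the standard decision curve analysis form, TP/N - (FP/N) x pt/(1 - pt). This is not the authors' code: the function name `evaluate_cutoff`, the choice of all patients as the denominator for missed cases, and the eight predicted probabilities are assumptions made for illustration.

```python
from typing import Dict, Sequence


def evaluate_cutoff(probs: Sequence[float], has_lni: Sequence[int],
                    cutoff: float) -> Dict[str, float]:
    """Summarize one decision threshold for a predicted LNI probability.

    Patients with probability >= cutoff are assumed to undergo ePLND;
    everyone below the cutoff is assumed to be spared the procedure.
    """
    if not 0 < cutoff < 1:
        raise ValueError("cutoff must be strictly between 0 and 1")
    if len(probs) != len(has_lni):
        raise ValueError("probs and has_lni must have the same length")

    n = len(probs)
    tp = fp = fn = tn = 0
    for p, y in zip(probs, has_lni):
        if p >= cutoff:
            if y:
                tp += 1
            else:
                fp += 1
        else:
            if y:
                fn += 1
            else:
                tn += 1

    return {
        # Share of all patients below the cutoff (no ePLND performed).
        "spared_eplnd": (tn + fn) / n,
        # Share of all patients who have LNI but fall below the cutoff.
        "missed_lni": fn / n,
        # Decision-curve net benefit at threshold probability `cutoff`.
        "net_benefit": tp / n - (fp / n) * cutoff / (1 - cutoff),
    }


# Hypothetical predictions for eight patients (1 = LNI present).
probs = [0.02, 0.04, 0.10, 0.30, 0.60, 0.03, 0.08, 0.01]
has_lni = [0, 0, 0, 1, 1, 0, 1, 0]
result = evaluate_cutoff(probs, has_lni, cutoff=0.05)
print(result)
```

Repeating the call over a grid of cutoffs (for example 0.05 through 0.20) and plotting `net_benefit` against the cutoff gives a decision curve of the kind used to compare models.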
These data are being released in BETA version to facilitate early access to the study for research purposes. This collection has not been fully processed by NACDA or ICPSR at this time; the original materials provided by the principal investigator were minimally processed and converted to other file types for ease of use. As the study is further processed and given enhanced features by ICPSR, users will be able to access the updated versions of the study. Please report any data errors or problems to user support and we will work with you to resolve any data related issues. The National Health Interview Survey (NHIS) is conducted annually and sponsored by the National Center for Health Statistics (NCHS), which is part of the U.S. Public Health Service. The purpose of the NHIS is to obtain information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive across the United States population through the collection and analysis of data on a broad range of health topics. The redesigned NHIS questionnaire introduced in 1997 (see National Health Interview Survey, 1997 [ICPSR 2954]) consists of a core that remains largely unchanged from year to year, plus an assortment of supplements varying from year to year. The 2010 NHIS Core consists of three modules: Family, Sample Adult, and Sample Child. The datasets derived from these modules include Household Level, Family Level, Person Level, Injury/Poison Episode Level, Injury/Poison Verbatim Level, Sample Adult Level, and Sample Child Level. The 2010 NHIS supplements consist of stand-alone datasets for Cancer Level and Quality of Life data derived from the Sample Adult core and Disability Questions Tests 2010 Level derived from the Family core questionnaire. 
Additional supplementary questions can be found in the Sample Child dataset on the topics of cancer, immunization, mental health, and mental health services and in the Sample Adult dataset on the topics of epilepsy, immunization, and occupational health. Part 1, Household Level, contains data on type of living quarters, number of families in the household responding and not responding, and the month and year of the interview for each sampling unit. Parts 2-5 are based on the Family Core questionnaire. Part 2, Family Level, provides information on all family members with respect to family size, family structure, health status, limitation of daily activities, cognitive impairment, health conditions, doctor visits, hospital stays, health care access and utilization, employment, income, participation in government assistance programs, and basic demographic information. Part 3, Person Level, includes information on sex, age, race, marital status, education, family income, major activities, health status, health care costs, activity limits, and employment status. Parts 4 and 5, Injury/Poisoning Episode Level and Injury/Poisoning Verbatim Level, consist of questions about injuries and poisonings that resulted in medical consultations for any family members and contain information about the external cause and nature of the injury or poisoning episode and what the person was doing at the time of the injury or poisoning episode, in addition to the date and place of occurrence. A randomly selected adult in each family was interviewed for Part 6, Sample Adult Level, regarding specific health issues, the relation between employment and health, health status, health care and doctor visits, limitation of daily activities, immunizations, and behaviors such as smoking, alcohol consumption, and physical activity. Demographic information, including occupation and industry, also was collected. 
The respondents to Part 6 also completed Part 7, Cancer Level, which consists of a set of supplemental questions about diet and nutrition, physical activity, tobacco, cancer screening, genetic testing, family history, and survivorship. Part 8, Sample Child Level, provides information from an adult in the household on medical conditions of one child in the household, such as developmental or intellectual disabilities, respiratory problems, seizures, allergies, and use of special equipment like hearing aids, braces, or wheelchairs. Parts 9 through 13 comprise the additional Supplements and Paradata for the 2010 NHIS. Part 9, Disability Questions Tests 2010 Level