Population-based cancer incidence rates were abstracted from the National Cancer Institute's State Cancer Profiles for all U.S. counties for which data were available. State Cancer Profiles is a national county-level database of cancer data collected by state public health surveillance systems. All-site cancer is defined as any type of cancer captured in the state registry data, excluding non-melanoma skin cancer. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for 2006–2010 were available for 2687 of 3142 (85.5%) U.S. counties. Rates are suppressed for counties with fewer than 16 reported cases in a specific area-sex-race category to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because state legislation and regulations prohibit the release of county-level data to outside entities. Data from Michigan do not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data are not available for three states: Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 because it follows the period represented by the EQI exposure data, 2000–2005. We also gathered data for the three leading cancer types for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as an exposure metric indicating cumulative environmental exposures at the county level for the period 2000 to 2005. A complete description of the datasets used in the EQI is provided in Lobdell et al., and the methods used for index construction are described by Messer et al.
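The suppression rule and the county coverage figure quoted above can be illustrated with a minimal sketch. The county names and counts are hypothetical, and the rate shown is a crude rate per 100,000; the published rates are age-adjusted, which this sketch does not attempt to reproduce.

```python
from typing import Optional

# From the text: categories with fewer than 16 reported cases are suppressed.
SUPPRESSION_THRESHOLD = 16


def rate_per_100k(cases: int, population: int) -> Optional[float]:
    """Return a crude rate per 100,000, or None when the count is suppressed."""
    if cases < SUPPRESSION_THRESHOLD:
        return None
    return cases / population * 100_000


# Hypothetical (cases, population) pairs, for illustration only.
counties = {"A": (12, 9_000), "B": (450, 100_000), "C": (16, 20_000)}
rates = {name: rate_per_100k(c, p) for name, (c, p) in counties.items()}

# Share of U.S. counties with available rates, using the counts in the text.
coverage_pct = round(2687 / 3142 * 100, 1)
```

County A falls below the threshold and is returned as `None` rather than as a rate, which mirrors how suppressed counties drop out of an analysis instead of contributing an unstable estimate.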
The EQI was developed for the period 2000–2005 because it was the time period for which the most recent data were available when index construction was initiated. The EQI includes variables representing each of the environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic domain includes 12 variables representing socioeconomics and crime. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of cancer (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to cancer (in persons of all ages).

This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

The percentage of each MSOA’s population (all ages) with cancer was estimated. This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice’s catchment area
- Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

The estimated percentage of each MSOA’s population with cancer was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with cancer, within the relevant age range.

Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have cancer
B) the NUMBER of people within that MSOA who are estimated to have cancer

An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have cancer, compared to other MSOAs.
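The weighted-average and relative-scoring steps described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the relative score was derived, so min-max rescaling is assumed here, and the function names and input figures are hypothetical.

```python
def msoa_prevalence(overlaps):
    """Area-weighted average prevalence (%) for one MSOA.

    overlaps: list of (share_of_msoa_area_covered, practice_prevalence_pct),
    one pair per GP practice whose catchment overlaps the MSOA.
    """
    total_weight = sum(weight for weight, _ in overlaps)
    if total_weight == 0:
        return None  # no GP catchment covers this MSOA
    return sum(weight * pct for weight, pct in overlaps) / total_weight


def min_max(values):
    """Rescale values to the range 0 to 1 (assumed scoring method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical MSOAs: overlapping practices and mid-year population.
msoas = [
    ([(0.6, 3.0), (0.4, 4.0)], 8_000),
    ([(1.0, 2.0)], 10_000),
    ([(0.5, 5.0), (0.5, 3.0)], 6_000),
]

percentages = [msoa_prevalence(overlaps) for overlaps, _ in msoas]
numbers = [pct / 100 * pop for pct, (_, pop) in zip(percentages, msoas)]

score_a = min_max(percentages)  # A) percentage of population with cancer
score_b = min_max(numbers)      # B) number of people with cancer
averaged = [(a + b) / 2 for a, b in zip(score_a, score_b)]
final_score = min_max(averaged)  # 1 = worst, 0 = best
```

Averaging the two scores before rescaling means an MSOA only approaches 1 when it ranks high on both the percentage and the absolute number, which matches the interpretation given in the text.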
In other words, those are areas where it’s estimated a large number of people suffer from cancer, and where those people make up a large percentage of the population, indicating there is a real issue with cancer within the population and the investment of resources to address that issue could have the greatest benefits.

LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness.
By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of cancer, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of cancer.

TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
Levels of obesity, inactivity and associated illnesses (England): Missing data

DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

DATA SOURCES
This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
Data was cleaned by Ribble Rivers Trust before use.
MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.
Population data: Mid-2019 (June 30) Population Estimates for Middle Layer Super Output Areas in England and Wales. © Office for National Statistics licensed under the Open Government Licence v3.0. © Crown Copyright 2020.

COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021. © Crown Copyright 2020.

CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
The PRAIM study (PRospective multicenter observational study of an integrated AI system with live Monitoring) assessed the impact of an AI-based decision support software on breast cancer screening outcomes. This Dryad data package contains the anonymized data from 461 818 screening cases across 12 screening sites in Germany. Variables include screening outcomes like cancer detection, use of AI software, radiologist assessments, cancer characteristics, and further metadata. The data can be used to reproduce the analyses on performance of AI-supported breast cancer screening versus standard of care published in Nature Medicine: Nationwide real-world implementation of AI for cancer detection in population-based mammography screening.

# Nationwide real-world implementation of AI for cancer detection in population-based mammography screening (PRAIM) – Dataset
The PRAIM study (PRospective multicenter observational study of an integrated Artificial Intelligence system with live Monitoring) was a study conducted within the German breast cancer screening program from July 2021 to February 2023 to assess the impact of an AI-based decision support software. This dataset contains the data from PRAIM.
The PRAIM study has been published in Nature Medicine. Please refer to the article Nationwide real-world implementation of AI for cancer detection in population-based mammography screening for further information on study design, results, and discussion of impact. The study has been previously registered in the German Clinical Trials Register and the study protocol can be found on the [website of the Univ...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Synthetic Colorectal Cancer Global Dataset is a fully anonymised, high-dimensional synthetic dataset designed for global cancer research, predictive modelling, and educational use. It encompasses demographic, clinical, lifestyle, genetic, and healthcare access factors relevant to colorectal cancer incidence, outcomes, and survivability.
[Figure: Synthetic Colorectal Cancer Global Data Distribution]
This dataset can be used for:
The dataset includes 100% synthetic yet clinically plausible records from diverse countries and demographic groups. It is anonymized and modeled to reflect real-world variability in risk factors, diagnosis stages, treatment, and survival without compromising patient privacy.
CC0 (Public Domain)
https://digital.nhs.uk/services/data-access-request-service-dars
The National Cancer Registration and Analysis Service (NCRAS) at Public Health England supplies cancer registration data to NHS Digital. This data is available to be linked to other data held by NHS Digital in order to provide notifications on an individual's cancer status, be available to support research studies and to identify potential research participants for clinical trials.
NCRAS is the population-based cancer registry for England. It collects, quality assures and analyses data on all people living in England who are diagnosed with malignant and pre-malignant neoplasms, with national coverage since 1971.
The Cancer Registration dataset comprises England data to the present day, and Welsh data up to April 2017.
Timescales for dissemination of agreed data can be found under 'Our Service Levels' at the following link: https://digital.nhs.uk/services/data-access-request-service-dars/data-access-request-service-dars-process Standard response
These data contain the results of GC-MS, LC-MS and immunochemistry analyses of mask sample extracts. The data include tentatively identified compounds through library searches and compound abundance. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The data can not be accessed. Format: The dataset contains the identification of compounds found in the mask samples as well as the abundance of those compounds for individuals who participated in the trial.
This dataset is associated with the following publication: Pleil, J., M. Wallace, J. McCord, M. Madden, J. Sobus, and G. Ferguson. How do cancer-sniffing dogs sort biological samples? Exploring case-control samples with non-targeted LC-Orbitrap, GC-MS, and immunochemistry methods. Journal of Breath Research. Institute of Physics Publishing, Bristol, UK, 14(1): 016006, (2019).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Despite a wealth of real-world data on metastatic breast cancer (mBC), insights into the lived experience are lacking. This study aimed to explore how the lived experience of mBC is described on social media.
Methods: A predefined search string identified posts relevant to the lived experience of mBC from Twitter, patient forums, and blogs across 14 European countries. The final data set was analyzed using content analysis.
Results: A total of 76,456 conversations were identified between November 1, 2018, and November 30, 2020. Twitter was the most commonly used social media platform across all 76,456 conversations from the raw data set (n = 61,165; 80%). Automated and manual relevancy checks followed by a final random sampling filter identified 820 conversations for content analysis. The majority of data from the raw data set was generated from the United Kingdom (n = 31,346; 41%). From this final data set, 61% of posts were authored by patients, 15% by friends and/or family members of patients, and 14% by caregivers. A total of 686 conversations described the patient journey (n = 686/820; 84%); 64% of these (n = 439) concerned breast cancer treatment, with approximately 40% of discussions regarding diagnosis and tests (n = 274/686) and less than 20% of discussions surrounding disease management (n = 123/686; 18%). Key themes relating to a lack of effective treatment, prolonged survival and associated quality of life, debilitating consequences of side effects, and the social impacts of living with mBC were identified.
Conclusions: The findings from this study provided an insight into the lived experience of mBC. While retrospective data collection inherently limits the amount of demographic or clinical information that can be obtained from the population sample, social media listening studies offer training to healthcare professionals in communication, the importance of quality of life, organization of healthcare, and even the design of clinical trials.
As new targeted therapies are gradually incorporated into clinical practice, innovative technologies, such as social media listening, have the potential to support regulatory procedures and drug toxicity monitoring, as well as provide the patient voice in the regulation of new and existing medicines.
The MDC was started in the early 1990s as a screening survey in the middle-aged population of Malmö, the third largest city of Sweden. During 1991-1996, 28000 subjects living in Malmö were invited by letter and through public advertisement to a clinical examination, which included blood sampling and a questionnaire about nutrition. 62% of the participants were women. Cardiovascular risk factors were measured in a random subsample (n = 6000). After 16 years, during 2007-2012, a new clinical examination and blood sampling were performed (n = 3700). Morbidity and mortality have been followed up through national registers.
Purpose:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Chronic infection with hepatitis C virus (HCV) is an established risk factor for liver cancer. Although several epidemiologic studies have evaluated the risk of extrahepatic malignancies among people living with HCV, due to various study limitations, results have been heterogeneous.
Methods: We used data from the British Columbia Hepatitis Testers Cohort (BC-HTC), which includes all individuals tested for HCV in the Province since 1990. We assessed hepatic and extrahepatic cancer incidence using data from BC Cancer Registry. Standardized incidence ratios (SIR) comparing to the general population of BC were calculated for each cancer site from 1990 to 2016.
Results: In total, 56,823 and 1,207,357 individuals tested positive and negative for HCV, respectively. Median age at cancer diagnosis among people with and without HCV infection was 59 (interquartile range (IQR): 53-65) and 63 years (IQR: 54-74), respectively. As compared to people living without HCV, a greater proportion of people living with HCV-infection were men (66.7% vs. 44.7%, P-value
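As background for the standardized incidence ratios mentioned in the methods, the general definition (observed cases divided by the cases expected from reference-population rates) can be sketched as below. The abstract does not state how the study computed its SIRs or confidence intervals, so the stratification, the log-scale interval approximation, and all numbers here are assumptions for illustration.

```python
import math


def standardized_incidence_ratio(observed, strata):
    """Compute an SIR with an approximate 95% confidence interval.

    observed: number of cancer cases seen in the cohort (must be > 0).
    strata: iterable of (person_years, reference_rate_per_100k), one pair
            per age/sex/period stratum of the cohort.
    """
    # Expected cases if the cohort experienced the reference rates.
    expected = sum(py * rate / 100_000 for py, rate in strata)
    ratio = observed / expected
    # Assumed approximation: standard error of ln(SIR) is 1/sqrt(observed).
    se_log = 1 / math.sqrt(observed)
    lower = ratio * math.exp(-1.96 * se_log)
    upper = ratio * math.exp(1.96 * se_log)
    return ratio, lower, upper, expected


# Hypothetical cohort with two strata and 50 observed cases.
sir, ci_low, ci_high, expected_cases = standardized_incidence_ratio(
    50, [(10_000, 50.0), (20_000, 100.0)]
)
```

An SIR above 1 indicates more cases in the cohort than the reference rates would predict; exact Poisson intervals are a common alternative to the approximation used here when counts are small.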
Sister Study is a prospective cohort of 50,884 U.S. women aged 35 to 74 years old conducted by the NIEHS. Eligible participants are women without a history of breast cancer but with at least one sister diagnosed with breast cancer at enrollment during 2003 - 2009. Datasets used in this research effort include health outcomes, lifestyle factors, socioeconomic factors, medication history, and built and natural environment factors. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Contact NIEHS Sister Study (https://sisterstudy.niehs.nih.gov/English/index1.htm) for data access. Format: Datasets are provided in SAS and/or CSV format.
computer-assisted telephone interview (CATI); mail questionnaire

The data available for download are not weighted and users will need to weight the data prior to analysis. Users who plan to do inferential statistical testing using the data should utilize a statistical program that can incorporate the replicate weights included in the dataset. Additional information about sampling, interviewing, sampling error, weighting, and the universe of each question may be found in the codebook.

This data collection utilized a split frame where approximately half of the sample completed the survey by telephone through random digit dial (RDD) and half completed it through the mail as a paper and pencil questionnaire. Users can analyse the data with only the RDD respondents, only the mail respondents, or both, as indicated by the variable SAMPFLAG. For each type of analysis, users will need to supply the proper final weight to get population estimates and replicate weights to calculate the correct variance.

Variable names containing more than 16 characters were truncated in order to be compatible with current statistical programs. Therefore, variable names may differ slightly from those in the original documentation.

The formats of the weight and replicate weight variables were adjusted to fit the width of the values present in these variables, and the variables REGION and DIVISION were converted from character to numeric.

To protect respondent confidentiality, open-ended responses containing information on respondent's occupation in variables HC03WHERESEE2_OS and HD05OCCUPATIO_OS were blanked.

ICPSR created a unique sequential record identifier variable named CASEID.

The Health Information National Trends Survey (HINTS) collects nationally representative data about the American public's access to and use of cancer-related information.
The 2007 HINTS survey is the third in an ongoing biannual series and provides information on the changing patterns, needs, and behavior in seeking and supplying cancer information and explores how cancer risks are perceived. Respondents were asked about the ways in which they obtained health information, their use of health care services, their views about medical information and research, and their beliefs about cancer. A series of questions specifically addressed cervical cancer, colon cancer, and the Human Papillomavirus (HPV). Information was also collected on physical and mental health status, diet, physical activity, sun exposure, history of cancer, tobacco use, and whether respondents had health insurance. Demographic variables include sex, age, race, education level, employment status, marital status, household income, number of people living in the household, ownership of residence, and whether respondents were born in the United States. For the CATI data collection, the sample design was a list-assisted RDD sample and one adult in the household was sampled for the extended interview using an algorithm designed to minimize intrusiveness. The mail survey included a stratified sample selected from a list of addresses that oversampled for minorities. Sampled addresses were matched to a database of listed telephone numbers, with 50 percent of the cases successfully matched to a telephone number. Matches in which a telephone number was both appended to an address-sample address and included in the RDD sample were deleted from the address sample. Please refer to the codebook documentation for more information on sample design. Every sampled adult who completed a questionnaire in HINTS 2007 received three full-sample weights and three sets of replicate-sample weights. Two of the three types of weights correspond to the type of samples - the address-sample weight (MWGT0) and the RDD sample weight (RWGT0). 
The address-sample weight is missing for a case in the RDD sample and vice versa. The sample-specific weights are used to calculate estimates based on data from one of the two samples. The third type of weight is a composite weight (CWGT0) which is used to calculate estimates based on the data from both samples. Please refer to the codebook documentation for more information on weighting. Response Rates: The overall response rate for the RDD sample was 24.23 percent, while the overall response rate for the address-sample was 30.99 percent. Please refer to the codebook documentation for more information on response rates. The civilian, noninstitutionalized population of the United States aged 18 years and older. Datasets: DS1: Health Information National Trends Survey (HINTS), 2007
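The weighting guidance above (a full-sample weight for the point estimate, replicate weights for the variance) can be sketched as follows. The full-sample weight names MWGT0, RWGT0, and CWGT0 come from the description; everything else is an assumption for illustration, including the jackknife-style variance formula, the `multiplier` value, and the toy data. The codebook should be consulted for the actual replicate-weight variable names and variance formula.

```python
def weighted_mean(values, weights):
    """Weighted mean, skipping cases whose weight is missing (None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w is not None]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def replicate_variance(values, full_weights, replicate_weights, multiplier):
    """Variance of a weighted mean from replicate weights.

    replicate_weights: one list of weights per replicate.
    multiplier: scaling factor defined by the survey design (assumed here;
                take the real value from the codebook).
    """
    theta = weighted_mean(values, full_weights)
    replicates = [weighted_mean(values, rw) for rw in replicate_weights]
    return multiplier * sum((r - theta) ** 2 for r in replicates)


# Toy data: a yes/no item (1 = yes) with a hypothetical composite weight.
answers = [1, 0, 1, 1]
cwgt0 = [10, 20, 30, 40]              # stands in for CWGT0
reps = [[0, 25, 30, 45], [12, 0, 38, 50]]  # hypothetical replicate weights

estimate = weighted_mean(answers, cwgt0)
variance = replicate_variance(answers, cwgt0, reps, multiplier=0.5)
```

Skipping missing weights reflects the note that the address-sample weight is missing for RDD cases and vice versa, so an analysis restricted to one sample simply ignores cases from the other.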
https://www.icpsr.umich.edu/web/ICPSR/studies/36144/terms
These data are being released in BETA version to facilitate early access to the study for research purposes. This collection has not been fully processed by NACDA or ICPSR at this time; the original materials provided by the principal investigator were minimally processed and converted to other file types for ease of use. As the study is further processed and given enhanced features by ICPSR, users will be able to access the updated versions of the study. Please report any data errors or problems to user support and we will work with you to resolve any data related issues. The National Health Interview Survey (NHIS) is conducted annually and sponsored by the National Center for Health Statistics (NCHS), which is part of the U.S. Public Health Service. The purpose of the NHIS is to obtain information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive across the United States population through the collection and analysis of data on a broad range of health topics. The redesigned NHIS questionnaire introduced in 1997 (see National Health Interview Survey, 1997 [ICPSR 2954]) consists of a core that remains largely unchanged from year to year, plus an assortment of supplements varying from year to year. The 2010 NHIS Core consists of three modules: Family, Sample Adult, and Sample Child. The datasets derived from these modules include Household Level, Family Level, Person Level, Injury/Poison Episode Level, Injury/Poison Verbatim Level, Sample Adult Level, and Sample Child level. The 2010 NHIS supplements consist of stand alone datasets for Cancer Level and Quality of Life data derived from the Sample Adult core and Disability Questions Tests 2010 Level derived from the Family core questionnaire. 
Additional supplementary questions can be found in the Sample Child dataset on the topics of cancer, immunization, mental health, and mental health services and in the Sample Adult dataset on the topics of epilepsy, immunization, and occupational health. Part 1, Household Level, contains data on type of living quarters, number of families in the household responding and not responding, and the month and year of the interview for each sampling unit. Parts 2-5 are based on the Family Core questionnaire. Part 2, Family Level, provides information on all family members with respect to family size, family structure, health status, limitation of daily activities, cognitive impairment, health conditions, doctor visits, hospital stays, health care access and utilization, employment, income, participation in government assistance programs, and basic demographic information. Part 3, Person Level, includes information on sex, age, race, marital status, education, family income, major activities, health status, health care costs, activity limits, and employment status. Parts 4 and 5, Injury/Poisoning Episode Level and Injury/Poisoning Verbatim Level, consist of questions about injuries and poisonings that resulted in medical consultations for any family members and contains information about the external cause and nature of the injury or poisoning episode and what the person was doing at the time of the injury or poisoning episode, in addition to the date and place of occurrence. A randomly-selected adult in each family was interviewed for Part 6, Sample Adult Level, regarding specific health issues, the relation between employment and health, health status, health care and doctor visits, limitation of daily activities, immunizations, and behaviors such as smoking, alcohol consumption, and physical activity. Demographic information, including occupation and industry, also was collected. 
The respondents to Part 6 also completed Part 7, Cancer Level, which consists of a set of supplemental questions about diet and nutrition, physical activity, tobacco, cancer screening, genetic testing, family history, and survivorship. Part 8, Sample Child Level, provides information from an adult in the household on medical conditions of one child in the household, such as developmental or intellectual disabilities, respiratory problems, seizures, allergies, and use of special equipment like hearing aids, braces, or wheelchairs. Parts 9 through 13 comprise the additional Supplements and Paradata for the 2010 NHIS. Part 9, Disability Questions Tests 2010 Level
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The era of personalized medicine for cancer therapeutics has taken an important step forward in making accurate prognoses for individual patients with the adoption of high-throughput microarray technology. However, microarray technology in cancer diagnosis or prognosis has been primarily used for the statistical evaluation of patient populations, and thus excludes inter-individual variability and patient-specific predictions. Here we propose a metric called clinical confidence that serves as a measure of prognostic reliability to facilitate the shift from population-wide to personalized cancer prognosis using microarray-based predictive models. The performance of sample-based models predicted with different clinical confidences was evaluated and compared systematically using three large clinical datasets studying the following cancers: breast cancer, multiple myeloma, and neuroblastoma. Survival curves for patients, with different confidences, were also delineated. The results show that the clinical confidence metric separates patients with different prediction accuracies and survival times. Samples with high clinical confidence were likely to have accurate prognoses from predictive models. Moreover, patients with high clinical confidence would be expected to live for a notably longer or shorter time if their prognosis was good or grim based on the models, respectively. We conclude that clinical confidence could serve as a beneficial metric for personalized cancer prognosis prediction utilizing microarrays. Ascribing a confidence level to prognosis with the clinical confidence metric provides the clinician an objective, personalized basis for decisions, such as choosing the severity of the treatment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The results are contained in 5 datasets, each of which is described in further detail below. Each dataset is stored as a tar file, which when extracted will be in Hive/Feather format as:
{dataset}/age={age}/genotype={genotype}/data.feather
where {age} is the starting age of the simulated individuals (30, 35, 40 or 45) and {genotype} is the genotype of the simulated individuals (path_MLH1, path_MSH2, path_MSH6, path_PMS2). Each data.feather file covers 500 different parameter sets, 4 competing options for each parameter set, and 1000 simulated individuals for each option, i.e., 2 million simulated individuals per data.feather and 32 million simulated individuals overall.

patient-level-outcomes
This gives patient-level costs, life years and QALYs for conducting the economic evaluation. Each row/observation corresponds to a single simulated patient. Fields:
params_uuid - uniquely identifies the parameter set (can be used to join with other datasets)
individual_uuid - uniquely identifies the individual (can be used to join with other datasets)
competing_option - text description for which of the four competing options applied to the individual
total_costs - total costs over the lifetime for the individual, discounted at the rate params['analysis.discount_rate.cost']
total_life_years - total life years for the individual, discounted at the rate params['analysis.discount_rate.ly']
total_qalys - total quality-adjusted life years for the individual, discounted at the rate params['analysis.discount_rate.qaly']
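The partition layout described above can be enumerated programmatically. The following is a minimal sketch, assuming the extracted top-level directory is named after the dataset (for example `patient-level-outcomes`); the function name `partition_paths` is hypothetical and not part of the published materials.

```python
from itertools import product
from pathlib import PurePosixPath

# Partition values listed in the dataset description.
AGES = (30, 35, 40, 45)
GENOTYPES = ("path_MLH1", "path_MSH2", "path_MSH6", "path_PMS2")


def partition_paths(dataset_dir):
    """Return the relative path of every data.feather partition in one dataset.

    Assumption: dataset_dir is the directory produced by extracting the tar file.
    """
    return [
        PurePosixPath(dataset_dir) / f"age={age}" / f"genotype={genotype}" / "data.feather"
        for age, genotype in product(AGES, GENOTYPES)
    ]


paths = partition_paths("patient-level-outcomes")

# 500 parameter sets x 4 competing options x 1000 individuals per file.
individuals_per_file = 500 * 4 * 1000
individuals_overall = individuals_per_file * len(paths)

print(len(paths), individuals_per_file, individuals_overall)
```

Each path could then be passed to a Feather reader such as `pandas.read_feather`; loading one partition at a time keeps memory use bounded.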
Tip: As each observation corresponds to a single individual, it is possible to calculate undiscounted life years lived from total_life_years:
drc = log(1 + params['analysis.discount_rate.ly'])
total_life_years_undiscounted = - log(1 - drc * total_life_years) / drc

params
The parameters for the simulations. Each row/observation corresponds to a single parameter set.
params_uuid - uniquely identifies the parameter set (can be used to join with other datasets)
... - parameter values, which are mostly scalars but some are vectors
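The undiscounting tip given for patient-level-outcomes can be written as a small function. This is a sketch only: the function names are hypothetical, the 3.5% rate in the example is an illustrative value rather than one taken from the params dataset, and the forward (discounting) function assumes continuous discounting from model entry, which is the form the tip's inverse implies.

```python
import math


def undiscount_life_years(total_life_years, annual_rate):
    """Recover undiscounted life years from a discounted total (formula from the tip)."""
    if annual_rate == 0:
        return total_life_years  # nothing to undo when there is no discounting
    drc = math.log(1 + annual_rate)  # continuous-time equivalent of the annual rate
    return -math.log(1 - drc * total_life_years) / drc


def discount_life_years(years_lived, annual_rate):
    """Assumed forward relationship: integral of exp(-drc * t) from 0 to years_lived."""
    if annual_rate == 0:
        return years_lived
    drc = math.log(1 + annual_rate)
    return (1 - math.exp(-drc * years_lived)) / drc


# Round trip with a hypothetical 3.5% annual discount rate and 40 years lived.
discounted = discount_life_years(40.0, 0.035)
recovered = undiscount_life_years(discounted, 0.035)
print(discounted, recovered)
```

The inversion works only per individual, as the tip notes: the formula is non-linear, so applying it to a mean of discounted life years does not give the mean of undiscounted life years.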

cancer-outcomes
Counts of various cancer outcomes. Each row gives the count of a particular cancer outcome for a particular combination of parameter set and competing option. CAUTION: Any rows which would have n=0 have been omitted from this dataset.
params_uuid - as above
competing_option - as above
site - colorectal, ovarian or endometrial
outcome - Incidence, Recurrence or Mortality
stage - Stage of cancer at time of diagnosis: I, II, III or IV (missing for mortality)
route - Route to cancer diagnosis (only available if outcome is 'Incidence'): RouteToDiagnosis.SYMPTOMATIC_PRESENTATION, RouteToDiagnosis.SURVEILLANCE, or RouteToDiagnosis.RISK_REDUCING_SURGERY
n - Number of times the corresponding cancer outcome occurred (for a simulated population of 1000 individuals)
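Because rows with n=0 are omitted from cancer-outcomes, averages taken only over the rows that are present will be biased upwards. The sketch below shows one way to treat missing combinations as zero, using plain dictionaries in place of the Feather data; the parameter-set identifiers, the option label and the counts are hypothetical.

```python
def outcome_count(rows, **criteria):
    """Sum n over rows matching every criterion; returns 0 when no row matches."""
    return sum(
        row["n"]
        for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    )


def mean_count(rows, params_uuids, **criteria):
    """Mean count across ALL parameter sets, counting omitted rows as n=0."""
    total = sum(outcome_count(rows, params_uuid=p, **criteria) for p in params_uuids)
    return total / len(params_uuids)


# Hypothetical rows: parameter set "p2" has no colorectal incidence rows at all.
rows = [
    {"params_uuid": "p1", "competing_option": "option_a", "site": "colorectal",
     "outcome": "Incidence", "stage": "I", "n": 3},
    {"params_uuid": "p1", "competing_option": "option_a", "site": "colorectal",
     "outcome": "Incidence", "stage": "II", "n": 2},
    {"params_uuid": "p2", "competing_option": "option_a", "site": "ovarian",
     "outcome": "Incidence", "stage": "I", "n": 1},
]

mean_with_zeros = mean_count(
    rows, ["p1", "p2"],
    competing_option="option_a", site="colorectal", outcome="Incidence",
)
print(mean_with_zeros)
```

The full list of params_uuid values should come from the params dataset rather than from cancer-outcomes itself, since a parameter set with no events of a given kind will not appear in the latter.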

cancer-free-survival
For each individual, how long they survived without a cancer diagnosis or becoming censored (the principal reason for censoring is death from a non-cancer cause).
individual_uuid - see above
params_uuid - see above
competing_option - see above
age_event - age of the individual when the diagnosis event or censoring happened
event - 1 if a cancer diagnosis happened or 0 if censoring happened
cancer - CancerSite.ENDOMETRIUM, CancerSite.OVARIES or CancerSite.COLORECTUM
stage - see cancer-outcomes
route - see cancer-outcomes
age_enter - age of the individual when entering the model (=age)
sex - will be Sex.FEMALE for all simulated individuals

cancer-survival
Each row/observation in this dataset corresponds to survival from a diagnosed cancer in a simulated individual. CAUTION: For each cancer there are two rows, because users may wish to calculate cause-specific survival or crude survival. Ensure that you filter out the calculation type you do not wish to include.
individual_uuid - see above
params_uuid - see above
competing_option - see above
survival_type - 'cause-specific' or 'all-cause' (crude)
age_event - age at which the individual died or was censored
age_diagnosis - age at which the individual was diagnosed with this cancer
event - 1 if the individual had an eligible death at age_event (determined by survival_type), 0 otherwise (e.g., died from another cause if survival_type is 'cause_specific', censored)
site - see above
stage - see above
route - see above
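The caution about two rows per cancer in cancer-survival amounts to filtering on survival_type before any analysis. A minimal sketch follows, with hypothetical example rows; the description spells the value both as 'cause-specific' and 'cause_specific', so the exact string should be checked against the data before relying on it.

```python
def select_survival_type(rows, survival_type):
    """Keep only rows for one calculation type so each cancer is counted once."""
    kept = [row for row in rows if row["survival_type"] == survival_type]
    if not kept:
        # An empty result usually means the label is spelled differently in the data.
        found = sorted({row["survival_type"] for row in rows})
        raise ValueError(f"no rows with survival_type={survival_type!r}; found {found}")
    return kept


def survival_time(row):
    """Years from diagnosis to death or censoring for one row."""
    return row["age_event"] - row["age_diagnosis"]


# Hypothetical pair of rows describing the same diagnosed cancer.
rows = [
    {"individual_uuid": "i1", "survival_type": "cause-specific",
     "age_diagnosis": 52.0, "age_event": 60.0, "event": 0},
    {"individual_uuid": "i1", "survival_type": "all-cause",
     "age_diagnosis": 52.0, "age_event": 60.0, "event": 1},
]

all_cause = select_survival_type(rows, "all-cause")
print(len(all_cause), survival_time(all_cause[0]), all_cause[0]["event"])
```

In this made-up pair the individual died from another cause at age 60, so the death counts as an event for all-cause survival but as censoring for cause-specific survival.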
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The proportion of women eligible for screening who have had a test with a recorded result at least once in the previous 36 months.
Rationale
Breast screening supports early detection of cancer and is estimated to save 1,400 lives in England each year. This indicator provides an opportunity to incentivise screening promotion and other local initiatives to increase coverage of breast screening. Improvements in coverage would mean more breast cancers are detected at earlier, more treatable stages.
Definition of numerator
Tested women (numerator) is the number of eligible women aged 53 to 70 registered with a GP with a screening test result recorded in the past 36 months.
Definition of denominator
Eligible women (denominator) is the number of women aged 53 to 70 years resident in the area (determined by postcode of residence) who are eligible for breast screening at a given point in time, excluding those whose recall has been ceased for clinical reasons (for example, due to previous bilateral mastectomy).
Caveats
Data for ICBs are estimated from local authority data. In most cases ICBs are coterminous with local authorities, so the ICB figures are precise.
In cases where local authorities cross ICB boundaries, the local authority data are proportionally split between ICBs, based on population located in each ICB.
The affected ICBs are: Bath and North East Somerset, Swindon and Wiltshire; Bedfordshire, Luton and Milton Keynes; Buckinghamshire, Oxfordshire and Berkshire West; Cambridgeshire and Peterborough; Frimley; Hampshire and Isle of Wight; Hertfordshire and West Essex; Humber and North Yorkshire; Lancashire and South Cumbria; Norfolk and Waveney; North East and North Cumbria; Suffolk and North East Essex; Surrey Heartlands; Sussex; West Yorkshire.
Please be aware that the April 2019 to March 2020, April 2020 to March 2021 and April 2021 to March 2022 data cover the time period affected by the COVID-19 pandemic and therefore data for this period should be interpreted with caution.
This indicator gives screening coverage by local authority. This is not the same as the indicator based on population registered with primary care organisations, which includes patients wherever they live. This is likely to result in different England totals depending on the selected (registered or resident) population footprint.
The indicator excludes women outside the target age range for the screening programme who may self-refer for screening.
Standards say "Women who are ineligible for screening due to having had a bilateral mastectomy, women who are ceased from the programme based on a ‘best interests’ decision under the Mental Capacity Act 2005 or women who make an informed choice to remove themselves from the screening programme will be removed from the numerator and denominator. There are a number of categories of women in the eligible age range who are not registered with a GP and subsequently not called for screening as they are not on the Breast Screening Select (BS Select) database.
Screening units have a responsibility to maximise coverage of eligible women in their target population and should therefore be accessible to women in this category through self referral and GP referral."
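The coverage calculation and the proportional split of local authority data between ICBs can be illustrated as follows. This is a sketch under stated assumptions, not the published method in detail: the function names and all figures are hypothetical, and it assumes that numerator and denominator are both apportioned by the same population share.

```python
def coverage_percent(tested, eligible):
    """Coverage = tested women / eligible women, expressed as a percentage."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return 100.0 * tested / eligible


def split_between_icbs(tested, eligible, population_by_icb):
    """Split one local authority's counts between ICBs in proportion to population."""
    total_population = sum(population_by_icb.values())
    return {
        icb: (tested * pop / total_population, eligible * pop / total_population)
        for icb, pop in population_by_icb.items()
    }


def icb_coverage(local_authorities):
    """Sum the apportioned counts per ICB, then compute coverage for each ICB."""
    totals = {}
    for tested, eligible, population_by_icb in local_authorities:
        shares = split_between_icbs(tested, eligible, population_by_icb)
        for icb, (t, e) in shares.items():
            prev_t, prev_e = totals.get(icb, (0.0, 0.0))
            totals[icb] = (prev_t + t, prev_e + e)
    return {icb: coverage_percent(t, e) for icb, (t, e) in totals.items()}


# Hypothetical figures: the first authority lies wholly in ICB A,
# the second straddles ICB A and ICB B.
example = [
    (8000, 10000, {"ICB A": 50000}),
    (6000, 10000, {"ICB A": 20000, "ICB B": 30000}),
]
result = icb_coverage(example)
print(result)
```

Under this assumption each share of a split authority inherits that authority's coverage, so an ICB estimate differs from its constituent authorities only through the summation across them.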
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of obesity, inactivity and inactivity/obesity-related illnesses. Please read the information below to gain a full understanding of what the data shows and how it should be interpreted.
The analysis incorporates data relating to the following:
- Obesity/inactivity-related illnesses (asthma, cancer, chronic kidney disease, coronary heart disease, depression, diabetes mellitus, hypertension, stroke and transient ischaemic attack)
- Excess weight in children and obesity in adults (combined)
- Inactivity in children and adults (combined)
The analysis was designed with the intention that this dataset could be used to identify locations where investment could encourage greater levels of activity. In particular, it is hoped the dataset will be used to identify locations where the creation or improvement of accessible green/blue spaces and public engagement programmes could encourage greater levels of outdoor activity within the target population, and reduce the health issues associated with obesity and inactivity.
ANALYSIS METHODOLOGY
1. Obesity/inactivity-related illnesses
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to:
- Asthma (in persons of all ages)
- Cancer (in persons of all ages)
- Chronic kidney disease (in adults aged 18+)
- Coronary heart disease (in persons of all ages)
- Depression (in adults aged 18+)
- Diabetes mellitus (in persons aged 17+)
- Hypertension (in persons of all ages)
- Stroke and transient ischaemic attack (in persons of all ages)
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices.
Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
For each of the above illnesses, the percentage of each MSOA’s population with that illness was estimated. This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice’s catchment area
- Of the GPs that covered part of that MSOA: the percentage of patients registered with each GP that have that illness
The estimated percentage of each MSOA’s population with each illness was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with each illness, within the relevant age range.
For each illness, each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have that illness
B) the NUMBER of people within that MSOA who are estimated to have that illness
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA predicted to have that illness, compared to other MSOAs.
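The weighted average and the relative scoring described above can be sketched in a few lines. Two points here are assumptions rather than details given in the description: the weights are normalised by their sum, and "relative score between 1 and 0" is interpreted as min-max rescaling. All function names and figures are hypothetical.

```python
def estimate_msoa_prevalence(practices):
    """Weighted average of practice-level prevalence for one MSOA.

    practices: list of (share_of_msoa_area_covered, prevalence_percent) pairs.
    Assumption: weights are normalised by their sum, since catchments overlap.
    """
    total_weight = sum(weight for weight, _ in practices)
    return sum(weight * prevalence for weight, prevalence in practices) / total_weight


def relative_scores(values):
    """Rescale values to the range 0..1 (assumed min-max scaling; 1 = worst)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # no variation, so no MSOA stands out
    return [(value - low) / (high - low) for value in values]


def illness_scores(percentages, counts):
    """Average score A (percentage) and score B (number), then rescale again."""
    score_a = relative_scores(percentages)
    score_b = relative_scores(counts)
    averaged = [(a + b) / 2 for a, b in zip(score_a, score_b)]
    return relative_scores(averaged)


# Hypothetical MSOA covered 60% by one practice and 40% by another.
prevalence = estimate_msoa_prevalence([(0.6, 5.0), (0.4, 10.0)])

# Hypothetical estimates for three MSOAs.
scores = illness_scores(percentages=[2.0, 4.0, 6.0], counts=[100, 300, 200])
print(prevalence, scores)
```

In the example the second and third MSOAs receive the same final score: one has the highest percentage and the other the highest number of people affected, which shows how averaging scores A and B balances the two measures.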
In other words, those are areas where a large number of people are predicted to suffer from an illness, and where those people make up a large percentage of the population, indicating there is a real issue with that illness within the population and the investment of resources to address that issue could have the greatest benefits.
The scores for each of the 8 illnesses were added together then converted to a relative score between 1 and 0 (1 = worst, 0 = best), to give an overall score for each MSOA: a score close to 1 would indicate that an area has high predicted levels of all obesity/inactivity-related illnesses, and these are areas where the local population could benefit the most from interventions to address those illnesses. A score close to 0 would indicate very low predicted levels of obesity/inactivity-related illnesses and therefore interventions might not be required.
2. Excess weight in children and obesity in adults (combined)
For each MSOA, the number and percentage of children in Reception and Year 6 with excess weight was combined with population data (up to age 17) to estimate the total number of children with excess weight.
The first part of the analysis detailed in section 1 was used to estimate the number of adults with obesity in each MSOA, based on GP-level statistics.
The percentage of each MSOA’s adult population (aged 18+) with obesity was estimated, using GP-level data (see section 1 above).
This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice’s catchment area
- Of the GPs that covered part of that MSOA: the percentage of adult patients registered with each GP that are obese
The estimated percentage of each MSOA’s adult population with obesity was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of adults in each MSOA with obesity.
The estimated number of children with excess weight and adults with obesity were combined with population data, to give the total number and percentage of the population with excess weight.
Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have excess weight/obesity
B) the NUMBER of people within that MSOA who are estimated to have excess weight/obesity
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA predicted to have excess weight/obesity, compared to other MSOAs. In other words, those are areas where a large number of people are predicted to suffer from excess weight/obesity, and where those people make up a large percentage of the population, indicating there is a real issue with that excess weight/obesity within the population and the investment of resources to address that issue could have the greatest benefits.
3.
Inactivity in children and adults
For each administrative district, the number of children and adults who are inactive was combined with population data to estimate the total number and percentage of the population that are inactive.
Each district was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that district who are estimated to be inactive
B) the NUMBER of people within that district who are estimated to be inactive
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the district predicted to be inactive, compared to other districts. In other words, those are areas where a large number of people are predicted to be inactive, and where those people make up a large percentage of the population, indicating there is a real issue with that inactivity within the population and the investment of resources to address that issue could have the greatest benefits.
Summary dataset
An average of the scores calculated in sections 1-3 was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer the score to 1, the greater the number and percentage of people suffering from obesity, inactivity and associated illnesses. I.e. these are areas where there are a large number of people (both children and adults) who are obese, inactive and suffer from obesity/inactivity-related illnesses, and where those people make up a large percentage of the local population. These are the locations where interventions could have the greatest health and wellbeing benefits for the local population.
LIMITATIONS
1.
For data recorded at the GP practice level, data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID-19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Levels of obesity, inactivity and associated illnesses: Summary (England). Areas with data missing’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children, we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3.
It was not feasible to incorporate ultra-fine-scale geographic distribution of
The study aim was to describe the roles and health issues of older people (50 years and older) who have offspring who are infected or deceased due to HIV, or who have HIV themselves. In addition the effects of the introduction of HIV treatment on the lives and wellbeing of people aged 50 and above was investigated. Specifically, the aims of the study were to describe the effects on physical and mental health, household income and social situation as well as the tasks and responsibilities of older people infected and/or affected by HIV.
Rural subdistrict Hlabisa, Kwa-Zulu Natal Province, South Africa
individuals
Hlabisa, Africa Centre, Health and Demographic Surveillance Site fifty plus population
Sample survey data [ssd]
The sample was stratified into five groups. Group 1 was older people on HIV treatment for 1 year or more in 2010 at the time of Wave I of the project. Group 2 was older people who were not on HIV treatment or on treatment for 3 months or less in 2010 (Wave I). Group 3 was older people who had an adult (14-49 years) offspring in the household who was HIV-infected in 2010 (Wave I). Group 4 was older people who had experienced an HIV-related death of an adult household member in 2010 (Wave I). Group 5 was older people who were not on HIV treatment or were on treatment for 3 months or less in 2013 (at the time of Wave II). There was oversampling of participants in groups 2 and 5. A two-stage sampling process was adopted for participants in groups 1, 2 and 5. At stage one, all persons meeting the respective criteria for each group were identified from the Hlabisa treatment programme. At stage two, 100 participants for each group who were also under surveillance were randomly selected. The study is restricted to persons aged 50 and above and to those living in the Africa Centre surveillance area. The sample is representative of HIV-infected and HIV-affected older persons in the study population. Respondents who were absent, not found or refused were replaced with another randomly selected respondent meeting the same inclusion criteria. The sampling frames used were the Hlabisa HIV Care and Treatment database (ARTeMIS) and the Africa Centre longitudinal surveillance system. Participants in groups 1, 2 and 5 were first identified from ARTeMIS; those who were also under surveillance and met the specific criteria for each group were then randomly selected and approached for participation.
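The stage-two selection with replacement of non-respondents can be illustrated with a short simulation. This is a hedged sketch rather than the study's actual procedure: it assumes that shuffling the eligible list once and working down it is an acceptable way to draw replacements, and the identifiers, the seed and the response rule are all hypothetical.

```python
import random


def select_participants(eligible_ids, responds, target=100, seed=0):
    """Randomly select up to `target` responding participants from one group.

    Non-respondents (absent, not found or refused) are skipped and replaced by
    the next randomly ordered person meeting the same inclusion criteria.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    pool = list(eligible_ids)
    rng.shuffle(pool)
    selected = []
    for person_id in pool:
        if len(selected) == target:
            break
        if responds(person_id):
            selected.append(person_id)
    return selected


# Hypothetical group: 250 eligible people under surveillance, of whom every
# tenth identifier is treated as a non-respondent.
eligible = range(1, 251)
sample = select_participants(eligible, responds=lambda pid: pid % 10 != 0)
print(len(sample))
```

If fewer than `target` eligible people respond, the function returns everyone who did, so the caller can see that the group fell short of 100.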
Face-to-face [f2f]
The questionnaires for the Well-Being of Older People Study (WOPS) were based on the World Health Organization's Study on Global Ageing and Adult Health (SAGE) questionnaires, with some modifications and additions to suit the local environment. The questionnaires were also partially harmonized with a similar sub-study in Uganda. The study instrument has three main components: (1) a detailed questionnaire on basic demographic information, description of health state including functional ability assessment, well-being, health problems and symptoms, health care utilisation, care giving and care receiving, and experiences of living with HIV; (2) collection of anthropometry data; (3) a blood sample for laboratory-measured health risk biomarkers.
Data editing and quality control was conducted at three levels. 1. During field work the professional nurses cross checked their forms for incomplete or missing information. 2. The two co-principal investigators checked each form for completeness and quality of data. 3. Data entry constraints were built into the data entry programme to spot errors and inconsistencies. Any errors identified at any of these stages were referred back to the professional nurses who revisited the participant for data correction.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The proportion of women in the resident population eligible for cervical screening aged 25 to 49 years at end of period reported who were screened adequately within the previous 3.5 years.
Rationale
Cervical screening supports detection of cell abnormalities that may become cancer and is estimated to save 4,500 lives in England each year. Inclusion of this indicator provides an opportunity to incentivise screening promotion and other local initiatives to increase coverage of cervical cancer screening. Improvements in coverage would mean more cervical cancer is prevented or detected at earlier, more treatable stages.
Definition of numerator
Tested women (numerator) is the number of eligible women with a technically adequate screen within the previous 3.5 years.
Definition of denominator
Eligible women (denominator) is the number of women aged 25 to 49 years resident in the area (determined by postcode of residence) who are eligible for cervical screening at a given point in time, excluding those without a cervix.
Caveats
Data for ICBs are estimated from local authority data. In most cases ICBs are coterminous with local authorities, so the ICB figures are precise.
In cases where local authorities cross ICB boundaries, the local authority data are proportionally split between ICBs, based on population located in each ICB.
The affected ICBs are: Bath and North East Somerset, Swindon and Wiltshire; Bedfordshire, Luton and Milton Keynes; Buckinghamshire, Oxfordshire and Berkshire West; Cambridgeshire and Peterborough; Frimley; Hampshire and Isle of Wight; Hertfordshire and West Essex; Humber and North Yorkshire; Lancashire and South Cumbria; Norfolk and Waveney; North East and North Cumbria; Suffolk and North East Essex; Surrey Heartlands; Sussex; West Yorkshire.
Please be aware that the April 2019 to March 2020, April 2020 to March 2021 and April 2021 to March 2022 data cover the time period affected by the COVID-19 pandemic and therefore data for this period should be interpreted with caution.
This indicator gives screening coverage by local authority of residence. This is not the same as the indicator based on population registered with primary care organisations, which includes patients wherever they live. This is likely to result in different England totals depending on the selected (registered or resident) population footprint.
The indicator excludes women outside the target age range for the screening programme who may self-refer for screening.
https://spdx.org/licenses/CC0-1.0.html
Objectives: To examine trends in strong opioid prescribing in a primary care population in Wales and identify if factors such as age, deprivation and recorded diagnosis of depression or anxiety may have influenced any changes noted.
Design: Trend, cross-sectional and longitudinal analyses of routine data from the Primary Care General Practice database and accessed via the Secure Anonymised Information Linkage (SAIL) databank. Setting: A total of 345 Primary Care practices in Wales.
Participants: Anonymised records of 1,223,503 people aged 18 or over, receiving at least one opioid prescription between 1 January 2005 and 31 December 2015 were analysed. People with a cancer diagnosis (10.1%) were excluded from the detailed analysis.
Results: During the study period, 26,180,200 opioid prescriptions were issued to 1,223,503 individuals (55.9% female, 89.9% non-cancer diagnoses). The greatest increase in annual prescribing was in the 18–24 age group (10,470%), from 0.08 to 8.3 prescriptions/1000 population, although the 85+ age group had the highest prescribing rates across the study period (from 149.9 to 288.5 prescriptions/1000 population). The number of people with recorded diagnoses of depression or anxiety and prescribed strong opioids increased from 1.2 to 5.1 people/1000 population (328%). The increase was 366.9% in areas of highest deprivation compared to 310.3% in the least. Areas of greatest deprivation had more than twice the rate of strong opioid prescribing than the least deprived areas of Wales.
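The percentage increases quoted in the results follow the usual relative-change formula, which can be checked against the published endpoints. This is only a sketch: the function name is hypothetical, and because the endpoints are rounded in the abstract the recomputed values are approximate and need not match the reported percentages exactly.

```python
def percent_increase(start, end):
    """Relative change from start to end, expressed as a percentage."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return 100.0 * (end - start) / start


# Depression/anxiety with strong opioids: 1.2 to 5.1 people per 1000 population.
depression_anxiety = percent_increase(1.2, 5.1)

# 85+ age group: 149.9 to 288.5 prescriptions per 1000 population.
oldest_group = percent_increase(149.9, 288.5)

print(round(depression_anxiety, 1), round(oldest_group, 1))
```

With the rounded endpoints the first figure comes out at about 325% rather than the reported 328%; the gap is plausibly a rounding effect, although that explanation is an assumption and not something the abstract states.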
Conclusion: The study highlights a large increase in strong opioid prescribing for non-cancer pain, in Wales between 2005 and 2015. Population groups of interest include the youngest and oldest adult age groups and people with depression or anxiety particularly if living in the most deprived communities. Based on this evidence, development of a Welsh national guidance on safe and rational prescribing of opioids in chronic pain would be advisable to prevent further escalation of these medicines.
Methods: Data were extracted from the Secure Anonymised Information Linkage (SAIL) databank. SQL code was used to extract annualised totals for each subset of data.
Excel and SPSS25 were used to analyse the data using descriptive statistical methods.
Excel was used to produce trend graphs and totals.