83 datasets found
  1. Diabetes Health Indicators

    • kaggle.com
    zip
    Updated Mar 7, 2025
    Cite
    Siamak Tahmasbi (2025). Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/siamaktahmasbi/diabetes-health-indicators
    Explore at:
    zip(4413929 bytes)Available download formats
    Dataset updated
    Mar 7, 2025
    Authors
    Siamak Tahmasbi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.

    Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.

    The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.

    Content The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey conducted annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the 2023 dataset, available on the CDC website, was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

    I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal is to make the data easier to work with. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns (features). Therefore, the duplicate rows in this reduced dataset are not due to errors but rather represent different individuals who gave the same answers on the selected features. In my opinion, removing these entries would both introduce errors and reduce accuracy.
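    The duplicate-row point above can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library; the column names and values in the sample are hypothetical (the listing does not name the 20 selected features), and the real CSV would be read from disk instead of from an in-memory string.

```python
import csv
import io
from collections import Counter


def count_duplicate_rows(rows):
    """Return (distinct_rows, repeated_rows) for an iterable of rows.

    repeated_rows counts every row that repeats an earlier, identical row.
    """
    counts = Counter(tuple(row) for row in rows)
    total = sum(counts.values())
    return len(counts), total - len(counts)


# Hypothetical three-column sample; not taken from the actual dataset.
sample_csv = """HighBP,BMI,Diabetes
1,28,0
0,24,0
1,28,0
1,31,1
"""

reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)  # skip the header row
distinct, repeats = count_duplicate_rows(reader)
print(f"{distinct} distinct rows, {repeats} repeated rows")
```

    With only a handful of mostly categorical columns, identical rows are expected among hundreds of thousands of respondents, which is consistent with the author's choice to keep them.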

    Explore some of the following research questions: - Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes? - What risk factors are most predictive of diabetes risk? - Can we use a subset of the risk factors to accurately predict whether an individual has diabetes? - Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?

    Acknowledgements It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.

    Inspiration Zidian Xie et al., for "Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques" using the 2014 BRFSS, and Alex Teboul, for building the Diabetes Health Indicators dataset based on BRFSS 2015, were the inspiration for creating this dataset and exploring the BRFSS in general.

  2. CDC Diabetes Health Indicators

    • kaggle.com
    zip
    Updated Jul 21, 2024
    + more versions
    Cite
    Abdelaziz Sami (2024). CDC Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/cdc-diabetes-health-indicators
    Explore at:
    zip(6324278 bytes)Available download formats
    Dataset updated
    Jul 21, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context:

    Diabetes is a widespread chronic disease affecting millions of Americans each year, imposing a substantial financial burden on the economy. It impairs the body's ability to regulate blood glucose levels, leading to a range of health issues such as heart disease, vision loss, limb amputation, and kidney disease. Diabetes occurs when the body either fails to produce sufficient insulin or cannot use the insulin produced effectively. Insulin is crucial for enabling cells to utilize sugars from the bloodstream for energy.

    Though there is no cure for diabetes, lifestyle changes such as weight management, healthy eating, and regular physical activity, along with medical treatments, can help manage the disease. Early detection and intervention are vital, making predictive models for diabetes risk valuable tools for healthcare providers and public health officials.

    As of 2018, the CDC reported that 34.2 million Americans have diabetes, with 88 million having prediabetes. Alarmingly, a significant portion of those affected are unaware of their condition. Type II diabetes, the most prevalent form, varies in prevalence based on age, education, income, location, race, and other social determinants of health. The economic impact is substantial, with diagnosed diabetes costing approximately $327 billion annually, and total costs, including undiagnosed cases and prediabetes, nearing $400 billion.

    Content: The dataset originates from the Behavioral Risk Factor Surveillance System (BRFSS), an annual telephone survey by the CDC since 1984, collecting data on health-related risk behaviors, chronic health conditions, and preventative service usage. For this project, the 2015 BRFSS dataset available on Kaggle was used, featuring responses from 441,455 individuals across 330 features.

    The dataset includes three files:

    1. diabetes_012_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features. The target variable, Diabetes_012, has 3 classes: 0 (no diabetes or only during pregnancy), 1 (prediabetes), and 2 (diabetes). This dataset is imbalanced.

    2. diabetes_binary_5050split_health_indicators_BRFSS2015.csv: Contains 70,692 responses with 21 features, balanced 50-50 between individuals with no diabetes and those with prediabetes or diabetes. The target variable, Diabetes_binary, has 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes).

    3. diabetes_binary_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features, with the target variable Diabetes_binary having 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes). This dataset is not balanced.
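    The relationship between the two target variables described above can be sketched in code. This is an illustrative example, not the dataset author's cleaning script: it assumes that Diabetes_binary is obtained by collapsing classes 1 and 2 of Diabetes_012 (which matches the class definitions given in the listing), and the label values below are made up.

```python
from collections import Counter


def to_binary(diabetes_012):
    """Collapse the 3-class target to 2 classes.

    0 (no diabetes) stays 0; 1 (prediabetes) and 2 (diabetes) become 1.
    Accepts ints, floats, or numeric strings such as "2.0".
    """
    return 0 if int(float(diabetes_012)) == 0 else 1


def class_shares(labels):
    """Return the share of each class, which makes imbalance easy to see."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Made-up labels for illustration only.
labels_012 = [0, 0, 0, 0, 0, 0, 1, 2, 2, 0]
labels_binary = [to_binary(value) for value in labels_012]

print(class_shares(labels_012))
print(class_shares(labels_binary))
```

    Checking class shares before modeling is useful here because two of the three files are explicitly described as imbalanced.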

    Research Questions: - Can BRFSS survey questions accurately predict diabetes? - What risk factors are most indicative of diabetes risk? - Can a subset of risk factors effectively predict diabetes risk? - Can a shorter questionnaire be developed from the BRFSS using feature selection to predict diabetes risk?

    Acknowledgements: This dataset was not created by me; it is a cleaned and consolidated version of the BRFSS 2015 dataset available on Kaggle. The original dataset and the data cleaning notebook can be found here.

    Inspiration: This work was inspired by Zidian Xie et al.'s study on building risk prediction models for Type 2 diabetes using machine learning techniques on the 2014 BRFSS dataset. The study can be found here.

  3. The association between environmental quality and diabetes in the U.S.

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). The association between environmental quality and diabetes in the U.S. [Dataset]. https://catalog.data.gov/dataset/the-association-between-environmental-quality-and-diabetes-in-the-u-s
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016). Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as the proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome.

    The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
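    The three outcome definitions in this description (diagnosed, undiagnosed, and total diabetes) can be made concrete with a small sketch. It is a simplification under stated assumptions: it works on hypothetical individual records with measured FPG and HbA1C, whereas the study imputes high-glucose status for BRFSS respondents and reports age-standardized, model-based county estimates. The function names and sample records are illustrative only.

```python
FPG_THRESHOLD_MG_DL = 126.0  # high fasting plasma glucose, per the description
HBA1C_THRESHOLD_PCT = 6.5    # high HbA1C, per the description


def has_high_glucose(fpg_mg_dl, hba1c_pct):
    """True if either measure meets its 'high' threshold."""
    return fpg_mg_dl >= FPG_THRESHOLD_MG_DL or hba1c_pct >= HBA1C_THRESHOLD_PCT


def classify(reported_diagnosis, fpg_mg_dl, hba1c_pct):
    """Apply the diagnosed / undiagnosed / total definitions to one adult."""
    high = has_high_glucose(fpg_mg_dl, hba1c_pct)
    diagnosed = reported_diagnosis
    undiagnosed = high and not reported_diagnosis
    return {
        "diagnosed": diagnosed,
        "undiagnosed": undiagnosed,
        "total": diagnosed or undiagnosed,
    }


def crude_prevalence(records):
    """Crude (NOT age-standardized) prevalence in percent for each outcome."""
    flags = [classify(*record) for record in records]
    return {
        key: 100.0 * sum(flag[key] for flag in flags) / len(flags)
        for key in ("diagnosed", "undiagnosed", "total")
    }


# Hypothetical adults: (reported_diagnosis, FPG in mg/dL, HbA1C in %).
adults = [
    (True, 110.0, 6.0),
    (False, 130.0, 6.0),
    (False, 100.0, 5.4),
    (False, 120.0, 6.7),
]
print(crude_prevalence(adults))
```

    Note that total prevalence is the sum of the diagnosed and undiagnosed shares, because the two groups are mutually exclusive by definition.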

  4. United States US: Diabetes Prevalence: % of Population Aged 20-79

    • ceicdata.com
    Updated Mar 15, 2009
    Cite
    CEICdata.com (2009). United States US: Diabetes Prevalence: % of Population Aged 20-79 [Dataset]. https://www.ceicdata.com/en/united-states/health-statistics/us-diabetes-prevalence--of-population-aged-2079
    Explore at:
    Dataset updated
    Mar 15, 2009
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2017
    Area covered
    United States
    Description

    Diabetes prevalence for the US population aged 20-79 was reported at 10.790% in 2017. The series is updated yearly; it holds a single observation (Dec 2017), so its average and median are both 10.790%. The data remains active in CEIC and is reported by the World Bank. It is categorized under Global Database’s USA – Table US.World Bank: Health Statistics. Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes. Source: International Diabetes Federation, Diabetes Atlas. Aggregation: weighted average.

  5. Comprehensive Diabetes Clinical Dataset(100k rows)

    • kaggle.com
    zip
    Updated Jul 20, 2024
    + more versions
    Cite
    Priyam Choksi (2024). Comprehensive Diabetes Clinical Dataset(100k rows) [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/100000-diabetes-clinical-dataset
    Explore at:
    zip(917848 bytes)Available download formats
    Dataset updated
    Jul 20, 2024
    Authors
    Priyam Choksi
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.

    Dataset Use Cases

    This dataset can be used for various analytical and machine learning purposes, such as:

    1. Predictive Modeling: Build models to predict the likelihood of diabetes based on demographic and health-related features.
    2. Health Analytics: Analyze the correlation between different health metrics (e.g., BMI, HbA1c level) and diabetes.
    3. Demographic Studies: Examine the distribution of diabetes across different demographic groups and locations.
    4. Public Health Research: Identify risk factors for diabetes and target interventions to high-risk groups.
    5. Clinical Research: Study the relationship between comorbid conditions like hypertension and heart disease with diabetes.

    Potential Analyses

    • Descriptive Statistics: Summarize the dataset to understand the central tendencies and dispersion of features.
    • Correlation Analysis: Identify the relationships between features.
    • Classification Models: Use machine learning algorithms to classify individuals as diabetic or non-diabetic.
    • Trend Analysis: Analyze trends over the years to see how diabetes prevalence has changed.
  6. Diabetes mellitus (in persons aged 17 and over): England

    • data.catchmentbasedapproach.org
    Updated Apr 7, 2021
    Cite
    The Rivers Trust (2021). Diabetes mellitus (in persons aged 17 and over): England [Dataset]. https://data.catchmentbasedapproach.org/datasets/diabetes-mellitus-in-persons-aged-17-and-over-england
    Explore at:
    Dataset updated
    Apr 7, 2021
    Dataset authored and provided by
    The Rivers Trust
    Area covered
    Description

    SUMMARY
    This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of diabetes mellitus in persons (aged 17+). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.

    ANALYSIS METHODOLOGY
    The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to diabetes mellitus in persons (aged 17+).

    This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.

    The percentage of each MSOA’s population (aged 17+) with diabetes mellitus was estimated. This was achieved by calculating a weighted average based on:
    • The percentage of the MSOA area that was covered by each GP practice’s catchment area
    • Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness

    The estimated percentage of each MSOA’s population with diabetes mellitus was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with diabetes mellitus, within the relevant age range.

    Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
    A) the PERCENTAGE of the population within that MSOA who are estimated to have diabetes mellitus
    B) the NUMBER of people within that MSOA who are estimated to have diabetes mellitus

    An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have diabetes mellitus, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from diabetes mellitus, and where those people make up a large percentage of the population, indicating there is a real issue with diabetes mellitus within the population and the investment of resources to address that issue could have the greatest benefits.

    LIMITATIONS
    1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
    2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
    3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of diabetes mellitus, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of diabetes mellitus.

    TO BE VIEWED IN COMBINATION WITH:
    This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
    • Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
    • Levels of obesity, inactivity and associated illnesses (England): Missing data

    DOWNLOADING THIS DATA
    To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.

    DATA SOURCES
    This dataset was produced using:
    • Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
    • GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.

    COPYRIGHT NOTICE
    The reproduction of this data must be accompanied by the following statement:
    © Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.

    CaBA HEALTH & WELLBEING EVIDENCE BASE
    This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
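    The scoring method described in the analysis methodology can be sketched as follows. This is a rough illustration, not the Ribble Rivers Trust implementation: the description does not say how scores are converted to the 0 to 1 range, so min-max scaling is assumed here, and all names and figures in the example are hypothetical.

```python
def weighted_prevalence(coverage):
    """Area-weighted average prevalence (%) for one MSOA.

    coverage: list of (area_share_pct, gp_prevalence_pct) pairs, one per
    GP practice whose catchment overlaps the MSOA.
    """
    total_weight = sum(weight for weight, _ in coverage)
    return sum(weight * pct for weight, pct in coverage) / total_weight


def min_max(values):
    """Rescale values to [0, 1]; ASSUMED scaling, not stated in the source."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def relative_scores(msoas):
    """Combine score A (percentage) and score B (number) into one score."""
    percentages = [weighted_prevalence(m["coverage"]) for m in msoas]
    numbers = [pct / 100.0 * m["population"] for pct, m in zip(percentages, msoas)]
    score_a = min_max(percentages)
    score_b = min_max(numbers)
    averaged = [(a + b) / 2.0 for a, b in zip(score_a, score_b)]
    return min_max(averaged)  # 1 = worst, 0 = best


# Hypothetical MSOAs; populations stand in for the ONS mid-year estimates.
msoas = [
    {"population": 8000, "coverage": [(60.0, 5.0), (40.0, 10.0)]},
    {"population": 5000, "coverage": [(100.0, 4.0)]},
    {"population": 10000, "coverage": [(50.0, 8.0), (50.0, 10.0)]},
]
print(relative_scores(msoas))
```

    Averaging a percentage-based score with a count-based score means an MSOA ranks highest only when it has both a large affected population and a high share of residents affected.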

  7. Diabetes control is associated with environmental quality in the U.S.

    • catalog.data.gov
    Updated Jul 21, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Diabetes control is associated with environmental quality in the U.S. [Dataset]. https://catalog.data.gov/dataset/diabetes-control-is-associated-with-environmental-quality-in-the-u-s
    Explore at:
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Area covered
    United States
    Description

    Population-based county-level estimates for prevalence of DC were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) of ≥6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16).

    The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality exposure metrics. PRDs are representative of the entire period of interest, 2004-2012. Owing to the availability of DC data and covariate data, not all counties were captured; however, the majority (3134 of 3142) were used in the analysis.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).

  8. Diabetes

    • catalog.data.gov
    • data.wprdc.org
    • +2more
    Updated Mar 14, 2023
    + more versions
    Cite
    Allegheny County (2023). Diabetes [Dataset]. https://catalog.data.gov/dataset/diabetes
    Explore at:
    Dataset updated
    Mar 14, 2023
    Dataset provided by
    Allegheny County
    Description

    These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.

  9. Diagnosed Diabetes Prevalence Among Adults by Colorado Census Tract

    • trac-cdphe.opendata.arcgis.com
    Updated Feb 8, 2016
    Cite
    Colorado Department of Public Health and Environment (2016). Diagnosed Diabetes Prevalence Among Adults by Colorado Census Tract [Dataset]. https://trac-cdphe.opendata.arcgis.com/items/9567507342654bb1bf8cfaaa3b498b3f
    Explore at:
    Dataset updated
    Feb 8, 2016
    Dataset authored and provided by
    Colorado Department of Public Health and Environment https://cdphe.colorado.gov/
    Area covered
    Description

    Purpose: This dataset contains the Colorado census tract level Diagnosed Diabetes Prevalence Among Adults (2022) copied directly from the CDC Places Dataset. A multi-level regression and post-stratification approach was applied to BRFSS and ACS data to compute a detailed probability among adults who report being told by a doctor or other health professional that they have diabetes (other than diabetes during pregnancy for female respondents). The probability was then applied to the detailed population estimates at the appropriate geographic level to generate the prevalence. The 95% confidence interval was derived using Monte Carlo simulation.

    Update Schedule and URL: This dataset is updated annually (September) as the CDC PLACES dataset is updated.

    Fields Description:
    • GEOID: 11-digit Census Tract FIPS Identifier
    • COUNTY: County Name
    • NAME: Census Tract Name
    • DIABETES_ADJRATE: Diagnosed Diabetes Prevalence Among Adults (2022, CDC Places)
    • DIABETES_L95CI: Diagnosed Diabetes Lower 95% Confidence Interval
    • DIABETES_U95CI: Diagnosed Diabetes Upper 95% Confidence Interval

  10. Pediatric Diabetes Data

    • mass.gov
    Updated Mar 26, 2019
    Cite
    Population Health Information Tool (2019). Pediatric Diabetes Data [Dataset]. https://www.mass.gov/info-details/pediatric-diabetes-data
    Explore at:
    Dataset updated
    Mar 26, 2019
    Dataset provided by
    Population Health Information Tool
    Department of Public Health
    Bureau of Climate and Environmental Health
    Area covered
    Massachusetts
    Description

    Find data on pediatric diabetes in Massachusetts. This dataset contains information on the number of cases and prevalence of Type 1 and Type 2 diabetes among students, grades K-8, in Massachusetts.

  11. Type 1 diabetes incidence rates in individuals aged 0–14 and 0–19 years by...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 21, 2023
    + more versions
    Cite
    Apoorva Gomber; Zachary J. Ward; Carlo Ross; Maira Owais; Carol Mita; Jennifer M. Yeh; Ché L. Reddy; Rifat Atun (2023). Type 1 diabetes incidence rates in individuals aged 0–14 and 0–19 years by WHO regions and income. [Dataset]. http://doi.org/10.1371/journal.pgph.0001099.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS Global Public Health
    Authors
    Apoorva Gomber; Zachary J. Ward; Carlo Ross; Maira Owais; Carol Mita; Jennifer M. Yeh; Ché L. Reddy; Rifat Atun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    * Age-standardised incidence rates per 100,000 individuals per year with 95% confidence intervals. † For cells labeled as NA, 95% CIs could not be estimated as there was only 1 data point.

  12. Diabetes in Adults - CDPHE Community Level Estimates (Census Tracts)

    • data-cdphe.opendata.arcgis.com
    • hub.arcgis.com
    • +1more
    Updated May 12, 2016
    Cite
    Colorado Department of Public Health and Environment (2016). Diabetes in Adults - CDPHE Community Level Estimates (Census Tracts) [Dataset]. https://data-cdphe.opendata.arcgis.com/datasets/diabetes-in-adults-cdphe-community-level-estimates-census-tracts
    Explore at:
    Dataset updated
    May 12, 2016
    Dataset authored and provided by
    Colorado Department of Public Health and Environment (https://cdphe.colorado.gov/)
    Area covered
    Description

    These data represent the predicted (modeled) prevalence of Diabetes among adults (Age 18+) for each census tract in Colorado. Diabetes is defined as ever being diagnosed with Diabetes by a doctor, nurse, or other health professional, and this definition does not include gestational, borderline, or pre-diabetes.

    The estimate for each census tract represents an average that was derived from multiple years of Colorado Behavioral Risk Factor Surveillance System data (2014-2017).

    CDPHE used a model-based approach to measure the relationship between age, race, gender, poverty, education, location and health conditions or risk behavior indicators and applied this relationship to predict the number of persons who have the health conditions or risk behavior for each census tract in Colorado. We then applied these probabilities, based on demographic stratification, to the 2013-2017 American Community Survey population estimates and determined the percentage of adults with the health conditions or risk behavior for each census tract in Colorado.

    The estimates are based on statistical models and are not direct survey estimates. Using the best available data, CDPHE was able to model census tract estimates based on demographic data and background knowledge about the distribution of specific health conditions and risk behaviors.

    The estimates are displayed in both the map and data table using point estimate values for each census tract and displayed using a Quintile range. The high and low value for each color on the map is calculated based on dividing the total number of census tracts in Colorado (1249) into five groups based on the total range of estimates for all Colorado census tracts. Each Quintile range represents roughly 20% of the census tracts in Colorado. No estimates are provided for census tracts with a known population of less than 50. These census tracts are displayed in the map as "No Est, Pop < 50."

    No estimates are provided for 7 census tracts with a known population of less than 50 or for the 2 census tracts that exclusively contain a federal correctional institution as 100% of their population. These 9 census tracts are displayed in the map as "No Estimate."
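The step of applying modeled probabilities to population estimates can be sketched as a population-weighted average. This is only an illustration under stated assumptions, not CDPHE's actual model: the function name, the strata, and every number below are hypothetical, and the only detail taken from the description is the rule that tracts with a known population under 50 get no estimate.

```python
from typing import Optional, Sequence


def poststratified_prevalence(
    probabilities: Sequence[float],
    populations: Sequence[int],
    min_population: int = 50,
) -> Optional[float]:
    """Weight per-stratum probabilities by stratum population.

    Returns None when the tract population is below min_population,
    mirroring the "No Est, Pop < 50" rule in the description.
    """
    if len(probabilities) != len(populations):
        raise ValueError("need one probability per population stratum")
    total = sum(populations)
    if total < min_population:
        return None
    # Expected number of adults with the condition in each stratum, summed.
    expected_cases = sum(p * n for p, n in zip(probabilities, populations))
    return expected_cases / total


# Hypothetical tract with three demographic strata (made-up values).
tract_probs = [0.02, 0.08, 0.20]
tract_pops = [1000, 1500, 500]
prevalence = poststratified_prevalence(tract_probs, tract_pops)

# Hypothetical tract that is too small to report.
small_tract = poststratified_prevalence([0.10], [30])
```

The weighting is what makes the result a tract-level percentage rather than a simple mean of the stratum probabilities: larger strata contribute proportionally more.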

  13. Diabetes - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jul 12, 2017
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2017). Diabetes - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/diabetes
    Explore at:
    Dataset updated
    Jul 12, 2017
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    This public health factsheet describes facts, assets, and strategies related to diabetes in Camden.

  14. Healthcare Diabetes Dataset

    • kaggle.com
    zip
    Updated Aug 23, 2023
    Cite
    Nandita Pore (2023). Healthcare Diabetes Dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/healthcare-diabetes
    Explore at:
    zip(27316 bytes)Available download formats
    Dataset updated
    Aug 23, 2023
    Authors
    Nandita Pore
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description: Welcome to the Diabetes Prediction Dataset, a valuable resource for researchers, data scientists, and medical professionals interested in the field of diabetes risk assessment and prediction. This dataset contains a diverse range of health-related attributes, meticulously collected to aid in the development of predictive models for identifying individuals at risk of diabetes. By sharing this dataset, we aim to foster collaboration and innovation within the data science community, leading to improved early diagnosis and personalized treatment strategies for diabetes.

    Columns:
    1. Id: Unique identifier for each data entry.
    2. Pregnancies: Number of times pregnant.
    3. Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test.
    4. BloodPressure: Diastolic blood pressure (mm Hg).
    5. SkinThickness: Triceps skinfold thickness (mm).
    6. Insulin: 2-Hour serum insulin (mu U/ml).
    7. BMI: Body mass index (weight in kg / height in m^2).
    8. DiabetesPedigreeFunction: Diabetes pedigree function, a genetic score of diabetes.
    9. Age: Age in years.
    10. Outcome: Binary classification indicating the presence (1) or absence (0) of diabetes.
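A minimal sketch of reading a file with the ten columns listed above, using only the Python standard library. It assumes the file is a comma-separated file whose header matches those column names in that order, and that every value parses as a number; neither assumption is confirmed by the description. The single data row is made up for illustration and is not taken from the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Id", "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def read_rows(handle):
    """Parse rows into dicts of floats, with Id and Outcome as ints."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for raw in reader:
        row = {name: float(value) for name, value in raw.items()}
        row["Id"] = int(row["Id"])
        row["Outcome"] = int(row["Outcome"])
        if row["Outcome"] not in (0, 1):
            raise ValueError(f"Outcome must be 0 or 1, got {row['Outcome']}")
        rows.append(row)
    return rows


# Made-up sample standing in for the real file (hypothetical values).
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,2,120,70,25,80,28.5,0.35,33,0\n"
)
rows = read_rows(sample)
```

For the real file, the `io.StringIO` object would be replaced by `open(path, newline="")`, where `path` points at the extracted CSV.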

    Utilize this dataset to explore the relationships between various health indicators and the likelihood of diabetes. You can apply machine learning techniques to develop predictive models, feature selection strategies, and data visualization to uncover insights that may contribute to more accurate risk assessments. As you embark on your journey with this dataset, remember that your discoveries could have a profound impact on diabetes prevention and management.

    Please ensure that you adhere to ethical guidelines and respect the privacy of individuals represented in this dataset. Proper citation and recognition of this dataset's source are appreciated to promote collaboration and knowledge sharing.

    Start your exploration of the Diabetes Prediction Dataset today and contribute to the ongoing efforts to combat diabetes through data-driven insights and innovations.

  15. Mitchelstown Cohort Data Set

    • data.niaid.nih.gov
    Updated Jun 21, 2021
    Cite
    Flynn, Sinead; Millar, Sean (2021). Mitchelstown Cohort Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4927358
    Explore at:
    Dataset updated
    Jun 21, 2021
    Dataset provided by
    HRB Centre for Health and Diet Research, School of Public Health, University College Cork, Ireland
    Authors
    Flynn, Sinead; Millar, Sean
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mitchelstown
    Description

    SPSS dataset.

    The Cork and Kerry Diabetes and Heart Disease Study (Phase II – Mitchelstown Cohort) was a single centre study conducted between 2010 and 2011. A random sample was recruited from a large primary care centre in Mitchelstown, County Cork, Ireland. The Livinghealth Clinic serves a population of approximately 20,000 Caucasian-European subjects, with a mix of urban and rural residents. Stratified sampling was employed to recruit equal numbers of men and women from all registered attending patients in the 46–73-year age group. In total, 3,807 potential participants were selected from the practice list. Following the exclusion of duplicates, deaths and subjects incapable of consenting or attending appointment, 3,051 were invited to participate in the study and of these, 2,047 (49% male) completed the questionnaire and physical examination components of the baseline assessment (response rate: 67%). Individuals with pre-existing cardiovascular disease or T2DM were not excluded from the cohort.
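The recruitment figures quoted above can be checked with a few lines of arithmetic. The counts come from the description; the variable and function names are illustrative only.

```python
def recruitment_summary(selected: int, invited: int, completed: int) -> dict:
    """Derive the exclusion count and response rate from recruitment counts."""
    return {
        # Duplicates, deaths, and subjects unable to consent or attend.
        "excluded": selected - invited,
        # Share of invited subjects who completed the baseline assessment.
        "response_rate": completed / invited,
    }


summary = recruitment_summary(selected=3807, invited=3051, completed=2047)
print(f"excluded before invitation: {summary['excluded']}")
print(f"response rate: {summary['response_rate']:.0%}")
```

The response rate is computed against the 3,051 invited subjects, not the 3,807 initially selected, which is what reproduces the reported 67%.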

  16. Medication Non-Adherence in Type 2 Diabetes: Prevalence and Correlates in a...

    • afrischolarrepository.net.ng
    Updated Apr 2, 2024
    Cite
    (2024). Medication Non-Adherence in Type 2 Diabetes: Prevalence and Correlates in a Tertiary Healthcare Facility in Southeast Nigeria - Dataset - Afrischolar Discovery Initiative (ADI) [Dataset]. https://afrischolarrepository.net.ng/dataset/medication-non-adherence-in-type
    Explore at:
    Dataset updated
    Apr 2, 2024
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0): https://creativecommons.org/licenses/by-nc/2.0/
    License information was derived automatically

    Area covered
    Nigeria
    Description

    Background: Non-adherence to diabetes medications leads to frequent relapses, poor treatment outcome, reduced quality of life and significant increases in healthcare cost in a resource-poor country and a healthcare system already overburdened by infectious illnesses and other diseases. This study verified the adherence of people with type 2 diabetes mellitus and factors associated with it.

    Objective: This study was carried out to assess the prevalence of non-adherence to medication, and identify factors associated with it in patients with type 2 diabetes mellitus.

    Study Design: This was a cross-sectional study conducted on a sample of one hundred and twenty three out-patients, aged over 18 years and diagnosed with type 2 diabetes mellitus and who have been on oral medications for at least a year prior to study entry. Socio-demographic and clinical variables were collected and compared between participants with optimal and suboptimal adherence.

    Results: The mean age of participants was 59.68±11.8 and the mean duration of illness was 7.22. About one-in-four (28%) were poor adherers to their diabetes medications. Variables with significant association with non-adherence include marital status (χ²= 8.73, df= 1, p= 0.01), educational level (χ²= 6.96, df= f, p= 0.01), employment status (χ²= 4.89, df= 1, p= 0.030), duration of illness (χ²= 3.07, df= 1, p= 0.08) and patients’ living arrangement (χ²= 4.28, df= 1, p= 0.04). In multivariate analysis, predictors of poor adherence were: lack of treatment supervision (OR 0.032, p-value< 0.001), poor attitude to medication (OR 0.015, p< 0.001).

    Conclusion: Medication non-adherence in patients with type 2

  17. Emergency Hospital Admissions for Diabetes - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jul 11, 2017
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2017). Emergency Hospital Admissions for Diabetes - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/emergency-hospital-admissions-for-diabetes
    Explore at:
    Dataset updated
    Jul 11, 2017
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Note: This dataset has been archived as of January 2024 after confirmation from NHS Digital that the source dataset is no longer being updated, and there is not a replacement publication for the diabetic ketoacidosis admissions data. This indicator is one measure of the prevention, identification and management of people at risk of developing diabetes and those with the condition. It shows adverse outcomes as annual numbers of emergency hospital admissions for diabetic ketoacidosis and coma. Emergency admissions to hospital can be avoided by identifying people at risk, primary care services interventions, encouraging better diet and exercise, improving self-monitoring and diabetes control and supporting patients and carers in the management of diabetes in the home. It needs local health and care services working effectively together to support people’s health and independence in the community. Type 2 diabetes (around 90 percent of diabetes diagnoses) is partially preventable - it can be prevented or delayed by lifestyle changes (exercise, weight loss, healthy eating). Earlier detection of type 2 diabetes followed by effective treatment reduces the risk of developing diabetic complications. These include cardiovascular, kidney, foot and eye diseases, meaning considerable illness and reduced quality of life. There are some limitations to this data, as raw counts of hospital episodes are subject to population structures (such as numbers of people in older age groups) and other underlying variations. Counts below 5 are removed from the data. The data is updated annually. Sources: NHS Digital (now part of NHS England) - dataset P02177, and commentary from the Office for Health Improvement and Disparities (OHID) Public Health Outcomes Framework (PHOF) indicator 2.17 Recorded Diabetes.

  18. Brecon Dataset (BREC)

    • web.prod.hdruk.cloud
    unknown
    Cite
    Brecon Group. Brecon Dataset (BREC) [Dataset]. https://web.prod.hdruk.cloud/dataset/326
    Explore at:
    unknownAvailable download formats
    Dataset authored and provided by
    Brecon Group
    License

    https://saildatabank.com/data/apply-to-work-with-the-data/

    Description

    A register of children diagnosed with type 1 diabetes in Wales, collected from Paediatric diabetes clinics in Wales. Maintained by the Brecon Group. Two capture-recapture studies have been done showing >97% completeness for type 1 diabetes diagnoses in Wales. Data has been collected since 1995 and is complete since then, but some people diagnosed earlier are also included.

  19. A NIHR Midlands PSRC dataset of older patients with diabetic emergencies

    • healthdatagateway.org
    unknown
    Updated Jun 2, 2025
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2025). A NIHR Midlands PSRC dataset of older patients with diabetic emergencies [Dataset]. https://healthdatagateway.org/dataset/1109
    Explore at:
    unknownAvailable download formats
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Up to 30% of older adults (aged 65 years and older) have been diagnosed with Diabetes mellitus. Older diabetics are more likely to experience complications like heart disease, stroke, kidney problems, and nerve damage, leading to higher hospital admission rates.  When managing older diabetic patients, healthcare professionals need to consider factors like frailty, polypharmacy (multiple medications), and potential cognitive impairments. 

    This dataset includes 83,303 people and 366,035 spells, designed to support research which improves diabetic emergency and unplanned care in older adults. It includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to acute care process, presenting complaints, admissions, microbiology results, referrals, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations and others), all blood results (urea, albumin, platelets, white blood cells and others). Includes all prescribed & administered treatments and all outcomes. Linked images are also available (radiographs, CT scans, MRI).

    Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size

  20. Table_1_Diabetes websites lack information on dietary causes, risk factors,...

    • frontiersin.figshare.com
    • figshare.com
    docx
    Updated Jul 13, 2023
    Cite
    Lisa T. Crummett; Muhammad H. Aslam (2023). Table_1_Diabetes websites lack information on dietary causes, risk factors, and preventions for type 2 diabetes.docx [Dataset]. http://doi.org/10.3389/fpubh.2023.1159024.s001
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Lisa T. Crummett; Muhammad H. Aslam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Type 2 diabetes (T2D) is a growing public health burden throughout the world. Many people looking for information on how to prevent T2D will search on diabetes websites. Multiple dietary factors have a significant association with T2D risk, such as high intake of added sugars, refined carbohydrates, saturated fat, and red meat or processed meat; and decreased intake of dietary fiber, and fruits/vegetables. Despite this dietary information being available in the scientific literature, it is unclear whether this information is available in gray literature (websites).

    Objective: In this study, we evaluate the use of specific terms from diabetes websites that are significantly associated with causes/risk factors and preventions for T2D from three term categories: (A) dietary factors, (B) nondietary nongenetic (lifestyle-associated) factors, and (C) genetic (non-modifiable) factors. We also evaluate the effect of website type (business, government, nonprofit) on term usage among websites.

    Methods: We used web scraping and coding tools to quantify the use of specific terms from 73 diabetes websites. To determine the effect of term category and website type on the usage of specific terms among 73 websites, a repeated measures general linear model was performed.

    Results: We found that dietary risk factors that are significantly associated with T2D (e.g., sugar, processed carbohydrates, dietary fat, fruits/vegetables, fiber, processed meat/red meat) were mentioned in significantly fewer websites than either nondietary nongenetic factors (e.g., obesity, physical activity, dyslipidemia, blood pressure) or genetic factors (age, family history, ethnicity). Among websites that provided “eat healthy” guidance, one third provided zero dietary factors associated with type 2 diabetes, and only 30% provided more than two specific dietary factors associated with type 2 diabetes. We also observed that mean percent usage of all terms associated with T2D causes/risk factors and preventions was significantly lower among government websites compared to business websites and nonprofit websites.

    Conclusion: Diabetes websites need to increase their usage of dietary factors when discussing causes/risk factors and preventions for T2D, as dietary factors are modifiable and strongly associated with all nondietary nongenetic risk factors, in addition to T2D risk.

Cite
Siamak Tahmasbi (2025). Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/siamaktahmasbi/diabetes-health-indicators

Diabetes Health Indicators

433,323 survey responses from 2023 BRFSS Data

Explore at:
zip(4413929 bytes)Available download formats
Dataset updated
Mar 7, 2025
Authors
Siamak Tahmasbi
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.

Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.

The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.

Content The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the dataset available on the CDC website for the year 2023 was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
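The point about duplicates can be illustrated with a small sketch: rows that are distinct in a wide table can become identical once only a subset of columns is kept. The column names and values below are hypothetical toy data, not BRFSS variables.

```python
from collections import Counter


def project(rows, columns):
    """Keep only the given columns, returning hashable tuples."""
    return [tuple(row[c] for c in columns) for row in rows]


def count_duplicates(records):
    """Count records that repeat an earlier, identical record."""
    counts = Counter(records)
    return sum(n - 1 for n in counts.values() if n > 1)


# Toy respondents (made up): the first two differ only in "state".
full_rows = [
    {"age_group": 9, "bmi_category": 3, "smoker": 0, "state": "A"},
    {"age_group": 9, "bmi_category": 3, "smoker": 0, "state": "B"},
    {"age_group": 5, "bmi_category": 2, "smoker": 1, "state": "A"},
]
all_columns = ["age_group", "bmi_category", "smoker", "state"]
subset_columns = ["age_group", "bmi_category", "smoker"]

duplicates_in_full = count_duplicates(project(full_rows, all_columns))
duplicates_in_subset = count_duplicates(project(full_rows, subset_columns))
print(duplicates_in_full, duplicates_in_subset)
```

In this toy case the full table has no repeated rows, while the projected table has one, even though both rows still describe different respondents. Dropping such rows would discard real observations, which is the reasoning given above for keeping them.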

Explore some of the following research questions:
- Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
- What risk factors are most predictive of diabetes risk?
- Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
- Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?

Acknowledgements It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.

Inspiration Zidian Xie et al., for Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques using the 2014 BRFSS, and Alex Teboul, for building the Diabetes Health Indicators dataset based on BRFSS 2015, were the inspiration for creating this dataset and exploring the BRFSS in general.
