81 datasets found
  1. Prevalence of diabetes among U.S. adults from 2013 to 2021, by detailed...

    • statista.com
    Cite
    Statista, Prevalence of diabetes among U.S. adults from 2013 to 2021, by detailed ethnicity [Dataset]. https://www.statista.com/statistics/1451900/prevalence-of-diabetes-among-us-adults-by-detailed-race-ethnicity/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    From 2013 to 2021, it was estimated that among different racial/ethnic groups of adults in the United States, Samoans presented the highest prevalence of diabetes, with 20.3 percent diagnosed with diabetes. This statistic depicts the prevalence of diabetes among adults in the United States from 2013 to 2021, by detailed race and ethnicity.

  2. Massachusetts Diabetes Data

    • mass.gov
    Updated Oct 15, 2016
    Cite
    Department of Public Health (2016). Massachusetts Diabetes Data [Dataset]. https://www.mass.gov/info-details/massachusetts-diabetes-data
    Dataset updated
    Oct 15, 2016
    Dataset provided by
    Department of Public Health
    Bureau of Community Health and Prevention
    Area covered
    Massachusetts
    Description

    Diabetes prevalence in Massachusetts has been steadily increasing.

  3. Diabetes death rate for children and adolescents U.S. 2012-2014, by...

    • statista.com
    Updated Nov 29, 2025
    Cite
    Statista (2025). Diabetes death rate for children and adolescents U.S. 2012-2014, by ethnicity [Dataset]. https://www.statista.com/statistics/719994/diabetes-death-rate-children-and-adolescents-us-by-ethnicity/
    Dataset updated
    Nov 29, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2012 - 2014
    Area covered
    United States
    Description

    This statistic displays the death rate from diabetes for children and adolescents in the United States from 2012 to 2014, by ethnicity. According to the statistic, the death rate from diabetes among children and adolescents was highest among black youth.

  4. England: distribution of people with diabetes 2023/24, by ethnicity

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). England: distribution of people with diabetes 2023/24, by ethnicity [Dataset]. https://www.statista.com/statistics/387359/individuals-with-diabetes-by-ethnicity-in-england/
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    England
    Description

    Between 2023 and 2024, over ******* percent of all those registered with type 2 diabetes in England were Asian or Asian British. This statistic displays the share of individuals registered with diabetes in England in 2023/24, by ethnicity.

  5. Table_1_Prediabetes prevalence and awareness by race, ethnicity, and...

    • frontiersin.figshare.com
    docx
    Updated Dec 18, 2023
    Cite
    Taynara Formagini; Joanna Veazey Brooks; Andrew Roberts; Kai McKeever Bullard; Yan Zhang; Ryan Saelee; Matthew James O'Brien (2023). Table_1_Prediabetes prevalence and awareness by race, ethnicity, and educational attainment among U.S. adults.docx [Dataset]. http://doi.org/10.3389/fpubh.2023.1277657.s001
    Available download formats: docx
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Taynara Formagini; Joanna Veazey Brooks; Andrew Roberts; Kai McKeever Bullard; Yan Zhang; Ryan Saelee; Matthew James O'Brien
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Racial and ethnic minority groups and individuals with limited educational attainment experience a disproportionate burden of diabetes. Prediabetes represents a high-risk state for developing type 2 diabetes, but most adults with prediabetes are unaware of having the condition. Uncovering whether racial, ethnic, or educational disparities also occur in the prediabetes stage could help inform strategies to support health equity in preventing type 2 diabetes and its complications. We examined the prevalence of prediabetes and prediabetes awareness, with corresponding prevalence ratios according to race, ethnicity, and educational attainment.

    Methods: This study was a pooled cross-sectional analysis of the National Health and Nutrition Examination Survey data from 2011 to March 2020. The final sample comprised 10,262 U.S. adults who self-reported being Asian, Black, Hispanic, or White. Prediabetes was defined using hemoglobin A1c and fasting plasma glucose values. Those with prediabetes were classified as “aware” or “unaware” based on survey responses. We calculated prevalence ratios (PR) to assess the relationship between race, ethnicity, and educational attainment with prediabetes and prediabetes awareness, controlling for sociodemographic, health and healthcare-related, and clinical characteristics.

    Results: In fully adjusted logistic regression models, Asian, Black, and Hispanic adults had a statistically significant higher risk of prediabetes than White adults (PR: 1.26 [1.18, 1.35], PR: 1.17 [1.08, 1.25], and PR: 1.10 [1.02, 1.19], respectively). Adults completing less than high school and high school had a significantly higher risk of prediabetes compared to those with a college degree (PR: 1.14 [1.02, 1.26] and PR: 1.12 [1.01, 1.23], respectively). We also found that Black and Hispanic adults had higher rates of prediabetes awareness in the fully adjusted model than White adults (PR: 1.27 [1.07, 1.50] and PR: 1.33 [1.02, 1.72], respectively). The rates of prediabetes awareness were consistently lower among those with less than a high school education relative to individuals who completed college (fully-adjusted model PR: 0.66 [0.47, 0.92]).

    Discussion: Disparities in prediabetes among racial and ethnic minority groups and adults with low educational attainment suggest challenges and opportunities for promoting health equity in high-risk groups and expanding awareness of prediabetes in the United States.
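    For readers unfamiliar with prevalence ratios, the sketch below shows how a crude (unadjusted) PR and an approximate 95% interval can be computed from two group counts. It is an illustration only: the study reports model-adjusted PRs, whereas this uses a simple log-scale Wald interval, and the function name and the example counts are hypothetical, not taken from the dataset.

```python
import math


def prevalence_ratio(cases_group, n_group, cases_ref, n_ref, z=1.96):
    """Crude prevalence ratio of a group versus a reference group.

    Returns (pr, lower, upper), where the interval is a Wald interval
    computed on the log scale. No covariate adjustment is performed.
    """
    p_group = cases_group / n_group
    p_ref = cases_ref / n_ref
    pr = p_group / p_ref
    # Standard error of log(PR) for two independent proportions.
    se = math.sqrt(1 / cases_group - 1 / n_group + 1 / cases_ref - 1 / n_ref)
    lower = math.exp(math.log(pr) - z * se)
    upper = math.exp(math.log(pr) + z * se)
    return pr, lower, upper


# Made-up counts: 30 of 100 in the group, 20 of 100 in the reference group.
pr, lower, upper = prevalence_ratio(30, 100, 20, 100)
print(f"PR = {pr:.2f} [{lower:.2f}, {upper:.2f}]")
```

    A PR above 1 means the condition is more prevalent in the group than in the reference group; an interval that excludes 1 corresponds to the "statistically significant" results quoted in the abstract.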

  6. Diabetes Health Indicators

    • kaggle.com
    zip
    Updated Mar 7, 2025
    Cite
    Siamak Tahmasbi (2025). Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/siamaktahmasbi/diabetes-health-indicators
    Available download formats: zip (4413929 bytes)
    Dataset updated
    Mar 7, 2025
    Authors
    Siamak Tahmasbi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context: Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.

    Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.

    The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.

    Content: The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey conducted annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the dataset available on the CDC website for the year 2023 was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

    I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
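    The point about duplicates can be checked directly. The sketch below counts identical rows in a CSV using only the Python standard library; the column names and sample rows are made up for illustration and are not taken from the dataset. With only a small number of categorical features, identical rows are expected even when every respondent is distinct.

```python
import csv
import io
from collections import Counter


def count_duplicate_rows(csv_text):
    """Return (header, duplicated_rows, extra_row_count) for CSV text.

    duplicated_rows maps each repeated row (as a tuple of strings) to the
    number of times it occurs; extra_row_count is how many rows would be
    dropped if duplicates were removed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counts = Counter(tuple(row) for row in reader)
    duplicated = {row: n for row, n in counts.items() if n > 1}
    extra = sum(n - 1 for n in duplicated.values())
    return header, duplicated, extra


# Hypothetical two-column sample; real files would be read from disk.
sample = "A,B\n1,0\n1,0\n0,1\n1,0\n"
header, duplicated, extra = count_duplicate_rows(sample)
print(header, duplicated, extra)
```

    Whether to keep such rows is a modeling decision; the description above argues for keeping them because they represent different people with the same answers.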

    Explore some of the following research questions:

    • Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
    • What risk factors are most predictive of diabetes risk?
    • Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
    • Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?

    Acknowledgements: It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.

    Inspiration: Zidian Xie et al., for "Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques" using the 2014 BRFSS, and Alex Teboul, for building the Diabetes Health Indicators dataset based on BRFSS 2015, were the inspiration for creating this dataset and exploring the BRFSS in general.

  7. Table_1_Clinical performance and health equity implications of the American...

    • figshare.com
    docx
    Updated Oct 13, 2023
    Cite
    Matthew J. O’Brien; Yan Zhang; Stacy C. Bailey; Sadiya S. Khan; Ronald T. Ackermann; Mohammed K. Ali; Michael E. Bowen; Stephen R. Benoit; Giuseppina Imperatore; Christopher S. Holliday; Kai McKeever Bullard (2023). Table_1_Clinical performance and health equity implications of the American Diabetes Association’s 2023 screening recommendation for prediabetes and diabetes.docx [Dataset]. http://doi.org/10.3389/fendo.2023.1279348.s001
    Available download formats: docx
    Dataset updated
    Oct 13, 2023
    Dataset provided by
    Frontiers
    Authors
    Matthew J. O’Brien; Yan Zhang; Stacy C. Bailey; Sadiya S. Khan; Ronald T. Ackermann; Mohammed K. Ali; Michael E. Bowen; Stephen R. Benoit; Giuseppina Imperatore; Christopher S. Holliday; Kai McKeever Bullard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The American Diabetes Association (ADA) recommends screening for prediabetes and diabetes (dysglycemia) starting at age 35, or younger than 35 years among adults with overweight or obesity and other risk factors. Diabetes risk differs by sex, race, and ethnicity, but performance of the recommendation in these sociodemographic subgroups is unknown.

    Methods: Nationally representative data from the National Health and Nutrition Examination Surveys (2015-March 2020) were analyzed from 5,287 nonpregnant US adults without diagnosed diabetes. Screening eligibility was based on age, measured body mass index, and the presence of diabetes risk factors. Dysglycemia was defined by fasting plasma glucose ≥100 mg/dL (≥5.6 mmol/L) or haemoglobin A1c ≥5.7% (≥39 mmol/mol). The sensitivity, specificity, and predictive values of the ADA screening criteria were examined by sex, race, and ethnicity.

    Results: An estimated 83.1% (95% CI=81.2-84.7) of US adults were eligible for screening according to the 2023 ADA recommendation. Overall, ADA’s screening criteria exhibited high sensitivity [95.0% (95% CI=92.7-96.6)] and low specificity [27.1% (95% CI=24.5-29.9)], which did not differ by race or ethnicity. Sensitivity was higher among women [97.8% (95% CI=96.6-98.6)] than men [92.4% (95% CI=88.3-95.1)]. Racial and ethnic differences in sensitivity and specificity among men were statistically significant (P=0.04 and P=0.02, respectively). Among women, guideline performance did not differ by race and ethnicity.

    Discussion: The ADA screening criteria exhibited high sensitivity for all groups and was marginally higher in women than men. Racial and ethnic differences in guideline performance among men were small and unlikely to have a significant impact on health equity. Future research could examine adoption of this recommendation in practice and examine its effects on treatment and clinical outcomes by sex, race, and ethnicity.
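    The dysglycemia definition and the four performance measures named in the Methods can be illustrated with a short sketch. The thresholds come from the abstract; everything else (function names, the toy data, and the choice to ignore NHANES survey weights) is an assumption made for illustration and does not reproduce the authors' analysis.

```python
def has_dysglycemia(fpg_mg_dl, hba1c_percent):
    """Dysglycemia per the abstract: FPG >= 100 mg/dL or HbA1c >= 5.7%."""
    return fpg_mg_dl >= 100 or hba1c_percent >= 5.7


def screening_metrics(eligible, dysglycemic):
    """Unweighted sensitivity, specificity, PPV and NPV.

    eligible[i] is True if person i meets the screening criteria;
    dysglycemic[i] is True if person i actually has dysglycemia.
    """
    pairs = list(zip(eligible, dysglycemic))
    tp = sum(1 for e, d in pairs if e and d)
    fp = sum(1 for e, d in pairs if e and not d)
    fn = sum(1 for e, d in pairs if not e and d)
    tn = sum(1 for e, d in pairs if not e and not d)
    return {
        "sensitivity": tp / (tp + fn),  # eligible among those with dysglycemia
        "specificity": tn / (tn + fp),  # ineligible among those without
        "ppv": tp / (tp + fp),          # dysglycemia among the eligible
        "npv": tn / (tn + fn),          # no dysglycemia among the ineligible
    }


# Toy data for six hypothetical people: (FPG mg/dL, HbA1c %).
labs = [(105, 5.4), (92, 5.9), (88, 5.2), (90, 5.0), (101, 5.8), (95, 5.5)]
truth = [has_dysglycemia(fpg, a1c) for fpg, a1c in labs]
eligible = [True, True, True, False, False, True]
print(screening_metrics(eligible, truth))
```

    High sensitivity with low specificity, as reported in the Results, means the criteria flag nearly everyone with dysglycemia but also flag most people without it.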

  8. DataSheet1_Racial/ethnic and gender disparity in the severity of NAFLD among...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 21, 2023
    Cite
    Magda Shaheen; Katrina M. Schrode; Marielle Tedlos; Deyu Pan; Sonia M. Najjar; Theodore C. Friedman (2023). DataSheet1_Racial/ethnic and gender disparity in the severity of NAFLD among people with diabetes or prediabetes.docx [Dataset]. http://doi.org/10.3389/fphys.2023.1076730.s001
    Available download formats: docx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Magda Shaheen; Katrina M. Schrode; Marielle Tedlos; Deyu Pan; Sonia M. Najjar; Theodore C. Friedman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aim: Non-alcoholic fatty liver disease (NAFLD) exhibits a racial disparity. We examined the prevalence and the association between race, gender, and NAFLD among prediabetes and diabetes populations among adults in the United States.

    Methods: We analyzed data for 3,190 individuals ≥18 years old from the National Health and Nutrition Examination Survey (NHANES) 2017–2018. NAFLD was diagnosed by FibroScan® using controlled attenuation parameter (CAP) values: S0 (none) < 238, S1 (mild) = 238–259, S2 (moderate) = 260–290, S3 (severe) > 290. Data were analyzed using Chi-square test and multinomial logistic regression, adjusting for confounding variables and considering the design and sample weights.

    Results: Of the 3,190 subjects, the prevalence of NAFLD was 82.6%, 56.4%, and 30.5% (p < 0.0001) among diabetes, prediabetes and normoglycemia populations respectively. Mexican American males with prediabetes or diabetes had the highest prevalence of severe NAFLD relative to other racial/ethnic groups (p < 0.05). In the adjusted model, among the total, prediabetes, and diabetes populations, a one unit increase in HbA1c was associated with higher odds of severe NAFLD [adjusted odds ratio (AOR) = 1.8, 95% confidence level (CI) = 1.4–2.3, p < 0.0001; AOR = 2.2, 95% CI = 1.1–4.4, p = 0.033; and AOR = 1.5, 95% CI = 1.1–1.9, p = 0.003 respectively].

    Conclusion: We found that prediabetes and diabetes populations had a high prevalence and higher odds of NAFLD relative to the normoglycemic population and HbA1c is an independent predictor of NAFLD severity in prediabetes and diabetes populations. Healthcare providers should screen prediabetes and diabetes populations for early detection of NAFLD and initiate treatments including lifestyle modification to prevent the progression to non-alcoholic steatohepatitis or liver cancer.
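    The CAP cut-offs quoted in the Methods translate into a simple grading rule. The sketch below is one possible encoding; the function name is hypothetical, and because the abstract gives integer ranges, the handling of non-integer readings between 259 and 260 (assigned to S2 here) is an assumption.

```python
def steatosis_grade(cap):
    """Map a CAP reading to the steatosis grade used in the abstract.

    S0 (none) < 238, S1 (mild) 238-259, S2 (moderate) 260-290,
    S3 (severe) > 290.
    """
    if cap < 238:
        return "S0"
    if cap <= 259:
        return "S1"
    if cap <= 290:
        return "S2"
    return "S3"


for reading in (200, 238, 259, 260, 290, 291):
    print(reading, steatosis_grade(reading))
```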

  9. Adult Arabs have higher risk for diabetes mellitus than Jews in Israel

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 2, 2023
    Cite
    Anat Jaffe; Shmuel Giveon; Liat Wulffhart; Bernice Oberman; Maslama Baidousi; Arnona Ziv; Ofra Kalter-Leibovici (2023). Adult Arabs have higher risk for diabetes mellitus than Jews in Israel [Dataset]. http://doi.org/10.1371/journal.pone.0176661
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anat Jaffe; Shmuel Giveon; Liat Wulffhart; Bernice Oberman; Maslama Baidousi; Arnona Ziv; Ofra Kalter-Leibovici
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Israel
    Description

    Objective: Diabetes mellitus is an emerging epidemic in the Arab world. Although high diabetes prevalence is documented in Israeli Arabs, information from cohort studies is scant.

    Methods: This is a population study, based on information derived between 2007–2011, from the electronic database of the largest health fund in Israel, among Arabs and Jews. Prevalence, 4-year-incidence and diabetes hazard ratios [HRs], adjusted for sex and the metabolic-syndrome [MetS]-components, were determined in 3 age groups (

  10. Comprehensive Diabetes Clinical Dataset(100k rows)

    • kaggle.com
    zip
    Updated Jul 20, 2024
    + more versions
    Cite
    Priyam Choksi (2024). Comprehensive Diabetes Clinical Dataset(100k rows) [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/100000-diabetes-clinical-dataset
    Available download formats: zip (917848 bytes)
    Dataset updated
    Jul 20, 2024
    Authors
    Priyam Choksi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.

    Dataset Use Cases

    This dataset can be used for various analytical and machine learning purposes, such as:

    1. Predictive Modeling: Build models to predict the likelihood of diabetes based on demographic and health-related features.
    2. Health Analytics: Analyze the correlation between different health metrics (e.g., BMI, HbA1c level) and diabetes.
    3. Demographic Studies: Examine the distribution of diabetes across different demographic groups and locations.
    4. Public Health Research: Identify risk factors for diabetes and target interventions to high-risk groups.
    5. Clinical Research: Study the relationship between comorbid conditions like hypertension and heart disease with diabetes.

    Potential Analyses

    • Descriptive Statistics: Summarize the dataset to understand the central tendencies and dispersion of features.
    • Correlation Analysis: Identify the relationships between features.
    • Classification Models: Use machine learning algorithms to classify individuals as diabetic or non-diabetic.
    • Trend Analysis: Analyze trends over the years to see how diabetes prevalence has changed.

  11. Adult obesity rates in the U.S. by race/ethnicity 2023

    • statista.com
    Updated Oct 28, 2003
    Cite
    Statista (2003). Adult obesity rates in the U.S. by race/ethnicity 2023 [Dataset]. https://www.statista.com/statistics/207436/overweight-and-obesity-rates-for-adults-by-ethnicity/
    Dataset updated
    Oct 28, 2003
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, Black adults had the highest obesity rates of any race or ethnicity in the United States, followed by American Indians/Alaska Natives and Hispanics. As of that time, around ** percent of all Black adults were obese. Asians/Pacific Islanders had by far the lowest obesity rates.

    Obesity in the United States

    Obesity is a present and growing problem in the United States. An astonishing ** percent of the adult population in the U.S. is now considered obese. Obesity rates can vary substantially by state, with around ** percent of the adult population in West Virginia reportedly obese, compared to ** percent of adults in Colorado. The states with the highest rates of obesity include West Virginia, Mississippi, and Arkansas.

    Diabetes

    Being overweight and obese can lead to a number of health problems, including heart disease, cancer, and diabetes. Being overweight or obese is one of the most common causes of type 2 diabetes, a condition in which the body does not use insulin properly, causing blood sugar levels to rise. It is estimated that just over ***** percent of adults in the U.S. have been diagnosed with diabetes. Diabetes is now the seventh leading cause of death in the United States, accounting for ***** percent of all deaths.

  12. CDC Diabetes Health Indicators

    • kaggle.com
    zip
    Updated Jul 21, 2024
    + more versions
    Cite
    Abdelaziz Sami (2024). CDC Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/cdc-diabetes-health-indicators
    Available download formats: zip (6324278 bytes)
    Dataset updated
    Jul 21, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context:

    Diabetes is a widespread chronic disease affecting millions of Americans each year, imposing a substantial financial burden on the economy. It impairs the body's ability to regulate blood glucose levels, leading to a range of health issues such as heart disease, vision loss, limb amputation, and kidney disease. Diabetes occurs when the body either fails to produce sufficient insulin or cannot use the insulin produced effectively. Insulin is crucial for enabling cells to utilize sugars from the bloodstream for energy.

    Though there is no cure for diabetes, lifestyle changes such as weight management, healthy eating, and regular physical activity, along with medical treatments, can help manage the disease. Early detection and intervention are vital, making predictive models for diabetes risk valuable tools for healthcare providers and public health officials.

    As of 2018, the CDC reported that 34.2 million Americans have diabetes, with 88 million having prediabetes. Alarmingly, a significant portion of those affected are unaware of their condition. Type II diabetes, the most prevalent form, varies in prevalence based on age, education, income, location, race, and other social determinants of health. The economic impact is substantial, with diagnosed diabetes costing approximately $327 billion annually, and total costs, including undiagnosed cases and prediabetes, nearing $400 billion.

    Content: The dataset originates from the Behavioral Risk Factor Surveillance System (BRFSS), an annual telephone survey by the CDC since 1984, collecting data on health-related risk behaviors, chronic health conditions, and preventative service usage. For this project, the 2015 BRFSS dataset available on Kaggle was used, featuring responses from 441,455 individuals across 330 features.

    The dataset includes three files:

    1. diabetes_012_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features. The target variable, Diabetes_012, has 3 classes: 0 (no diabetes or only during pregnancy), 1 (prediabetes), and 2 (diabetes). This dataset is imbalanced.

    2. diabetes_binary_5050split_health_indicators_BRFSS2015.csv: Contains 70,692 responses with 21 features, balanced 50-50 between individuals with no diabetes and those with prediabetes or diabetes. The target variable, Diabetes_binary, has 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes).

    3. diabetes_binary_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features, with the target variable Diabetes_binary having 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes). This dataset is not balanced.
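    The relationship between the three-class target and the binary target described above can be sketched as follows. The column name Diabetes_012 and the class meanings come from the description; the mapping function, the sample rows, and the BMI column are illustrative assumptions, and the code does not claim to reproduce how the published files were actually built.

```python
import csv
import io
from collections import Counter


def to_binary(label_012):
    """Collapse Diabetes_012 (0, 1, 2) into a binary label.

    0 (no diabetes) stays 0; 1 (prediabetes) and 2 (diabetes) become 1.
    Accepts values written as "0", "0.0", 0 or 0.0.
    """
    return 0 if int(float(label_012)) == 0 else 1


def class_counts(csv_text):
    """Count rows per class for the three-class and the binary target."""
    three_class = Counter()
    binary = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = int(float(row["Diabetes_012"]))
        three_class[label] += 1
        binary[to_binary(label)] += 1
    return three_class, binary


# Made-up rows for illustration; a real file would be opened from disk.
SAMPLE = "Diabetes_012,BMI\n0.0,24\n0.0,31\n2.0,35\n1.0,29\n0.0,22\n"
three_class, binary = class_counts(SAMPLE)
print(dict(three_class), dict(binary))
```

    Printing the class counts first is a quick way to see the imbalance noted for files 1 and 3 before choosing between them and the 50-50 split in file 2.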

    Research Questions:

    • Can BRFSS survey questions accurately predict diabetes?
    • What risk factors are most indicative of diabetes risk?
    • Can a subset of risk factors effectively predict diabetes risk?
    • Can a shorter questionnaire be developed from the BRFSS using feature selection to predict diabetes risk?

    Acknowledgements: This dataset was not created by me; it is a cleaned and consolidated version of the BRFSS 2015 dataset available on Kaggle. The original dataset and the data cleaning notebook can be found here.

    Inspiration: This work was inspired by Zidian Xie et al.'s study on building risk prediction models for Type 2 diabetes using machine learning techniques on the 2014 BRFSS dataset. The study can be found here.

  13. Share of adults in the United States with prediabetes 2017-2020, by race

    • statista.com
    Updated Nov 29, 2025
    Cite
    Statista (2025). Share of adults in the United States with prediabetes 2017-2020, by race [Dataset]. https://www.statista.com/statistics/1382840/percentage-adults-with-prediabetes-us-by-race-ethnicity/
    Dataset updated
    Nov 29, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2017 - Mar 2020
    Area covered
    United States
    Description

    Based on data from January 2017 to March 2020, it was estimated that around ** percent of non-Hispanic white adults in the United States had prediabetes. Those with prediabetes have blood sugar levels higher than normal, but not high enough to yet be diagnosed with diabetes. This statistic shows the percentage of adults in the United States with prediabetes from 2017 to 2020, by race/ethnicity.

  14. Number of adults in the United States with prediabetes in 2021, by...

    • statista.com
    Updated Jul 8, 2025
    Cite
    Statista (2025). Number of adults in the United States with prediabetes in 2021, by race/ethnicity [Dataset]. https://www.statista.com/statistics/1382825/number-adults-with-prediabetes-us-by-ethnicity/
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2021
    Area covered
    United States
    Description

    In 2021, it was estimated that a total of **** million adults in the United States had prediabetes, with non-Hispanic white adults accounting for around **** million of these cases. Those with prediabetes have blood sugar levels higher than normal, but not high enough to yet be diagnosed with diabetes. This statistic shows the number of adults in the United States with prediabetes in 2021, by race and ethnicity.

  15. Diabetes by Demographies

    • kaggle.com
    zip
    Updated Aug 23, 2017
    + more versions
    Cite
    Gokagglers (2017). Diabetes by Demographies [Dataset]. https://www.kaggle.com/loveall/diabetes-by-demographies
    Available download formats: zip (991 bytes)
    Dataset updated
    Aug 23, 2017
    Authors
    Gokagglers
    Description

    Context

    There's a story behind every dataset and here's your opportunity to share yours.

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  16. Prevalence of screen-detected type 2 diabetes, impaired glucose regulation...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1more
    Updated Feb 19, 2013
    Cite
    Gray, Laura J.; Davies, Melanie J.; Webb, David R.; Weston, Claire L.; Morris, Danielle H.; Khunti, Kamlesh (2013). Prevalence of screen-detected type 2 diabetes, impaired glucose regulation (IGR), high cardiovascular disease (CVD) risk and chronic kidney disease (CKD) by sex and ethnicity. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001684112
    Dataset updated
    Feb 19, 2013
    Authors
    Gray, Laura J.; Davies, Melanie J.; Webb, David R.; Weston, Claire L.; Morris, Danielle H.; Khunti, Kamlesh
    Description

    Data shown are number of cases/total and percentage (95% confidence interval). Missing values: Diabetes = 9; IGR = 7; CVD risk = 116; CKD = 37; Any = 48; All = 11.
    a. P-values were estimated using χ² tests and show the difference in prevalence between White Europeans and South Asians for each sex.
    b. High CVD risk was defined as a risk score greater than 20%.
    c. Any risk factor means that the person has at least one of diabetes, IGR, high CVD risk or CKD. All risk factors means that the person has diabetes or IGR, high CVD risk and CKD.

  17. Descriptive characteristics of FQHCs, 2014–2019.

    • plos.figshare.com
    xls
    Updated Sep 18, 2024
    Cite
    Sanjay Kishore; Sandeep P. Kishore; Cheryl Clark; Benjamin D. Sommers (2024). Descriptive characteristics of FQHCs, 2014–2019. [Dataset]. http://doi.org/10.1371/journal.pone.0310523.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sanjay Kishore; Sandeep P. Kishore; Cheryl Clark; Benjamin D. Sommers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Importance: Racial and ethnic disparities in chronic disease are a major public health priority.

    Objective: To determine if the amount of federal grant funding to federally-qualified health centers (FQHCs) was associated with baseline overall prevalence of uncontrolled hypertension and uncontrolled diabetes, as well as prevalence by racial and ethnic subgroup.

    Design: Cross-sectional multivariate regression analysis of Uniform Data System 2014–2019, which includes clinic-level data from each FQHC regarding demographics, chronic disease control by race and ethnicity, and grant funding.

    Exposures: Our main exposures were the average values of the prevalence of uncontrolled hypertension and uncontrolled diabetes among the overall population and by racial and ethnic group from 2014–2016.

    Main outcomes: Average federal grant funding per patient from 2017–2019, as measured by annual health center funding from the Bureau of Primary Health Care (BPHC) and overall federal grant funding.

    Results: We analyzed 1,205 FQHCs from 2014–2019; the average BPHC grant per patient across all FQHCs in 2019 was $168 while the average total federal grant was $184 per patient. Increasing shares of total patients with uncontrolled hypertension or uncontrolled diabetes were not associated with increased total federal grant funding in either unadjusted or adjusted analysis. Increased shares of patients who are American Indian or Alaskan Native (AI-AN) with uncontrolled hypertension and diabetes were associated with increasing total federal grant funding in both unadjusted and adjusted analysis (adjusted beta hypertension $168.3, p

  18. Share of diabetics with diabetic retinopathy in the U.S. in 2021, by stage...

    • statista.com
    Updated Jun 13, 2023
    Cite
    Statista (2023). Share of diabetics with diabetic retinopathy in the U.S. in 2021, by stage and race [Dataset]. https://www.statista.com/statistics/1417826/share-of-diabetics-with-diabetic-retinopathy-in-us-by-stage-and-race/
    Explore at:
    Dataset updated
    Jun 13, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2021
    Area covered
    United States
    Description

    In 2021, among diabetics in the United States, Black, non-Hispanic individuals were most likely to report having diabetic retinopathy compared to other races.

    This statistic depicts the percentage of people with diabetes who had diabetic retinopathy in the United States as of 2021, by stage of disease and race and ethnicity.

  19. Share of children with type 1 and 2 diabetes in England and Wales, by...

    • statista.com
    Updated Oct 30, 2024
    Cite
    Statista (2024). Share of children with type 1 and 2 diabetes in England and Wales, by ethnicity [Dataset]. https://www.statista.com/statistics/540417/children-with-type-1-and-2-diabetes-by-ethnicity-in-england-and-wales/
    Explore at:
    Dataset updated
    Oct 30, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2022 - Mar 2023
    Area covered
    Wales, England
    Description

    Between 2022 and 2023, there were 32,276 young people with type 1 diabetes and 1,245 with type 2 across England and Wales. Young people of white ethnicity were the most affected group. This statistic shows the share of young people under the age of 24 with type 1 and type 2 diabetes in England and Wales from 2022 to 2023, by ethnicity.

  20. Diabetes Health Indicators Dataset

    • kaggle.com
    Updated Sep 21, 2025
    Cite
    Mohan Krishna Thalla (2025). Diabetes Health Indicators Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/13128284
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohan Krishna Thalla
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Diabetes Health Indicators Dataset

    Overview

    This dataset contains 100,000 patient records designed for diabetes risk prediction, analysis, and machine learning applications. The dataset is clean, preprocessed, and ready for use in classification, regression, feature engineering, statistical analysis, and data visualization.

    • Rows: 100,000
    • Columns: 35+
    • File: diabetes_dataset.csv

    Dataset Description

    The dataset includes patient profiles with features based on demographics, lifestyle habits, family history, and clinical measurements that are well-established indicators of diabetes risk. All data is generated using statistical distributions inspired by real-world medical research, ensuring privacy preservation while reflecting realistic health patterns.

    Features

    | Column | Type | Description | Values/Range |
    | --- | --- | --- | --- |
    | patient_id | Integer | Unique patient identifier | 1–100000 |
    | age | Integer | Age of patient in years | 18–90 |
    | gender | String | Patient gender | 'Male', 'Female', 'Other' |
    | ethnicity | String | Ethnic background | 'White', 'Hispanic', 'Black', 'Asian', 'Other' |
    | education_level | String | Highest completed education | 'No formal', 'Highschool', 'Graduate', 'Postgraduate' |
    | income_level | String | Income category | 'Low', 'Medium', 'High' |
    | employment_status | String | Employment type | 'Employed', 'Unemployed', 'Retired', 'Student' |
    | smoking_status | String | Smoking behavior | 'Never', 'Former', 'Current' |
    | alcohol_consumption_per_week | Float | Drinks consumed per week | 0–30 |
    | physical_activity_minutes_per_week | Integer | Physical activity (weekly minutes) | 0–600 |
    | diet_score | Integer | Diet quality (higher = healthier) | 0–10 |
    | sleep_hours_per_day | Float | Average daily sleep hours | 3–12 |
    | screen_time_hours_per_day | Float | Average daily screen time hours | 0–12 |
    | family_history_diabetes | Integer | Family history of diabetes | 0 = No, 1 = Yes |
    | hypertension_history | Integer | Hypertension history | 0 = No, 1 = Yes |
    | cardiovascular_history | Integer | Cardiovascular history | 0 = No, 1 = Yes |
    | bmi | Float | Body Mass Index (kg/m²) | 15–45 |
    | waist_to_hip_ratio | Float | Waist-to-hip ratio | 0.7–1.2 |
    | systolic_bp | Integer | Systolic blood pressure (mmHg) | 90–180 |
    | diastolic_bp | Integer | Diastolic blood pressure (mmHg) | 60–120 |
    | heart_rate | Integer | Resting heart rate (bpm) | 50–120 |
    | cholesterol_total | Float | Total cholesterol (mg/dL) | 120–300 |
    | hdl_cholesterol | Float | HDL cholesterol (mg/dL) | 20–100 |
    | ldl_cholesterol | Float | LDL cholesterol (mg/dL) | 50–200 |
    | triglycerides | Float | Triglycerides (mg/dL) | 50–500 |
    | glucose_fasting | Float | Fasting glucose (mg/dL) | 70–250 |
    | glucose_postprandial | Float | Post-meal glucose (mg/dL) | 90–350 |
    | insulin_level | Float | Blood insulin level (µU/mL) | 2–50 |
    | hba1c | Float | HbA1c (%) | 4–14 |
    | diabetes_risk_score | Integer | Risk score (calculated, 0–100) | 0–100 |
    | diabetes_stage | String | Stage of diabetes | 'No Diabetes', 'Pre-Diabetes', 'Type 1', 'Type 2', 'Gestational' |
    | diagnosed_diabetes | Integer | Target: Diabetes diagnosis | 0 = No, 1 = Yes |

    Data Quality

    • Complete: No missing values or duplicates
    • Clean: All values fall within medically realistic ranges
    • Balanced Features: Distribution matches realistic population health patterns
    • Target Distribution: ~20–25% diagnosed cases (balanced for ML classification)
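
    The data-quality claims above (no missing values or duplicates, values within the documented ranges, roughly 20–25% diagnosed cases) can be spot-checked with a few lines of standard-library Python. This is a minimal sketch, not part of the dataset: the helper name `check_rows`, the subset of columns in `RANGES`, and the three-row `SAMPLE` are assumptions for illustration, and the sample rows are invented rather than taken from diabetes_dataset.csv.

```python
import csv
import io

# Documented ranges for a few numeric columns (copied from the feature table).
RANGES = {
    "age": (18, 90),
    "bmi": (15, 45),
    "glucose_fasting": (70, 250),
    "hba1c": (4, 14),
}


def check_rows(rows):
    """Return (problems, diagnosed_share) for an iterable of dict rows."""
    problems = []
    seen_ids = set()
    diagnosed = 0
    count = 0
    for n, row in enumerate(rows, start=1):
        count += 1
        pid = row.get("patient_id", "")
        if pid in seen_ids:
            problems.append(f"row {n}: duplicate patient_id {pid}")
        seen_ids.add(pid)
        for col, (lo, hi) in RANGES.items():
            raw = row.get(col)
            if raw is None or raw.strip() == "":
                problems.append(f"row {n}: missing {col}")
                continue
            value = float(raw)
            if not lo <= value <= hi:
                problems.append(f"row {n}: {col}={value} outside {lo}-{hi}")
        if row.get("diagnosed_diabetes") == "1":
            diagnosed += 1
    share = diagnosed / count if count else 0.0
    return problems, share


# Hypothetical sample: row 3 has an out-of-range age and a missing hba1c value.
SAMPLE = """patient_id,age,bmi,glucose_fasting,hba1c,diagnosed_diabetes
1,45,27.5,95,5.4,0
2,61,33.1,140,7.2,1
3,17,22.0,88,,0
"""

problems, share = check_rows(csv.DictReader(io.StringIO(SAMPLE)))
```

    To run the same check on the real file, pass `csv.DictReader(open("diabetes_dataset.csv", newline=""))` instead of the in-memory sample and compare `share` with the stated 20–25% target distribution.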

    Use Cases

    • 🩺 Binary Classification → Predict diagnosed_diabetes (Yes/No)
    • 🧮 Multiclass Classification → Predict diabetes_stage
    • 📊 Regression → Predict glucose_fasting, hba1c, or diabetes_risk_score
    • 🔍 EDA & Visualization → Explore lifestyle and clinical health patterns
    • 🧠 Machine Learning → Train ML/DL models for healthcare prediction tasks
    • 📈 Statistical Testing → Hypothesis testing on health indicators
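
    The use cases above each imply a different target column. The sketch below shows one way to separate features from the target for each task. The names `TARGETS`, `ALWAYS_EXCLUDED`, and `split_row` are hypothetical. Excluding `patient_id`, `diabetes_stage`, `diabetes_risk_score`, and `diagnosed_diabetes` from the features is an assumption (they either identify the row or are closely tied to the diagnosis), not something the dataset description prescribes.

```python
# Target column(s) per use case, as named in the use-case list.
TARGETS = {
    "binary": ["diagnosed_diabetes"],
    "multiclass": ["diabetes_stage"],
    "regression": ["glucose_fasting", "hba1c", "diabetes_risk_score"],
}

# Assumption: never use the identifier or diagnosis-derived columns as features.
ALWAYS_EXCLUDED = {
    "patient_id",
    "diagnosed_diabetes",
    "diabetes_stage",
    "diabetes_risk_score",
}


def split_row(row, task, target=None):
    """Split one record (a dict) into (features, target_value) for the given task."""
    allowed = TARGETS[task]
    target = target or allowed[0]
    if target not in allowed:
        raise ValueError(f"{target!r} is not a listed target for task {task!r}")
    features = {
        key: value
        for key, value in row.items()
        if key not in ALWAYS_EXCLUDED and key != target
    }
    return features, row[target]


# Hypothetical record using column names from the feature table.
row = {
    "patient_id": 1,
    "age": 52,
    "bmi": 29.4,
    "hba1c": 6.1,
    "diabetes_risk_score": 41,
    "diabetes_stage": "Pre-Diabetes",
    "diagnosed_diabetes": 0,
}

binary_features, binary_label = split_row(row, "binary")
reg_features, reg_value = split_row(row, "regression", target="hba1c")
```

    Whether clinical measurements such as `hba1c` should also be withheld for the binary task depends on the question being asked; the sketch leaves them in.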