From 2013 to 2021, it was estimated that among different racial/ethnic groups of adults in the United States, Samoans presented the highest prevalence of diabetes, with 20.3 percent diagnosed with diabetes. This statistic depicts the prevalence of diabetes among adults in the United States from 2013 to 2021, by detailed race and ethnicity.
Diabetes prevalence in Massachusetts has been steadily increasing.
This statistic displays the death rate from diabetes for children and adolescents in the United States from 2012 to 2014, by ethnicity. According to the statistic, the death rate from diabetes among children and adolescents was highest among black youth.
Between 2023 and 2024, over ******* percent of all those registered with type 2 diabetes in England were Asian or Asian British. This statistic displays the share of individuals registered with diabetes in England in 2023/24, by ethnicity.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Introduction: Racial and ethnic minority groups and individuals with limited educational attainment experience a disproportionate burden of diabetes. Prediabetes represents a high-risk state for developing type 2 diabetes, but most adults with prediabetes are unaware of having the condition. Uncovering whether racial, ethnic, or educational disparities also occur in the prediabetes stage could help inform strategies to support health equity in preventing type 2 diabetes and its complications. We examined the prevalence of prediabetes and prediabetes awareness, with corresponding prevalence ratios according to race, ethnicity, and educational attainment.
Methods: This study was a pooled cross-sectional analysis of the National Health and Nutrition Examination Survey data from 2011 to March 2020. The final sample comprised 10,262 U.S. adults who self-reported being Asian, Black, Hispanic, or White. Prediabetes was defined using hemoglobin A1c and fasting plasma glucose values. Those with prediabetes were classified as “aware” or “unaware” based on survey responses. We calculated prevalence ratios (PR) to assess the relationship between race, ethnicity, and educational attainment with prediabetes and prediabetes awareness, controlling for sociodemographic, health and healthcare-related, and clinical characteristics.
Results: In fully adjusted logistic regression models, Asian, Black, and Hispanic adults had a statistically significant higher risk of prediabetes than White adults (PR: 1.26 [1.18, 1.35], PR: 1.17 [1.08, 1.25], and PR: 1.10 [1.02, 1.19], respectively). Adults completing less than high school and high school had a significantly higher risk of prediabetes compared to those with a college degree (PR: 1.14 [1.02, 1.26] and PR: 1.12 [1.01, 1.23], respectively). We also found that Black and Hispanic adults had higher rates of prediabetes awareness in the fully adjusted model than White adults (PR: 1.27 [1.07, 1.50] and PR: 1.33 [1.02, 1.72], respectively). The rates of prediabetes awareness were consistently lower among those with less than a high school education relative to individuals who completed college (fully-adjusted model PR: 0.66 [0.47, 0.92]).
Discussion: Disparities in prediabetes among racial and ethnic minority groups and adults with low educational attainment suggest challenges and opportunities for promoting health equity in high-risk groups and expanding awareness of prediabetes in the United States.
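A prevalence ratio compares the prevalence of a condition in one group with the prevalence in a reference group. The following is a minimal sketch of an unadjusted prevalence ratio with a Wald confidence interval on the log scale. The counts are hypothetical and the function name is an assumption; the study's estimates are covariate-adjusted and based on survey data, which this sketch does not attempt to reproduce.

```python
import math

def prevalence_ratio(cases_a, total_a, cases_ref, total_ref, z=1.96):
    """Unadjusted prevalence ratio of group A vs. a reference group,
    with a Wald confidence interval computed on the log scale."""
    p_a = cases_a / total_a
    p_ref = cases_ref / total_ref
    pr = p_a / p_ref
    # Standard error of log(PR) for two independent binomial proportions
    se = math.sqrt((1 - p_a) / cases_a + (1 - p_ref) / cases_ref)
    lower = math.exp(math.log(pr) - z * se)
    upper = math.exp(math.log(pr) + z * se)
    return pr, lower, upper

# Hypothetical counts, not taken from the study
pr, lo, hi = prevalence_ratio(cases_a=450, total_a=1000,
                              cases_ref=360, total_ref=1000)
print(f"PR = {pr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

A ratio above 1 with an interval that excludes 1 corresponds to the "significantly higher" wording used in the abstract.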
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Context: Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.
Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.
The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.
Content: The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey conducted annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the 2023 dataset, available on the CDC website, was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions asked directly of participants or variables calculated from individual participant responses.
I selected 20 features from this dataset that are suitable for working on the topic of diabetes and saved them in a CSV file without making any changes to the data. The goal is to make the data easier to work with. For more information or to access updated data, refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features, so once only 20 features are kept, different respondents can share identical values. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
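The point about duplicates can be illustrated with a small sketch: records that are all distinct across the full set of columns can become identical once only a subset of columns is kept. The column names and values below are invented for illustration and are not taken from the BRFSS files.

```python
from collections import Counter

# Toy records with hypothetical column names; values are invented.
full_rows = [
    {"age_group": 9, "bmi_cat": 3, "smoker": 0, "state": "MA"},
    {"age_group": 9, "bmi_cat": 3, "smoker": 0, "state": "TX"},
    {"age_group": 5, "bmi_cat": 2, "smoker": 1, "state": "MA"},
]
selected = ["age_group", "bmi_cat", "smoker"]

def count_duplicates(rows, columns):
    """Number of rows that repeat an earlier row on the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return sum(n - 1 for n in counts.values())

all_columns = list(full_rows[0])
print(count_duplicates(full_rows, all_columns))  # duplicates across all columns
print(count_duplicates(full_rows, selected))     # duplicates across the subset
```

Here the first two respondents differ only in a column that was not selected, so they collapse into one repeated row in the reduced view even though neither record is an error.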
Explore some of the following research questions:
- Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
- What risk factors are most predictive of diabetes risk?
- Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
- Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
Acknowledgements: I did not create this dataset; it is a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. None of the data in this dataset discloses individuals' identities.
Inspiration: This dataset, and my exploration of the BRFSS in general, was inspired by Zidian Xie et al., "Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques", which used the 2014 BRFSS, and by Alex Teboul's Diabetes Health Indicators dataset, which is based on the 2015 BRFSS.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Introduction: The American Diabetes Association (ADA) recommends screening for prediabetes and diabetes (dysglycemia) starting at age 35, or younger than 35 years among adults with overweight or obesity and other risk factors. Diabetes risk differs by sex, race, and ethnicity, but performance of the recommendation in these sociodemographic subgroups is unknown.
Methods: Nationally representative data from the National Health and Nutrition Examination Surveys (2015-March 2020) were analyzed from 5,287 nonpregnant US adults without diagnosed diabetes. Screening eligibility was based on age, measured body mass index, and the presence of diabetes risk factors. Dysglycemia was defined by fasting plasma glucose ≥100 mg/dL (≥5.6 mmol/L) or haemoglobin A1c ≥5.7% (≥39 mmol/mol). The sensitivity, specificity, and predictive values of the ADA screening criteria were examined by sex, race, and ethnicity.
Results: An estimated 83.1% (95% CI=81.2-84.7) of US adults were eligible for screening according to the 2023 ADA recommendation. Overall, ADA’s screening criteria exhibited high sensitivity [95.0% (95% CI=92.7-96.6)] and low specificity [27.1% (95% CI=24.5-29.9)], which did not differ by race or ethnicity. Sensitivity was higher among women [97.8% (95% CI=96.6-98.6)] than men [92.4% (95% CI=88.3-95.1)]. Racial and ethnic differences in sensitivity and specificity among men were statistically significant (P=0.04 and P=0.02, respectively). Among women, guideline performance did not differ by race and ethnicity.
Discussion: The ADA screening criteria exhibited high sensitivity for all groups, and sensitivity was marginally higher in women than in men. Racial and ethnic differences in guideline performance among men were small and unlikely to have a significant impact on health equity. Future research could examine adoption of this recommendation in practice and examine its effects on treatment and clinical outcomes by sex, race, and ethnicity.
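The dysglycemia definition and the screening metrics named in the abstract can be sketched as follows. This is an unweighted illustration on invented records; the function names and the toy data are assumptions, and the study itself used nationally representative survey data, which this sketch does not model.

```python
def has_dysglycemia(fpg_mg_dl, hba1c_percent):
    """Definition from the abstract: FPG >= 100 mg/dL or HbA1c >= 5.7%."""
    return fpg_mg_dl >= 100 or hba1c_percent >= 5.7

def screening_performance(records):
    """records: iterable of (eligible_for_screening, dysglycemic) booleans.
    Assumes every cell of the 2x2 table is non-empty."""
    tp = fp = fn = tn = 0
    for eligible, dysglycemic in records:
        if eligible and dysglycemic:
            tp += 1
        elif eligible:
            fp += 1
        elif dysglycemic:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # dysglycemic adults flagged for screening
        "specificity": tn / (tn + fp),  # unaffected adults not flagged
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Invented people: (eligible under the criteria, FPG in mg/dL, HbA1c in %)
people = [
    (True, 104, 5.4), (True, 92, 5.9), (True, 88, 5.2),
    (True, 95, 5.5), (False, 101, 5.3), (False, 85, 5.1),
]
records = [(e, has_dysglycemia(f, h)) for e, f, h in people]
perf = screening_performance(records)
print(perf)
```

High sensitivity with low specificity, as reported above, means the criteria flag nearly everyone with dysglycemia but also flag many adults without it.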
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Aim: Non-alcoholic fatty liver disease (NAFLD) exhibits a racial disparity. We examined the prevalence and the association between race, gender, and NAFLD among prediabetes and diabetes populations among adults in the United States.
Methods: We analyzed data for 3,190 individuals ≥18 years old from the National Health and Nutrition Examination Survey (NHANES) 2017–2018. NAFLD was diagnosed by FibroScan® using controlled attenuation parameter (CAP) values: S0 (none) < 238, S1 (mild) = 238–259, S2 (moderate) = 260–290, S3 (severe) > 290. Data were analyzed using Chi-square test and multinomial logistic regression, adjusting for confounding variables and considering the design and sample weights.
Results: Of the 3,190 subjects, the prevalence of NAFLD was 82.6%, 56.4%, and 30.5% (p < 0.0001) among diabetes, prediabetes and normoglycemia populations respectively. Mexican American males with prediabetes or diabetes had the highest prevalence of severe NAFLD relative to other racial/ethnic groups (p < 0.05). In the adjusted model, among the total, prediabetes, and diabetes populations, a one unit increase in HbA1c was associated with higher odds of severe NAFLD [adjusted odds ratio (AOR) = 1.8, 95% confidence interval (CI) = 1.4–2.3, p < 0.0001; AOR = 2.2, 95% CI = 1.1–4.4, p = 0.033; and AOR = 1.5, 95% CI = 1.1–1.9, p = 0.003 respectively].
Conclusion: We found that prediabetes and diabetes populations had a high prevalence and higher odds of NAFLD relative to the normoglycemic population and HbA1c is an independent predictor of NAFLD severity in prediabetes and diabetes populations. Healthcare providers should screen prediabetes and diabetes populations for early detection of NAFLD and initiate treatments including lifestyle modification to prevent the progression to non-alcoholic steatohepatitis or liver cancer.
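The CAP cutoffs listed in the Methods map directly onto a grading function. The sketch below is an illustration only: the function name is hypothetical, and the handling of non-integer values between 259 and 260 (treated as S1 here) is an assumption, since the abstract gives integer ranges.

```python
def steatosis_grade(cap_value):
    """Map a controlled attenuation parameter (CAP) value to the grades
    given in the abstract: S0 < 238, S1 = 238-259, S2 = 260-290, S3 > 290."""
    if cap_value < 238:
        return "S0"  # none
    if cap_value < 260:
        return "S1"  # mild
    if cap_value <= 290:
        return "S2"  # moderate
    return "S3"      # severe

for cap in (200, 238, 275, 310):
    print(cap, steatosis_grade(cap))
```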
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Objective: Diabetes mellitus is an emerging epidemic in the Arab world. Although high diabetes prevalence is documented in Israeli Arabs, information from cohort studies is scant.
Methods: This is a population study, based on information derived between 2007–2011, from the electronic database of the largest health fund in Israel, among Arabs and Jews. Prevalence, 4-year-incidence and diabetes hazard ratios [HRs], adjusted for sex and the metabolic-syndrome [MetS]-components, were determined in 3 age groups (
License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)
Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.
This dataset can be used for various analytical and machine learning purposes, such as:
In 2023, Black adults had the highest obesity rates of any race or ethnicity in the United States, followed by American Indians/Alaska Natives and Hispanics. As of that time, around ** percent of all Black adults were obese. Asians/Pacific Islanders had by far the lowest obesity rates.
Obesity in the United States: Obesity is a present and growing problem in the United States. An astonishing ** percent of the adult population in the U.S. is now considered obese. Obesity rates can vary substantially by state, with around ** percent of the adult population in West Virginia reportedly obese, compared to ** percent of adults in Colorado. The states with the highest rates of obesity include West Virginia, Mississippi, and Arkansas.
Diabetes: Being overweight and obese can lead to a number of health problems, including heart disease, cancer, and diabetes. Being overweight or obese is one of the most common causes of type 2 diabetes, a condition in which the body does not use insulin properly, causing blood sugar levels to rise. It is estimated that just over ***** percent of adults in the U.S. have been diagnosed with diabetes. Diabetes is now the seventh leading cause of death in the United States, accounting for ***** percent of all deaths.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0 (license information was derived automatically)
Diabetes is a widespread chronic disease affecting millions of Americans each year, imposing a substantial financial burden on the economy. It impairs the body's ability to regulate blood glucose levels, leading to a range of health issues such as heart disease, vision loss, limb amputation, and kidney disease. Diabetes occurs when the body either fails to produce sufficient insulin or cannot use the insulin produced effectively. Insulin is crucial for enabling cells to utilize sugars from the bloodstream for energy.
Though there is no cure for diabetes, lifestyle changes such as weight management, healthy eating, and regular physical activity, along with medical treatments, can help manage the disease. Early detection and intervention are vital, making predictive models for diabetes risk valuable tools for healthcare providers and public health officials.
As of 2018, the CDC reported that 34.2 million Americans have diabetes, with 88 million having prediabetes. Alarmingly, a significant portion of those affected are unaware of their condition. Type II diabetes, the most prevalent form, varies in prevalence based on age, education, income, location, race, and other social determinants of health. The economic impact is substantial, with diagnosed diabetes costing approximately $327 billion annually, and total costs, including undiagnosed cases and prediabetes, nearing $400 billion.
Content: The dataset originates from the Behavioral Risk Factor Surveillance System (BRFSS), an annual telephone survey by the CDC since 1984, collecting data on health-related risk behaviors, chronic health conditions, and preventative service usage. For this project, the 2015 BRFSS dataset available on Kaggle was used, featuring responses from 441,455 individuals across 330 features.
The dataset includes three files:
diabetes_012_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features. The target variable, Diabetes_012, has 3 classes: 0 (no diabetes or only during pregnancy), 1 (prediabetes), and 2 (diabetes). This dataset is imbalanced.
diabetes_binary_5050split_health_indicators_BRFSS2015.csv: Contains 70,692 responses with 21 features, balanced 50-50 between individuals with no diabetes and those with prediabetes or diabetes. The target variable, Diabetes_binary, has 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes).
diabetes_binary_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features, with the target variable Diabetes_binary having 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes). This dataset is not balanced.
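The relationship between the three-class and binary targets, and the idea of a 50-50 balanced file, can be sketched as below. The mapping follows the class descriptions above; the balancing step uses random undersampling, which is an assumption here, since the text does not say how the balanced file was actually produced. The function names and toy labels are hypothetical.

```python
import random

def to_binary(diabetes_012):
    """Collapse the three-class label: 0 stays 0; 1 (prediabetes) and
    2 (diabetes) both become 1, as described for Diabetes_binary."""
    return 0 if diabetes_012 == 0 else 1

def balance_by_undersampling(labels, seed=0):
    """Return sorted row indices with equal numbers of class 0 and class 1.
    Random undersampling of the larger class is an assumed technique."""
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    n = min(len(idx0), len(idx1))
    return sorted(rng.sample(idx0, n) + rng.sample(idx1, n))

# Invented labels standing in for the Diabetes_012 column
labels_012 = [0, 0, 0, 0, 0, 0, 1, 2, 2, 0]
labels_binary = [to_binary(y) for y in labels_012]
kept = balance_by_undersampling(labels_binary)
print(labels_binary)
print([labels_binary[i] for i in kept])
```

Undersampling discards majority-class rows, which is consistent with the balanced file being much smaller than the two imbalanced ones.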
Research Questions:
- Can BRFSS survey questions accurately predict diabetes?
- What risk factors are most indicative of diabetes risk?
- Can a subset of risk factors effectively predict diabetes risk?
- Can a shorter questionnaire be developed from the BRFSS using feature selection to predict diabetes risk?
Acknowledgements: This dataset was not created by me; it is a cleaned and consolidated version of the BRFSS 2015 dataset available on Kaggle. The original dataset and the data cleaning notebook can be found here.
Inspiration: This work was inspired by Zidian Xie et al.'s study on building risk prediction models for Type 2 diabetes using machine learning techniques on the 2014 BRFSS dataset. The study can be found here.
Based on data from January 2017 to March 2020, it was estimated that around ** percent of non-Hispanic white adults in the United States had prediabetes. Those with prediabetes have blood sugar levels higher than normal, but not high enough to yet be diagnosed with diabetes. This statistic shows the percentage of adults in the United States with prediabetes from 2017 to 2020, by race/ethnicity.
In 2021, it was estimated that a total of **** million adults in the United States had prediabetes, with non-Hispanic white adults accounting for around **** million of these cases. Those with prediabetes have blood sugar levels higher than normal, but not high enough to yet be diagnosed with diabetes. This statistic shows the number of adults in the United States with prediabetes in 2021, by race and ethnicity.
Data shown are number of cases/total and percentage (95% confidence interval). Missing values: Diabetes = 9; IGR = 7; CVD risk = 116; CKD = 37; Any = 48; All = 11.
a P-values were estimated using χ² tests and show the difference in prevalence between White Europeans and South Asians for each sex.
b High CVD risk was defined as a risk score greater than 20%.
c "Any risk factor" means that the person has at least one of diabetes, IGR, high CVD risk, or CKD. "All risk factors" means that the person has diabetes or IGR, high CVD risk, and CKD.
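The "Any" and "All" definitions in the footnotes are simple boolean combinations, sketched below. The function and parameter names are hypothetical, and the CVD risk score is assumed to be expressed as a percentage.

```python
def high_cvd_risk(cvd_risk_percent):
    """High CVD risk as defined in the footnote: score greater than 20%."""
    return cvd_risk_percent > 20

def any_risk_factor(diabetes, igr, cvd_risk_percent, ckd):
    """At least one of diabetes, IGR, high CVD risk, or CKD."""
    return diabetes or igr or high_cvd_risk(cvd_risk_percent) or ckd

def all_risk_factors(diabetes, igr, cvd_risk_percent, ckd):
    """Diabetes or IGR, together with high CVD risk and CKD."""
    return (diabetes or igr) and high_cvd_risk(cvd_risk_percent) and ckd

# Invented example: IGR and CKD present, CVD risk score of 25%
print(any_risk_factor(False, True, 25, True))
print(all_risk_factors(False, True, 25, True))
```

Note that "All" does not require both diabetes and IGR, since the footnote treats them as alternatives.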
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Importance: Racial and ethnic disparities in chronic disease are a major public health priority.
Objective: To determine if the amount of federal grant funding to federally-qualified health centers (FQHCs) was associated with baseline overall prevalence of uncontrolled hypertension and uncontrolled diabetes, as well as prevalence by racial and ethnic subgroup.
Design: Cross-sectional multivariate regression analysis of Uniform Data System 2014–2019, which includes clinic-level data from each FQHC regarding demographics, chronic disease control by race and ethnicity, and grant funding.
Exposures: Our main exposures were the average values of the prevalence of uncontrolled hypertension and uncontrolled diabetes among the overall population and by racial and ethnic group from 2014–2016.
Main outcomes: Average federal grant funding per patient from 2017–2019, as measured by annual health center funding from the Bureau of Primary Health Care (BPHC) and overall federal grant funding.
Results: We analyzed 1,205 FQHCs from 2014–2019; the average BPHC grant per patient across all FQHCs in 2019 was $168 while the average total federal grant was $184 per patient. Increasing shares of total patients with uncontrolled hypertension or uncontrolled diabetes were not associated with increased total federal grant funding in either unadjusted or adjusted analysis. Increased shares of patients who are American Indian or Alaskan Native (AI-AN) with uncontrolled hypertension and diabetes were associated with increasing total federal grant funding in both unadjusted and adjusted analysis (adjusted beta hypertension $168.3, p
In 2021, among diabetics in the United States, Black, non-Hispanic individuals were most likely to report having diabetic retinopathy compared to other races.
This statistic depicts the percentage of people with diabetes who had diabetic retinopathy in the United States as of 2021, by stage of disease and race and ethnicity.
Between 2022 and 2023, there were 32,276 young people with type 1 diabetes and 1,245 with type 2 diabetes across England and Wales. Young people of white ethnicity made up the largest share. This statistic shows the share of young people under the age of 24 with type 1 and type 2 diabetes in England and Wales from 2022 to 2023, by ethnicity.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 100,000 patient records designed for diabetes risk prediction, analysis, and machine learning applications. The dataset is clean, preprocessed, and ready for use in classification, regression, feature engineering, statistical analysis, and data visualization.
diabetes_dataset.csv: The dataset includes patient profiles with features based on demographics, lifestyle habits, family history, and clinical measurements that are well-established indicators of diabetes risk. All data is generated using statistical distributions inspired by real-world medical research, ensuring privacy preservation while reflecting realistic health patterns.
| Column | Type | Description | Values/Range |
|---|---|---|---|
| patient_id | Integer | Unique patient identifier | 1–100000 |
| age | Integer | Age of patient in years | 18–90 |
| gender | String | Patient gender | 'Male', 'Female', 'Other' |
| ethnicity | String | Ethnic background | 'White', 'Hispanic', 'Black', 'Asian', 'Other' |
| education_level | String | Highest completed education | 'No formal', 'Highschool', 'Graduate', 'Postgraduate' |
| income_level | String | Income category | 'Low', 'Medium', 'High' |
| employment_status | String | Employment type | 'Employed', 'Unemployed', 'Retired', 'Student' |
| smoking_status | String | Smoking behavior | 'Never', 'Former', 'Current' |
| alcohol_consumption_per_week | Float | Drinks consumed per week | 0–30 |
| physical_activity_minutes_per_week | Integer | Physical activity (weekly minutes) | 0–600 |
| diet_score | Integer | Diet quality (higher = healthier) | 0–10 |
| sleep_hours_per_day | Float | Average daily sleep hours | 3–12 |
| screen_time_hours_per_day | Float | Average daily screen time hours | 0–12 |
| family_history_diabetes | Integer | Family history of diabetes | 0 = No, 1 = Yes |
| hypertension_history | Integer | Hypertension history | 0 = No, 1 = Yes |
| cardiovascular_history | Integer | Cardiovascular history | 0 = No, 1 = Yes |
| bmi | Float | Body Mass Index (kg/m²) | 15–45 |
| waist_to_hip_ratio | Float | Waist-to-hip ratio | 0.7–1.2 |
| systolic_bp | Integer | Systolic blood pressure (mmHg) | 90–180 |
| diastolic_bp | Integer | Diastolic blood pressure (mmHg) | 60–120 |
| heart_rate | Integer | Resting heart rate (bpm) | 50–120 |
| cholesterol_total | Float | Total cholesterol (mg/dL) | 120–300 |
| hdl_cholesterol | Float | HDL cholesterol (mg/dL) | 20–100 |
| ldl_cholesterol | Float | LDL cholesterol (mg/dL) | 50–200 |
| triglycerides | Float | Triglycerides (mg/dL) | 50–500 |
| glucose_fasting | Float | Fasting glucose (mg/dL) | 70–250 |
| glucose_postprandial | Float | Post-meal glucose (mg/dL) | 90–350 |
| insulin_level | Float | Blood insulin level (µU/mL) | 2–50 |
| hba1c | Float | HbA1c (%) | 4–14 |
| diabetes_risk_score | Integer | Risk score (calculated, 0–100) | 0–100 |
| diabetes_stage | String | Stage of diabetes | 'No Diabetes', 'Pre-Diabetes', 'Type 1', 'Type 2', 'Gestational' |
| diagnosed_diabetes | Integer | Target: Diabetes diagnosis | 0 = No, 1 = Yes |
Candidate target columns: diagnosed_diabetes (Yes/No); diabetes_stage; glucose_fasting, hba1c, or diabetes_risk_score.
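The column table above can double as a lightweight validation schema. The sketch below checks a few of the documented ranges and categories on one row at a time. It covers only a subset of columns, treats the ranges as inclusive (an assumption), and expects values that are already parsed into numbers, so rows read with a CSV reader would need type conversion first. The helper name and sample rows are hypothetical.

```python
# Subset of the documented ranges; bounds assumed to be inclusive.
NUMERIC_RANGES = {
    "age": (18, 90),
    "bmi": (15, 45),
    "glucose_fasting": (70, 250),
    "hba1c": (4, 14),
}
ALLOWED_VALUES = {
    "smoking_status": {"Never", "Former", "Current"},
    "diagnosed_diabetes": {0, 1},
}

def validate_row(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        if not low <= row[column] <= high:
            problems.append(f"{column}={row[column]} outside {low}-{high}")
    for column, allowed in ALLOWED_VALUES.items():
        if row[column] not in allowed:
            problems.append(f"{column}={row[column]!r} not an allowed value")
    return problems

# Invented rows for illustration
good = {"age": 52, "bmi": 27.4, "glucose_fasting": 104.0, "hba1c": 5.9,
        "smoking_status": "Former", "diagnosed_diabetes": 0}
bad = dict(good, age=17, smoking_status="Sometimes")
print(validate_row(good))
print(validate_row(bad))
```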