Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Context Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.
Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.
The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.
Content The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, a XPT of the dataset available on CDC website for the year 2023 was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
Explore some of the following research questions: - Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes? - What risk factors are most predictive of diabetes risk? - Can we use a subset of the risk factors to accurately predict whether an individual has diabetes? - Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
Acknowledgements It is important to reiterate that I did not create this dataset, it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.
Inspiration Zidian Xie et al for Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques using the 2014 BRFSS, and Alex Teboul for building Diabetes Health Indicators dataset based on BRFSS 2015 were the inspiration for creating this dataset and exploring the BRFSS in general.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Diabetes is a widespread chronic disease affecting millions of Americans each year, imposing a substantial financial burden on the economy. It impairs the body's ability to regulate blood glucose levels, leading to a range of health issues such as heart disease, vision loss, limb amputation, and kidney disease. Diabetes occurs when the body either fails to produce sufficient insulin or cannot use the insulin produced effectively. Insulin is crucial for enabling cells to utilize sugars from the bloodstream for energy.
Though there is no cure for diabetes, lifestyle changes such as weight management, healthy eating, and regular physical activity, along with medical treatments, can help manage the disease. Early detection and intervention are vital, making predictive models for diabetes risk valuable tools for healthcare providers and public health officials.
As of 2018, the CDC reported that 34.2 million Americans have diabetes, with 88 million having prediabetes. Alarmingly, a significant portion of those affected are unaware of their condition. Type II diabetes, the most prevalent form, varies in prevalence based on age, education, income, location, race, and other social determinants of health. The economic impact is substantial, with diagnosed diabetes costing approximately $327 billion annually, and total costs, including undiagnosed cases and prediabetes, nearing $400 billion.
Content: The dataset originates from the Behavioral Risk Factor Surveillance System (BRFSS), an annual telephone survey by the CDC since 1984, collecting data on health-related risk behaviors, chronic health conditions, and preventative service usage. For this project, the 2015 BRFSS dataset available on Kaggle was used, featuring responses from 441,455 individuals across 330 features.
The dataset includes three files:
diabetes_012_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features. The target variable, Diabetes_012, has 3 classes: 0 (no diabetes or only during pregnancy), 1 (prediabetes), and 2 (diabetes). This dataset is imbalanced.
diabetes_binary_5050split_health_indicators_BRFSS2015.csv: Contains 70,692 responses with 21 features, balanced 50-50 between individuals with no diabetes and those with prediabetes or diabetes. The target variable, Diabetes_binary, has 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes).
diabetes_binary_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features, with the target variable Diabetes_binary having 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes). This dataset is not balanced.
Research Questions: - Can BRFSS survey questions accurately predict diabetes? - What risk factors are most indicative of diabetes risk? - Can a subset of risk factors effectively predict diabetes risk? - Can a shorter questionnaire be developed from the BRFSS using feature selection to predict diabetes risk?
Acknowledgements: This dataset was not created by me; it is a cleaned and consolidated version of the BRFSS 2015 dataset available on Kaggle. The original dataset and the data cleaning notebook can be found here.
Inspiration: This work was inspired by Zidian Xie et al.'s study on building risk prediction models for Type 2 diabetes using machine learning techniques on the 2014 BRFSS dataset. The study can be found here.
Facebook
TwitterPopulation-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016). Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A.. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Diabetes Prevalence: % of Population Aged 20-79 data was reported at 10.790 % in 2017. United States US: Diabetes Prevalence: % of Population Aged 20-79 data is updated yearly, averaging 10.790 % from Dec 2017 (Median) to 2017, with 1 observations. United States US: Diabetes Prevalence: % of Population Aged 20-79 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s USA – Table US.World Bank: Health Statistics. Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes.; ; International Diabetes Federation, Diabetes Atlas.; Weighted average;
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.
This dataset can be used for various analytical and machine learning purposes, such as:
Facebook
TwitterSUMMARYThis analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of diabetes mellitus in persons (aged 17+). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.ANALYSIS METHODOLOGYThe analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to diabetes mellitus in persons (aged 17+).This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.The percentage of each MSOA’s population (aged 17+) with diabetes mellitus was estimated. This was achieved by calculating a weighted average based on:The percentage of the MSOA area that was covered by each GP practice’s catchment areaOf the GPs that covered part of that MSOA: the percentage of registered patients that have that illness The estimated percentage of each MSOA’s population with diabetes mellitus was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with depression, within the relevant age range.Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:A) the PERCENTAGE of the population within that MSOA who are estimated to have diabetes mellitusB) the NUMBER of people within that MSOA who are estimated to have diabetes mellitusAn average of scores A & B was taken, and converted to a relative score between 1 and 0 (1= worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have diabetes mellitus, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from diabetes mellitus, and where those people make up a large percentage of the population, indicating there is a real issue with diabetes mellitus within the population and the investment of resources to address that issue could have the greatest benefits.LIMITATIONS1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of diabetes mellitus, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of diabetes mellitus.TO BE VIEWED IN COMBINATION WITH:This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:Health and wellbeing statistics (GP-level, England): Missing data and potential outliersLevels of obesity, inactivity and associated illnesses (England): Missing dataDOWNLOADING THIS DATATo access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.DATA SOURCESThis dataset was produced using:Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.COPYRIGHT NOTICEThe reproduction of this data must be accompanied by the following statement:© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.CaBA HEALTH & WELLBEING EVIDENCE BASEThis dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
Facebook
TwitterPopulation-based county-level estimates for prevalence of DC were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the propor-tion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glu-cose 126 mg/dL, hemoglobin A1c (HbA1c) of 6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16). The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality expo-sure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to availability of DC data and covariate data, not all counties were captured, however, the majority, 3134 of 3142 were utilized in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
Facebook
TwitterThese datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Facebook
TwitterPurpose:This dataset contains the Colorado census tract level Diagnosed Diabetes Prevalence Among Adults (2022) copied directly from the CDC Places Dataset. A multi-level regression and post-stratification approach was applied to BRFSS and ACS data to compute a detailed probability among adults who report being told by a doctor or other health professional that they have diabetes (other than diabetes during pregnancy for female respondents). The probability was then applied to the detailed population estimates at the appropriate geographic level to generate the prevalence. The 95% confidence interval was derived using Monte Carlo simulation. Update Schedule and URL: This dataset is updated annually (September) as the CDC PLACES dataset is updated.Fields Description:GEOID: 11-digit Census Tract FIPS Identifier COUNTY: County NameNAME: Census Tract NameDIABETES_ADJRATE: Diagnosed Diabetes Prevalence Among Adults (2022, CDC Places)DIABETES_L95CI: Diagnosed Diabetes Lower 95% Confidence IntervalDIABETES_U95CI: Diagnosed Diabetes Upper 95% Confidence Interval
Facebook
TwitterFind data on pediatric diabetes in Massachusetts. This dataset contains information on the number of cases and prevalence of Type 1 and Type 2 diabetes among students, grades K-8, in Massachusetts.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*(Age-standardised incidence rates per 100,000 individuals per year with 95% confidence intervals. † For cells labeled as NA, 95% CIs could not be estimated as there was only 1 data point).
Facebook
TwitterThese data represent the predicted (modeled) prevalence of Diabetes among adults (Age 18+) for each census tract in Colorado. Diabetes is defined as ever being diagnosed with Diabetes by a doctor, nurse, or other health professional, and this definition does not include gestational, borderline, or pre-diabetes.The estimate for each census tract represents an average that was derived from multiple years of Colorado Behavioral Risk Factor Surveillance System data (2014-2017).CDPHE used a model-based approach to measure the relationship between age, race, gender, poverty, education, location and health conditions or risk behavior indicators and applied this relationship to predict the number of persons' who have the health conditions or risk behavior for each census tract in Colorado. We then applied these probabilities, based on demographic stratification, to the 2013-2017 American Community Survey population estimates and determined the percentage of adults with the health conditions or risk behavior for each census tract in Colorado.The estimates are based on statistical models and are not direct survey estimates. Using the best available data, CDPHE was able to model census tract estimates based on demographic data and background knowledge about the distribution of specific health conditions and risk behaviors.The estimates are displayed in both the map and data table using point estimate values for each census tract and displayed using a Quintile range. The high and low value for each color on the map is calculated based on dividing the total number of census tracts in Colorado (1249) into five groups based on the total range of estimates for all Colorado census tracts. Each Quintile range represents roughly 20% of the census tracts in Colorado. No estimates are provided for census tracts with a known population of less than 50. These census tracts are displayed in the map as "No Est, Pop < 50."No estimates are provided for 7 census tracts with a known population of less than 50 or for the 2 census tracts that exclusively contain a federal correctional institution as 100% of their population. These 9 census tracts are displayed in the map as "No Estimate."
Facebook
TwitterThis public health factsheet describes facts, assets, and strategies related to diabetes in Camden.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: Welcome to the Diabetes Prediction Dataset, a valuable resource for researchers, data scientists, and medical professionals interested in the field of diabetes risk assessment and prediction. This dataset contains a diverse range of health-related attributes, meticulously collected to aid in the development of predictive models for identifying individuals at risk of diabetes. By sharing this dataset, we aim to foster collaboration and innovation within the data science community, leading to improved early diagnosis and personalized treatment strategies for diabetes.
Columns: 1. Id: Unique identifier for each data entry. 2. Pregnancies: Number of times pregnant. 3. Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test. 4. BloodPressure: Diastolic blood pressure (mm Hg). 5. SkinThickness: Triceps skinfold thickness (mm). 6. Insulin: 2-Hour serum insulin (mu U/ml). 7. BMI: Body mass index (weight in kg / height in m^2). 8. DiabetesPedigreeFunction: Diabetes pedigree function, a genetic score of diabetes. 9. Age: Age in years. 10. Outcome: Binary classification indicating the presence (1) or absence (0) of diabetes.
Utilize this dataset to explore the relationships between various health indicators and the likelihood of diabetes. You can apply machine learning techniques to develop predictive models, feature selection strategies, and data visualization to uncover insights that may contribute to more accurate risk assessments. As you embark on your journey with this dataset, remember that your discoveries could have a profound impact on diabetes prevention and management.
Please ensure that you adhere to ethical guidelines and respect the privacy of individuals represented in this dataset. Proper citation and recognition of this dataset's source are appreciated to promote collaboration and knowledge sharing.
Start your exploration of the Diabetes Prediction Dataset today and contribute to the ongoing efforts to combat diabetes through data-driven insights and innovations.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SPSS dataset.
The Cork and Kerry Diabetes and Heart Disease Study (Phase II – Mitchelstown Cohort) was a single centre study conducted between 2010 and 2011. A random sample was recruited from a large primary care centre in Mitchelstown, County Cork, Ireland. The Livinghealth Clinic serves a population of approximately 20,000 Caucasian-European subjects, with a mix of urban and rural residents. Stratified sampling was employed to recruit equal numbers of men and women from all registered attending patients in the 46–73-year age group. In total, 3,807 potential participants were selected from the practice list. Following the exclusion of duplicates, deaths and subjects incapable of consenting or attending appointment, 3,051 were invited to participate in the study and of these, 2,047 (49% male) completed the questionnaire and physical examination components of the baseline assessment (response rate: 67%). Individuals with pre-existing cardiovascular disease or T2DM were not excluded from the cohort.
Facebook
TwitterAttribution-NonCommercial 2.0 (CC BY-NC 2.0)https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
Background: Non adherence to diabetes medications leads to frequent relapses, poor treatment outcome, reduced quality of life and significant increases in healthcare cost in a resource poor country and a healthcare system already overburdened by infectious illnesses and other diseases. This study verified the adherence of people with type2 diabetes mellitus and factors associated with it. Objective: This study was carried out to assess the prevalence of non-adherence to medication, and identify factors associated with it in patients with type 2 diabetes mellitus. Study Design: This was a cross-sectional study conducted on a sample of one hundred and twenty three out-patients, aged over 18 years and diagnosed with type 2 diabetes mellitus and who have been on oral medications for at least a year prior to study entry. Socio-demographic and clinical variables were collected and compared between participants with optimal and suboptimal adherence. Results: The mean ages of participants were 59.68±11.8 and mean duration of illness 7.22 About one-in-four (28%) were poor adherers to their diabetes medications. Variables with significant association with non-adherence include marital status (x2= 8.73, df= 1, p= 0.01), educational level (x2= 6.96, df= f, p= 0.01), employment status (x2= 4.89, df= 1, p= 0.030), duration of illness (x2= 3.07, df= 1, p= 0.08) and patients’ living arrangement (x2= 4.28, df= 1, p= 0.04). In multivariate analysis, predictors of poor adherence were: lack of treatment supervision (OR 0.032, p-value< 0.001), poor attitude to medication (OR 0.015, p< 0.001) Conclusion: Medication non-adherence in patients with type 2
Facebook
TwitterOpen Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Note: This dataset has been archived as of January 2024 after confirmation from NHS Digital that the source dataset is no longer being updated, and there is not a replacement publication for the diabetic ketoacidosis admissions data. This indicator is one measure of the prevention, identification and management of people at risk of developing diabetes and those with the condition. It shows adverse outcomes as annual numbers of emergency hospital admissions for diabetic ketoacidosis and coma. Emergency admissions to hospital can be avoided by identifying people at risk, primary care services interventions, encouraging better diet and exercise, improving self-monitoring and diabetes control and supporting patients and carers in the management of diabetes in the home. It needs local health and care services working effectively together to support people’s health and independence in the community. Type 2 diabetes (around 90 percent of diabetes diagnoses) is partially preventable - it can be prevented or delayed by lifestyle changes (exercise, weight loss, healthy eating). Earlier detection of type 2 diabetes followed by effective treatment reduces the risk of developing diabetic complications. These include cardiovascular, kidney, foot and eye diseases, meaning considerable illness and reduced quality of life. There are some limitations to this data, as raw counts of hospital episodes are subject to population structures (such as numbers of people in older age groups) and other underlying variations. Counts below 5 are removed from the data. The data is updated annually. Sources: NHS Digital (now part of NHS England) - dataset P02177, and commentary from the Office for Health Improvement and Disparities (OHID) Public Health Outcomes Framework (PHOF) indicator 2.17 Recorded Diabetes.
Facebook
Twitterhttps://saildatabank.com/data/apply-to-work-with-the-data/https://saildatabank.com/data/apply-to-work-with-the-data/
A register of children diagnosed with type 1 diabetes in Wales, collected from Paediatric diabetes clinics in Wales. Maintained by the Brecon Group. Two capture-recapture studies have been done showing >97% completeness for type 1 diabetes diagnoses in Wales. Data has been collected since 1995 and is complete since then, but some people diagnosed earlier are also included.
Facebook
Twitterhttps://www.pioneerdatahub.co.uk/data/data-request-process/https://www.pioneerdatahub.co.uk/data/data-request-process/
Up to 30% of older adults (aged 65 years and older) have been diagnosed with Diabetes mellitus. Older diabetics are more likely to experience complications like heart disease, stroke, kidney problems, and nerve damage, leading to higher hospital admission rates. When managing older diabetic patients, healthcare professionals need to consider factors like frailty, polypharmacy (multiple medications), and potential cognitive impairments.
This dataset includes 83,303 people and 366,035 spells, designed to support research which improves diabetic emergency and unplanned care in older adults. It includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to acute care process, presenting complaints, admissions, microbiology results, referrals, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations and others), all blood results (urea, albumin, platelets, white blood cells and others). Includes all prescribed & administered treatments and all outcomes. Linked images are also available (radiographs, CT scans, MRI).
Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IntroductionType 2 diabetes (T2D) is a growing public health burden throughout the world. Many people looking for information on how to prevent T2D will search on diabetes websites. Multiple dietary factors have a significant association with T2D risk, such as high intake of added sugars, refined carbohydrates, saturated fat, and red meat or processed meat; and decreased intake of dietary fiber, and fruits/vegetables. Despite this dietary information being available in the scientific literature, it is unclear whether this information is available in gray literature (websites).ObjectiveIn this study, we evaluate the use of specific terms from diabetes websites that are significantly associated with causes/risk factors and preventions for T2D from three term categories: (A) dietary factors, (B) nondietary nongenetic (lifestyle-associated) factors, and (C) genetic (non-modifiable) factors. We also evaluate the effect of website type (business, government, nonprofit) on term usage among websites.MethodsWe used web scraping and coding tools to quantify the use of specific terms from 73 diabetes websites. To determine the effect of term category and website type on the usage of specific terms among 73 websites, a repeated measures general linear model was performed.ResultsWe found that dietary risk factors that are significantly associated with T2D (e.g., sugar, processed carbohydrates, dietary fat, fruits/vegetables, fiber, processed meat/red meat) were mentioned in significantly fewer websites than either nondietary nongenetic factors (e.g., obesity, physical activity, dyslipidemia, blood pressure) or genetic factors (age, family history, ethnicity). Among websites that provided “eat healthy” guidance, one third provided zero dietary factors associated with type 2 diabetes, and only 30% provided more than two specific dietary factors associates with type 2 diabetes. We also observed that mean percent usage of all terms associated with T2D causes/risk factors and preventions was significantly lower among government websites compared to business websites and nonprofit websites.ConclusionDiabetes websites need to increase their usage of dietary factors when discussing causes/risk factors and preventions for T2D; as dietary factors are modifiable and strongly associated with all nondietary nongenetic risk factors, in addition to T2D risk.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Context Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.
Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.
The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.
Content The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, a XPT of the dataset available on CDC website for the year 2023 was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
Explore some of the following research questions: - Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes? - What risk factors are most predictive of diabetes risk? - Can we use a subset of the risk factors to accurately predict whether an individual has diabetes? - Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
Acknowledgements It is important to reiterate that I did not create this dataset, it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.
Inspiration Zidian Xie et al for Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques using the 2014 BRFSS, and Alex Teboul for building Diabetes Health Indicators dataset based on BRFSS 2015 were the inspiration for creating this dataset and exploring the BRFSS in general.