CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years.
Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
Find data on pediatric diabetes in Massachusetts. This dataset contains information on the number of cases and prevalence of Type 1 and Type 2 diabetes among students, grades K-8, in Massachusetts.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data set provides de-identified population data for diabetes and hyperlipidemia comorbidity prevalence. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and 2016 calendar years.
Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset presents information on age-sex specific incidence rates of diabetes by First Nations status for Alberta, expressed as per 100,000 population.
Population-based county-level estimates for prevalence of DC were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the propor-tion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glu-cose 126 mg/dL, hemoglobin A1c (HbA1c) of 6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16). The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality expo-sure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to availability of DC data and covariate data, not all counties were captured, however, the majority, 3134 of 3142 were utilized in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Diabetes Health Indicators Dataset is a large health dataset that collects various health indicators and lifestyle information related to diabetes diagnosis based on health surveys and medical records of the U.S. population.
2) Data Utilization (1) Diabetes Health Indicators Dataset has characteristics that: • The dataset consists of more than 250,000 samples and contains more than 20 health and demographic variables, including diabetes (binary or triage label), age, gender, BMI, blood pressure, cholesterol, smoking and drinking habits, physical activity, mental health, income, and education level. (2) Diabetes Health Indicators Dataset can be used to: • Diabetes prediction model development: It can be used to develop machine learning-based classification models that use health indicators and lifestyle data to predict the risk of developing diabetes. • A Study on the Correlation between Lifestyle and Diabetes: It can be used in epidemiological and public health studies to analyze the effects of various lifestyle and demographic variables such as smoking, drinking, exercise, and eating habits on diabetes incidence.
We are trying to build a machine learning model to accurately predict whether the patients have diabetes or not.our objective is to prevent, cure and to improve the lives of all people affected by diabetes.
The datasets consists of several medical predictor variables and one target variable, Outcome. Predictor vari- ables includes the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Pregnancies: Number of times pregnant Glucose: Plasma glucose concentration a 2 hours in an oral glucose tolerance test BloodPressure: Diastolic blood pressure (mm Hg) SkinThickness: Triceps skin fold thickness (mm) Insulin: 2-Hour serum insulin (mu U/ml) BMI: Body mass index (weight in kg/(height in m)2) DiabetesPedigreeFunction: Diabetes pedigree function Age:Age(years) Outcome: Classvariable(0 or 1)
The motivation was to experiment with end to end machine learning project and get some idea about deployment platform like and offcourse this " Diabetes is an increasingly growing health issue due to our inactive lifestyle. If it is detected in time then through proper medical treatment, adverse effects can be prevented. To help in early detection, technology can be used very reliably and efficiently. Using machine learning we have built a predictive model that can predict whether the patient is diabetes positive or not.". This is also sort of fun to work on a project like this which could be beneficial for the society.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a collection of Continuous Glucose Monitoring (CGM) data, insulin dose administration, meal ingestion counted in carbohydrate grams, steps, calories burned, heart rate, and sleep quality and quantity assessment acquired from 25 people with type 1 diabetes mellitus (T1DM). CGM data was acquired by FreeStyle Libre 2 CGMs, and Fitbit Ionic smartwatches were used to obtain steps, calories, heart rate, and sleep data for at least 14 days. This dataset could be utilized to obtain glucose prediction models, hypoglycemia and hyperglycemia prediction models, and research on the relationships among sleep, CGM values, and the rest of the mentioned variables. This dataset could be used directly from the preprocessed version or customized from raw data.
SUMMARYThis analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of diabetes mellitus in persons (aged 17+). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.ANALYSIS METHODOLOGYThe analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to diabetes mellitus in persons (aged 17+).This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.The percentage of each MSOA’s population (aged 17+) with diabetes mellitus was estimated. This was achieved by calculating a weighted average based on:The percentage of the MSOA area that was covered by each GP practice’s catchment areaOf the GPs that covered part of that MSOA: the percentage of registered patients that have that illness The estimated percentage of each MSOA’s population with diabetes mellitus was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with depression, within the relevant age range.Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:A) the PERCENTAGE of the population within that MSOA who are estimated to have diabetes mellitusB) the NUMBER of people within that MSOA who are estimated to have diabetes mellitusAn average of scores A & B was taken, and converted to a relative score between 1 and 0 (1= worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have diabetes mellitus, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from diabetes mellitus, and where those people make up a large percentage of the population, indicating there is a real issue with diabetes mellitus within the population and the investment of resources to address that issue could have the greatest benefits.LIMITATIONS1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of diabetes mellitus, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of diabetes mellitus.TO BE VIEWED IN COMBINATION WITH:This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:Health and wellbeing statistics (GP-level, England): Missing data and potential outliersLevels of obesity, inactivity and associated illnesses (England): Missing dataDOWNLOADING THIS DATATo access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.DATA SOURCESThis dataset was produced using:Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.COPYRIGHT NOTICEThe reproduction of this data must be accompanied by the following statement:© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.CaBA HEALTH & WELLBEING EVIDENCE BASEThis dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Diabetes Prevalence: % of Population Aged 20-79 data was reported at 10.790 % in 2017. United States US: Diabetes Prevalence: % of Population Aged 20-79 data is updated yearly, averaging 10.790 % from Dec 2017 (Median) to 2017, with 1 observations. United States US: Diabetes Prevalence: % of Population Aged 20-79 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s USA – Table US.World Bank: Health Statistics. Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes.; ; International Diabetes Federation, Diabetes Atlas.; Weighted average;
This public health factsheet describes facts, assets, and strategies related to diabetes in Camden.
This dataset is from UCI machine learning repository: 130 US hospital diabetes dataset However, I did several cleaning and this is the output. How I did the cleaning, you can read more here https://github.com/rischanlab/Cleaning_diabetes_130_US_hospital_dataset
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset present information on age-sex specific incidence rates of diabetes for Alberta and AHS continuum zone, expressed as per 100,000 population.
This dataset presents information on age-standardized incidence rates of diabetes for Alberta, for selected geographic areas , expressed as per 100,000 population.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Years of Life Lost (YLL) as a result of death from diabetes. Directly age-Standardised Rates (DSR) per 100,000 population Source: Office for National Statistics (ONS) Publisher: Information Centre (IC) - Clinical and Health Outcomes Knowledge Base Geographies: Local Authority District (LAD), Government Office Region (GOR), National, Strategic Health Authority (SHA) Geographic coverage: England Time coverage: 2005-07, 2007 Type of data: Administrative data
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Diabetes is a chronic disease, which is characterized by abnormally high blood sugar levels. It may affect various organs and tissues, and even lead to life-threatening complications. Accurate prediction of diabetes can significantly reduce its incidence. However, the current prediction methods struggle to accurately capture the essential characteristics of nonlinear data, and the black-box nature of these methods hampers its clinical application. To address these challenges, we propose KCCAM_DNN, a diabetes prediction method that integrates Kendall’s correlation coefficient and an attention mechanism within a deep neural network. In the KCCAM_DNN, Kendall’s correlation coefficient is initially employed for feature selection, which effectively filters out key features influencing diabetes prediction. For missing values in the data, polynomial regression is utilized for imputation, ensuring data completeness. Subsequently, we construct a deep neural network (KCCAM_DNN) based on the self-attention mechanism, which assigns greater weight to crucial features affecting diabetes and enhances the model’s predictive performance. Finally, we employ the SHAP model to analyze the impact of each feature on diabetes prediction, augmenting the model’s interpretability. Experimental results show that KCCAM_DNN exhibits superior performance on both PIMA Indian and LMCH diabetes datasets, achieving test accuracies of 99.090% and 99.333%, respectively, approximately 2% higher than the best existing method. These results suggest that KCCAM_DNN is proficient in diabetes prediction, providing a foundation for informed decision-making in the diagnosis and prevention of diabetes.
This dataset contains many factors such as BMI, blood pressure, and so on. It use machine learning techniques to forecast whether they would get diabetes or not. Typically, a diabetes prediction dataset may include the following information:
Patient Information:
Age: The age of the patient. Gender: The gender of the patient. BMI (Body Mass Index): A measure of body fat based on height and weight. Blood Pressure: Systolic and diastolic blood pressure values. Skin Thickness: Measurement of skinfold thickness. Insulin: Insulin levels in the blood. Pregnancies: Number of pregnancies (for female patients).
Health Metrics: Glucose Level: Blood glucose concentration (fasting glucose or glucose tolerance test results). Diabetes Pedigree Function: A score that represents the likelihood of diabetes based on family history. Number of Relatives with Diabetes: The count of family members who have diabetes. Target Variable:
Diabetes Outcome: A binary classification indicating whether the patient has diabetes (e.g., 1 for "diabetic" and 0 for "non-diabetic"). The goal of machine learning models using this dataset is to predict whether a patient is likely to have diabetes based on their health and demographic information.
South Africa is experiencing a rapidly growing diabetes epidemic that threatens its healthcare system. Research on the determinants of diabetes in South Africa receives considerable attention due to the lifestyle changes accompanying South Africa’s rapid urbanization since the fall of Apartheid. However, few studies have investigated how segments of the Black South African population, who continue to endure Apartheid’s institutional discriminatory legacy, experience this transition. This paper explores the association between individual and area-level socioeconomic status and diabetes prevalence, awareness, treatment, and control within a sample of Black South Africans aged 45 years or older in three municipalities in KwaZulu-Natal. Cross-sectional data were collected on 3,685 participants from February 2017 to February 2018. Individual-level socioeconomic status was assessed with employment status and educational attainment. Area-level deprivation was measured using the most recent South African Multidimensional Poverty Index scores. Covariates included age, sex, BMI, and hypertension diagnosis. The prevalence of diabetes was 23% (n = 830). Of those, 769 were aware of their diagnosis, 629 were receiving treatment, and 404 had their diabetes controlled. Compared to those with no formal education, Black South Africans with some high school education had increased diabetes prevalence, and those who had completed high school had lower prevalence of treatment receipt. Employment status was negatively associated with diabetes prevalence. Black South Africans living in more deprived wards had lower diabetes prevalence, and those residing in wards that became more deprived from 2001 to 2011 had a higher prevalence diabetes, as well as diabetic control. Results from this study can assist policymakers and practitioners in identifying modifiable risk factors for diabetes among Black South Africans to intervene on. Potential community-based interventions include those focused on patient empowerment and linkages to care. Such interventions should act in concert with policy changes, such as expanding the existing sugar-sweetened beverage tax.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundData on the future diabetes burden in Scandinavia is limited. Our aim was to project the future burden of diabetes in Sweden by modelling data on incidence, prevalence, mortality, and demographic factors.MethodTo project the future burden of diabetes we used information on the prevalence of diabetes from the national drug prescription registry (adults ≥20 years), previously published data on relative mortality in people with diabetes, and population demographics and projections from Statistics Sweden. Alternative scenarios were created based on different assumptions regarding the future incidence of diabetes.ResultsBetween 2007 and 2013 the prevalence of diabetes rose from 5.8 to 6.8% in Sweden but incidence remained constant at 4.4 per 1000 (2013). With constant incidence and continued improvement in relative survival, prevalence will increase to 10.4% by year 2050 and the number of afflicted individuals will increase to 940 000. Of this rise, 30% is accounted for by changes in the age structure of the population and 14% by improved relative survival in people with diabetes. A hypothesized 1% annual rise in incidence will result in a prevalence of 12.6% and 1 136 000 cases. Even with decreasing incidence at 1% per year, prevalence of diabetes will continue to increase.ConclusionWe can expect diabetes prevalence to rise substantially in Sweden over the next 35 years as a result of demographic changes and improved survival among people with diabetes. A dramatic reduction in incidence is required to prevent this development.
Number and percentage of persons having been diagnosed with diabetes, by age group and sex.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years.
Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.