Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This entry describes how the diabetes dataset was constructed. The data were collected from Iraqi society; they were acquired from the laboratory of Medical City Hospital and the Specialized Center for Endocrinology and Diabetes-Al-Kindy Teaching Hospital. Patients' files were taken, and data were extracted from them and entered into the database to construct the diabetes dataset. The data consist of medical information and laboratory analyses. The attributes initially entered into the system are: No. of Patient, Sugar Level Blood, Age, Gender, Creatinine ratio (Cr), Body Mass Index (BMI), Urea, Cholesterol (Chol), Fasting lipid profile (including total, LDL, VLDL, Triglycerides (TG), and HDL Cholesterol), HBA1C, and Class (the patient's diabetes disease class may be Diabetic, Non-Diabetic, or Predict-Diabetic).
These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore this dataset to understand the factors associated with diabetes. Each row represents an individual, with columns indicating pregnancies, glucose levels, blood pressure, skin thickness, insulin levels, BMI, diabetes pedigree function, age, and outcome (0 for non-diabetic, 1 for diabetic). Analyze the data to uncover insights into diabetes risk factors and prevention strategies.
Diabetes prevalence in Massachusetts has been steadily increasing.
Interactive database of gene expression and diabetes-related clinical phenotypes. Allows searching gene expression in mouse tissues as a function of obesity, strain, and age.
This is a data set of 2 weeks of blood glucose, insulin and carbohydrate intake data used as a standard data set to evaluate diabetes apps.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Diabetes Prediction Dataset is built for predicting diabetes and analyzing related risk factors. It contains demographic, lifestyle, and clinical-measurement features, so it can be used to predict a patient's risk of developing diabetes.
2) Data Utilization (1) Characteristics of the Diabetes Prediction Dataset: • Key columns include a variety of clinical and lifestyle indicators related to diabetes: age, gender, body mass index (BMI), blood pressure, blood sugar level (Glucose), insulin, family history, and physical activity. (2) The Diabetes Prediction Dataset can be used for: • Machine Learning/Deep Learning Model Development: developing classification models (logistic regression, decision tree, random forest, neural network, etc.) that predict the risk of developing diabetes from patient characteristics. • Data Analysis and Visualization: correlation analysis, risk factor derivation, and Exploratory Data Analysis (EDA) across variables such as demographics, clinical measurements, and lifestyle.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Germany. There are eight features in the dataset. Among the 2000 samples
Decrease the percentage of people with Type 2 diabetes from 11.2% in 2014 to 10.1% by 2019.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a collection of Continuous Glucose Monitoring (CGM) data, insulin dose administration, meal ingestion counted in carbohydrate grams, steps, calories burned, heart rate, and sleep quality and quantity assessment acquired from 25 people with type 1 diabetes mellitus (T1DM). CGM data was acquired by FreeStyle Libre 2 CGMs, and Fitbit Ionic smartwatches were used to obtain steps, calories, heart rate, and sleep data for at least 14 days. This dataset could be utilized to obtain glucose prediction models, hypoglycemia and hyperglycemia prediction models, and research on the relationships among sleep, CGM values, and the rest of the mentioned variables. This dataset could be used directly from the preprocessed version or customized from raw data.
This dataset contains many factors such as BMI, blood pressure, and so on. Machine learning techniques can be applied to it to forecast whether a patient will develop diabetes. Typically, a diabetes prediction dataset may include the following information:
Patient Information:
Age: The age of the patient.
Gender: The gender of the patient.
BMI (Body Mass Index): A measure of body fat based on height and weight.
Blood Pressure: Systolic and diastolic blood pressure values.
Skin Thickness: Measurement of skinfold thickness.
Insulin: Insulin levels in the blood.
Pregnancies: Number of pregnancies (for female patients).
Health Metrics:
Glucose Level: Blood glucose concentration (fasting glucose or glucose tolerance test results).
Diabetes Pedigree Function: A score that represents the likelihood of diabetes based on family history.
Number of Relatives with Diabetes: The count of family members who have diabetes.
Target Variable:
Diabetes Outcome: A binary classification indicating whether the patient has diabetes (e.g., 1 for "diabetic" and 0 for "non-diabetic").
The goal of machine learning models using this dataset is to predict whether a patient is likely to have diabetes based on their health and demographic information.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Type 2 Diabetes Dataset (Raw and Cleaned Versions) is a healthcare dataset of 768 patients collected to predict the likelihood of developing type 2 diabetes. It comes in a raw version and a cleaned version in which missing values and outliers have been corrected.
2) Data Utilization (1) Characteristics of the Type 2 Diabetes Dataset (Raw and Cleaned Versions): • Each sample consists of nine medical indicators: number of pregnancies (Pregnance), blood sugar (Glucose), blood pressure (BloodPressure), triceps skin thickness (SkinThickness), insulin (Insulin), body mass index (BMI), Diabetes PedigreeFunction, Age, and Diabetes (Outcome: 1 = Diabetes, 0 = Normal). The cleaned version treats the impossible zero values of the original data as NaN and then replaces them with the median. (2) The Type 2 Diabetes Dataset (Raw and Cleaned Versions) can be used for: • Developing Diabetes Risk Prediction Models: using the refined medical indicator data, machine learning classification models such as logistic regression, random forest, and XGBoost can be trained to predict the likelihood of a patient developing diabetes. • Medical data preprocessing and feature engineering practice: by comparing the original and refined versions, you can practice preprocessing steps such as missing value handling, outlier correction, and feature engineering, and analyze the impact of preprocessing on predictive performance.
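The zero-to-NaN-to-median treatment described above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual procedure: the list of columns where zero is treated as missing is an assumption, and the sample rows are made-up values.

```python
from statistics import median

# Columns where a zero is treated as a missing value. This list is an
# assumption for illustration, not taken from the dataset's documentation.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros_with_median(rows, columns=ZERO_AS_MISSING):
    """Return a copy of rows where zeros in `columns` become the column median.

    The median is computed over the non-zero values only, which mirrors
    treating zeros as missing before filling them in.
    """
    cleaned = [dict(row) for row in rows]
    for col in columns:
        observed = [row[col] for row in rows if row[col] != 0]
        if not observed:
            continue  # nothing to impute from; leave the column untouched
        fill = median(observed)
        for row in cleaned:
            if row[col] == 0:
                row[col] = fill
    return cleaned


# Toy rows with made-up values (not taken from the dataset).
sample = [
    {"Glucose": 148, "BloodPressure": 72, "SkinThickness": 35, "Insulin": 0, "BMI": 33.6},
    {"Glucose": 85, "BloodPressure": 66, "SkinThickness": 29, "Insulin": 94, "BMI": 26.6},
    {"Glucose": 183, "BloodPressure": 64, "SkinThickness": 0, "Insulin": 168, "BMI": 23.3},
]
cleaned = impute_zeros_with_median(sample)
```

Comparing `sample` with `cleaned` is one simple way to see which cells the preprocessing changed; the input rows are left unmodified.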
This data set provides de-identified population data for diabetes and hypertension comorbidity prevalence in Allegheny County. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and 2016 calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Population-based county-level estimates for prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) of ≥6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16).
The Environmental Quality Index (EQI) was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).
Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality exposure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to availability of DC data and covariate data, not all counties were captured; however, the majority (3134 of 3142) were utilized in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
https://dica.nl/dpard/onderzoek
DPARD (Dutch Pediatric and Adult Registry of Diabetes) registers all patients with diabetes mellitus type 1 and type 2 and other chronic types of diabetes mellitus, who are treated for this at the outpatient clinic of a hospital (both 2nd and 3rd line). DPARD includes both children and adults; the professional groups involved are pediatricians and internists.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains comprehensive health data for 1,879 patients, uniquely identified with IDs ranging from 6000 to 7878. The data includes demographic details, lifestyle factors, medical history, clinical measurements, medication usage, symptoms, quality of life scores, environmental exposures, and health behaviors. Each patient is associated with a confidential doctor in charge, ensuring privacy and confidentiality. This dataset is ideal for researchers and data scientists looking to explore factors associated with diabetes, develop predictive models, and conduct statistical analyses.
Age: The age of the patients ranges from 20 to 90 years.
Gender: Gender of the patients, where 0 represents Male and 1 represents Female.
Ethnicity: The ethnicity of the patients, coded as follows:
SocioeconomicStatus: The socioeconomic status of the patients, coded as follows:
EducationLevel: The education level of the patients, coded as follows:
BMI: Body Mass Index of the patients, ranging from 15 to 40.
Smoking: Smoking status, where 0 indicates No and 1 indicates Yes.
AlcoholConsumption: Weekly alcohol consumption in units, ranging from 0 to 20.
PhysicalActivity: Weekly physical activity in hours, ranging from 0 to 10.
DietQuality: Diet quality score, ranging from 0 to 10.
SleepQuality: Sleep quality score, ranging from 4 to 10.
FamilyHistoryDiabetes: Family history of diabetes, where 0 indicates No and 1 indicates Yes.
GestationalDiabetes: History of gestational diabetes, where 0 indicates No and 1 indicates Yes.
PolycysticOvarySyndrome: Presence of polycystic ovary syndrome, where 0 indicates No and 1 indicates Yes.
PreviousPreDiabetes: History of previous pre-diabetes, where 0 indicates No and 1 indicates Yes.
Hypertension: Presence of hypertension, where 0 indicates No and 1 indicates Yes.
SystolicBP: Systolic blood pressure, ranging from 90 to 180 mmHg.
DiastolicBP: Diastolic blood pressure, ranging from 60 to 120 mmHg.
FastingBloodSugar: Fasting blood sugar levels, ranging from 70 to 200 mg/dL.
HbA1c: Hemoglobin A1c levels, ranging from 4.0% to 10.0%.
SerumCreatinine: Serum creatinine levels, ranging from 0.5 to 5.0 mg/dL.
BUNLevels: Blood Urea Nitrogen levels, ranging from 5 to 50 mg/dL.
CholesterolTotal: Total cholesterol levels, ranging from 150 to 300 mg/dL.
CholesterolLDL: Low-density lipoprotein cholesterol levels, ranging from 50 to 200 mg/dL.
CholesterolHDL: High-density lipoprotein cholesterol levels, ranging from 20 to 100 mg/dL.
CholesterolTriglycerides: Triglycerides levels, ranging from 50 to 400 mg/dL.
AntihypertensiveMedications: Use of antihypertensive medications, where 0 indicates No and 1 indicates Yes.
Statins: Use of statins, where 0 indicates No and 1 indicates Yes.
AntidiabeticMedications: Use of antidiabetic medications, where 0 indicates No and 1 indicates Yes.
FrequentUrination: Presence of frequent urination, where 0 indicates No and 1 indicates Yes.
ExcessiveThirst: Presence of excessive thirst, where 0 indicates No and 1 indicates Yes.
UnexplainedWeightLoss: Presence of unexplained weight loss, where 0 indicates No and 1 indicates Yes.
FatigueLevels: Fatigue levels, ranging from 0 to 10.
BlurredVision: Presence of blurred vision, where 0 indicates No and 1 indicates Yes.
SlowHealingSores: Presence of slow-healing sores, where 0 indicates No and 1 indicates Yes.
TinglingHandsFeet: Presence of tingling in hands or feet, where 0 indicates No and 1 indicates Yes.
QualityOfLifeScore: Quality of life score, ranging from 0 to 100.
HeavyMetalsExposure: Exposure to heavy metals, where 0 indicates No and 1 indicates Yes.
OccupationalExposureChemicals: Occupational exposure to harmful chemicals, where 0 indicates No and 1 indicates Yes.
WaterQuality: Quality of water, where 0 indicates Good and 1 indicates Poor.
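The documented ranges and 0/1 codings above lend themselves to a quick sanity check before analysis. The sketch below is only an illustration: it covers a subset of the listed columns, the function name `find_violations` is hypothetical, and the two example records are made-up values rather than rows from the dataset.

```python
# Ranges copied from the column descriptions above (a subset only).
DOCUMENTED_RANGES = {
    "Age": (20, 90),
    "BMI": (15, 40),
    "SystolicBP": (90, 180),
    "DiastolicBP": (60, 120),
    "FastingBloodSugar": (70, 200),
    "HbA1c": (4.0, 10.0),
}

# Columns documented as 0/1 indicators (again, a subset).
BINARY_COLUMNS = ["Gender", "Smoking", "Hypertension", "WaterQuality"]


def find_violations(record):
    """Return the names of columns whose values fall outside the documented coding.

    Columns missing from the record are skipped rather than reported.
    """
    problems = []
    for col, (low, high) in DOCUMENTED_RANGES.items():
        if col in record and not (low <= record[col] <= high):
            problems.append(col)
    for col in BINARY_COLUMNS:
        if col in record and record[col] not in (0, 1):
            problems.append(col)
    return problems


# Made-up records for illustration.
ok_record = {"Age": 54, "BMI": 27.5, "HbA1c": 6.1, "Smoking": 0}
bad_record = {"Age": 17, "BMI": 27.5, "HbA1c": 11.2, "Smoking": 2}
ok_problems = find_violations(ok_record)
bad_problems = find_violations(bad_record)
```

A record that yields a non-empty list is worth inspecting by hand before it is used for modelling; whether to drop or correct such rows is a separate decision.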
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. This table displays the prevalence of diabetes in California. It contains data for California only. The data are from the California Behavioral Risk Factor Surveillance Survey (BRFSS). The California BRFSS is an annual cross-sectional health-related telephone survey that collects data about California residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. The BRFSS is conducted by the Public Health Survey Research Program of California State University, Sacramento under contract from CDPH. This prevalence rate does not include pre-diabetes or gestational diabetes. It is based on the question: "Has a doctor, or nurse or other health professional ever told you that you have diabetes?" The sample size for 2014 was 8,832. NOTE: Denominator data and weighting were taken from the California Department of Finance, not the U.S. Census. Values may therefore differ from what has been published in the national BRFSS data tables by the Centers for Disease Control and Prevention (CDC) or other federal agencies.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Original data from: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
Changes made:
- Rows with missing values ('0' values) in the BP, triceps, insulin, and BMI columns were removed. Number of rows reduced from 768 (original) to 394.
Attributes:
0. Class variable (-1 = normal or +1 = diabetes)
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
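The row filtering described above (dropping rows where BP, triceps, insulin, or BMI is recorded as 0) can be sketched as below. The column names are hypothetical, since the entry lists attributes only by index, and the toy rows are made-up values. The helper that recodes the -1/+1 class to 0/1 is an optional convenience and not part of the described changes.

```python
# Hypothetical column names for attributes 3 to 6 in the list above.
REQUIRED_NONZERO = ["diastolic_bp", "triceps", "insulin", "bmi"]


def drop_rows_with_zeros(rows, columns=REQUIRED_NONZERO):
    """Keep only rows where none of the given columns is 0 (i.e. missing)."""
    return [row for row in rows if all(row[col] != 0 for col in columns)]


def to_binary_label(cls):
    """Map the -1/+1 class coding to 0/1 (an optional convenience step)."""
    if cls not in (-1, 1):
        raise ValueError(f"unexpected class value: {cls!r}")
    return 1 if cls == 1 else 0


# Toy rows with made-up values (not taken from the dataset).
toy_rows = [
    {"class": 1, "diastolic_bp": 72, "triceps": 35, "insulin": 0, "bmi": 33.6},
    {"class": -1, "diastolic_bp": 66, "triceps": 29, "insulin": 94, "bmi": 28.1},
    {"class": 1, "diastolic_bp": 0, "triceps": 23, "insulin": 168, "bmi": 43.1},
    {"class": 1, "diastolic_bp": 70, "triceps": 45, "insulin": 543, "bmi": 30.5},
]
kept = drop_rows_with_zeros(toy_rows)
labels = [to_binary_label(row["class"]) for row in kept]
```

Dropping rows is the simpler of the two common treatments for these zeros; the trade-off is a much smaller sample, as the reduction from 768 to 394 rows shows.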
The Type 1 Diabetes Genetics Consortium (T1DGC) was an international, multicenter program organized to promote research to identify genes and alleles that determine an individual's risk for type 1 diabetes. The program had two primary goals: (1) to identify genomic regions and candidate genes whose variants modify an individual’s risk of type 1 diabetes and help explain the clustering of the disease in families and (2) to make research data available to and establish resources that can be used by the research community. The T1DGC assembled a resource of affected sib-pair families, parent-child trios, and case-control collections with banks of DNA, serum, plasma, and EBV-transformed cell lines. In addition to T1DGC-recruited ASP families, the T1DGC recruited trio families from ethnic groups with lower prevalence of type 1 diabetes. The T1DGC also welcomed the inclusion of earlier ascertained case-control collections (from the UK, Denmark, etc.). Research with T1DGC data has included genome-wide linkage scans, evaluation of the human major histocompatibility complexes, examination of published candidate genes for type 1 diabetes, and examination of autoimmune disease genes and those affecting β-cell function in type 2 diabetes.
In 2007, the T1DGC incorporated over 7,000 cases from the UK (the JDRF/WT case series, aka GRID). GRID samples are available, with data from dbGaP and the European Genome-phenome Archive (EGA), and data and documentation at the JDRF/WT DIL.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
treatments