Dataset Description: Several hundred rural African-American patients were included. The diabetes.csv file contains the raw data of all patients, including those with missing data. This can be used for descriptive statistics. The data dictionary that explains the columns is linked from the dataset page.
The Diabetes_Classification file was cleaned and manipulated. Any patient without a hemoglobin A1c value was excluded. Patients whose hemoglobin A1c (column "glyhb") was 6.5 or greater were labelled diabetes = yes. Sixty patients out of 390 were found to be diabetic. A code book of the variables is included in one of the tabs. The goal is to use machine learning (a classification algorithm) to predict diabetes based on demographic and laboratory variables. What are the strongest predictors? If you exclude glucose, how strong is the prediction?
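The cleaning rule described above (exclude patients without an A1c value, label the rest at the 6.5 cutoff) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows and the "id" and "diabetes" field names are hypothetical, and only the "glyhb" column name and the 6.5 threshold come from the description.

```python
import csv
import io

# Hypothetical sample rows; only the "glyhb" column name comes from the description.
SAMPLE = """id,glyhb
1,4.3
2,6.5
3,
4,7.1
"""

A1C_THRESHOLD = 6.5  # cutoff stated in the dataset description


def label_patients(rows):
    """Drop rows without an A1c value and label the rest as diabetic or not."""
    labelled = []
    for row in rows:
        raw = (row.get("glyhb") or "").strip()
        if not raw:
            continue  # patients without a hemoglobin A1c are excluded
        a1c = float(raw)
        label = "yes" if a1c >= A1C_THRESHOLD else "no"
        labelled.append({**row, "diabetes": label})
    return labelled


labelled = label_patients(csv.DictReader(io.StringIO(SAMPLE)))
for row in labelled:
    print(row["id"], row["glyhb"], row["diabetes"])
```

The same function could be pointed at the real diabetes.csv file by replacing the in-memory sample with an opened file handle, assuming that file also uses a "glyhb" header.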
These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and 2016 calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in the past two years, and exclusion of those who are uninsured, enrolled in plans not represented in the dataset, or not enrolled in one of the represented plans for at least 90 days.
Diabetes is the fourth leading cause of death in the world and one of the most common endocrine disorders. According to studies, Type 2 diabetes kills thousands of people around the world every year and imposes huge costs on societies in the form of surgeries and other treatment programs, as well as the cost of controlling complications and disability. Therefore, prediction and early diagnosis of this disease can greatly help governments and patients.
This dataset is the output of a Chinese research study conducted in 2016. It includes 1304 samples of patients who tested positive for diabetes, and the age of the participants ranges from 21 to 99 years old. The dataset was collected according to the indicators and standards of the World Health Organization, making it a reliable source for building diabetes diagnosis models. Researchers and healthcare professionals can use this dataset to train and test machine learning models to predict and diagnose diabetes in patients.
Features of Dataset:
Age
Gender
BMI
SBP (Systolic Blood Pressure)
DBP (Diastolic Blood Pressure)
FPG (Fasting Plasma Glucose)
FFPG (Final Fasting Plasma Glucose)
Cholesterol
Triglyceride
HDL (High-Density Lipoprotein)
LDL (Low-Density Lipoprotein)
ALT (Alanine Aminotransferase)
BUN (Blood Urea Nitrogen)
CCR (Creatinine Clearance)
Smoking Status (1: Current Smoker, 2: Ever Smoker, 3: Never Smoker)
Drinking Status (1: Current Drinker, 2: Ever Drinker, 3: Never Drinker)
Family History of Diabetes (1: Yes, 0: No)
Diabetes
More details about dataset: The main dataset, without cleaning, is available at the following link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.ft8750v. The main article corresponding to the dataset can be found at: https://doi.org/10.11.../bmjopen-2018-021768
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: Welcome to the Diabetes Prediction Dataset, a valuable resource for researchers, data scientists, and medical professionals interested in the field of diabetes risk assessment and prediction. This dataset contains a diverse range of health-related attributes, meticulously collected to aid in the development of predictive models for identifying individuals at risk of diabetes. By sharing this dataset, we aim to foster collaboration and innovation within the data science community, leading to improved early diagnosis and personalized treatment strategies for diabetes.
Columns:
1. Id: Unique identifier for each data entry.
2. Pregnancies: Number of times pregnant.
3. Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test.
4. BloodPressure: Diastolic blood pressure (mm Hg).
5. SkinThickness: Triceps skinfold thickness (mm).
6. Insulin: 2-Hour serum insulin (mu U/ml).
7. BMI: Body mass index (weight in kg / height in m^2).
8. DiabetesPedigreeFunction: Diabetes pedigree function, a genetic score of diabetes.
9. Age: Age in years.
10. Outcome: Binary classification indicating the presence (1) or absence (0) of diabetes.
Utilize this dataset to explore the relationships between various health indicators and the likelihood of diabetes. You can apply machine learning techniques to develop predictive models, feature selection strategies, and data visualization to uncover insights that may contribute to more accurate risk assessments. As you embark on your journey with this dataset, remember that your discoveries could have a profound impact on diabetes prevention and management.
Please ensure that you adhere to ethical guidelines and respect the privacy of individuals represented in this dataset. Proper citation and recognition of this dataset's source are appreciated to promote collaboration and knowledge sharing.
Start your exploration of the Diabetes Prediction Dataset today and contribute to the ongoing efforts to combat diabetes through data-driven insights and innovations.
https://creativecommons.org/publicdomain/zero/1.0/
Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.
Dataset Use Cases
This dataset can be used for various analytical and machine learning purposes, such as:
Predictive Modeling: Build models to predict the likelihood of diabetes based on demographic and health-related features.
Health Analytics: Analyze the correlation between different health metrics (e.g., BMI, HbA1c level) and diabetes.
Demographic Studies: Examine the distribution of diabetes across different demographic groups and locations.
Public Health Research: Identify risk factors for diabetes and target interventions to high-risk groups.
Clinical Research: Study the relationship between comorbid conditions like hypertension and heart disease with diabetes.
Potential Analyses
Descriptive Statistics: Summarize the dataset to understand the central tendencies and dispersion of features.
Correlation Analysis: Identify the relationships between features.
Classification Models: Use machine learning algorithms to classify individuals as diabetic or non-diabetic.
Trend Analysis: Analyze trends over the years to see how diabetes prevalence has changed.
clinical_notes: clinical summaries based on patient attributes
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective
Africa presents a higher diabetic foot ulcer prevalence estimate of 7.2% against global figures of 6.3%. Engaging family members in self-care education interventions has been shown to be effective at preventing diabetes-related foot ulcers. This study culturally adapted and tested the feasibility and acceptability of an evidence-based footcare family intervention in Ghana.
Methods
The initial phase of the study involved stakeholder engagement, comprising Patient Public Involvement activities and interviews with key informant nurses and people with diabetes (N = 15). In the second phase, adults at risk of diabetes-related foot ulcers and nominated caregivers (N = 50 dyads) participated in an individually randomised feasibility trial of the adapted intervention (N = 25) compared to usual care (N = 25). The study aimed to assess feasibility outcomes and to identify efficacy signals on clinical outcomes at 12 weeks post randomisation. Patient reported outcomes were foot care behaviour, foot self-care efficacy, diabetes knowledge and caregiver diabetes distress.
Results
Adjustments were made to the evidence-based intervention to reflect the literacy, information needs and preferences of stakeholders and to develop a context appropriate diabetic foot self-care intervention. A feasibility trial was then conducted which met all recruitment, retention, data quality and randomisation progression criteria. At 12 weeks post randomisation, efficacy signals favoured the intervention group on improved footcare behaviour, foot self-care efficacy, diabetes knowledge and reduced diabetes distress. Future implementation issues to consider include the staff resources needed to deliver the intervention, family members' availability to attend in-person sessions and consideration of remote intervention delivery.
Conclusion
A contextual family-oriented foot self-care education intervention is feasible, acceptable, and may improve knowledge and self-care with the potential to decrease diabetes-related complications. The education intervention is a strategic approach to improving diabetes care and prevention of foot disease, especially in settings with limited diabetes care resources. Future research will investigate the possibility of remote delivery to better meet patient and staff needs.
Trial registration
Pan African Clinical Trials Registry (PACTR) ‐ PACTR202201708421484: https://pactr.samrc.ac.za/TrialDisplay.aspx?TrialID=19363 or pactr.samrc.ac.za/Search.aspx.
Population-based county-level estimates for prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) ≥6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1c levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1c status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16). The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).
Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality exposure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to availability of DC data and covariate data, not all counties were captured; however, the majority (3134 of 3142) were used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. This table displays the prevalence of diabetes in California. It contains data for California only. The data are from the California Behavioral Risk Factor Surveillance Survey (BRFSS). The California BRFSS is an annual cross-sectional health-related telephone survey that collects data about California residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. The BRFSS is conducted by the Public Health Survey Research Program of California State University, Sacramento under contract from CDPH. This prevalence rate does not include pre-diabetes or gestational diabetes. It is based on the question: "Has a doctor, or nurse or other health professional ever told you that you have diabetes?" The sample size for 2014 was 8,832. NOTE: Denominator data and weighting were taken from the California Department of Finance, not the U.S. Census. Values may therefore differ from what has been published in the national BRFSS data tables by the Centers for Disease Control and Prevention (CDC) or other federal agencies.
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of diabetes mellitus in persons (aged 17+). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.
ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to diabetes mellitus in persons (aged 17+).
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
The percentage of each MSOA’s population (aged 17+) with diabetes mellitus was estimated. This was achieved by calculating a weighted average based on:
The percentage of the MSOA area that was covered by each GP practice’s catchment area
Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness
The estimated percentage of each MSOA’s population with diabetes mellitus was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with diabetes mellitus, within the relevant age range.
Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have diabetes mellitus
B) the NUMBER of people within that MSOA who are estimated to have diabetes mellitus
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have diabetes mellitus, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from diabetes mellitus, and where those people make up a large percentage of the population, indicating there is a real issue with diabetes mellitus within the population and the investment of resources to address that issue could have the greatest benefits.
LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of diabetes mellitus, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of diabetes mellitus.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
Levels of obesity, inactivity and associated illnesses (England): Missing data
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
DATA SOURCES
This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
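The weighted-average and scoring steps in the analysis methodology of this dataset can be illustrated with a short Python sketch. It is not the Trust's actual code: the MSOA names, coverage figures and populations are invented, and min-max rescaling is an assumption, since the description only says that scores were converted to a relative value between 1 and 0.

```python
def weighted_prevalence(coverage):
    """Weighted average of practice prevalence (%), weighted by MSOA area covered.

    coverage: list of (share_of_msoa_area_covered, practice_prevalence_percent).
    """
    total_weight = sum(weight for weight, _ in coverage)
    return sum(weight * pct for weight, pct in coverage) / total_weight


def min_max(values):
    """Rescale values to the range 0..1 (assumed method; 1 = worst, 0 = best)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical MSOAs: catchment coverage pairs and population aged 17+.
msoas = {
    "MSOA_A": {"coverage": [(60, 5.0), (40, 10.0)], "population": 6000},
    "MSOA_B": {"coverage": [(100, 4.0)], "population": 8000},
    "MSOA_C": {"coverage": [(50, 8.0), (50, 12.0)], "population": 5000},
}

names = list(msoas)
percent = [weighted_prevalence(msoas[n]["coverage"]) for n in names]
number = [p / 100 * msoas[n]["population"] for p, n in zip(percent, names)]

score_a = min_max(percent)  # score A: percentage of population affected
score_b = min_max(number)   # score B: number of people affected
combined = min_max([(a + b) / 2 for a, b in zip(score_a, score_b)])
scores = dict(zip(names, combined))

for name in names:
    print(name, round(scores[name], 3))
```

In this toy example the MSOA with both the highest percentage and the highest estimated number of cases receives a score of 1, and the one with the lowest values on both receives 0.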
This data set provides de-identified population data for diabetes, hypertension and hyperlipidemia comorbidity prevalence. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and 2016 calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in the past two years, and exclusion of those who are uninsured, enrolled in plans not represented in the dataset, or not enrolled in one of the represented plans for at least 90 days.
Archived as of 6/26/2025: The datasets will no longer receive updates but the historical data will continue to be available for download. This dataset provides information related to the services of patients with diabetes. It contains information about the total number of patients, total number of claims, and total dollar amount, grouped by provider. It is restricted to claims with a service date between 01/2012 and 12/2017. Diabetic patients are identified as those diagnosed with any of the following ICD codes between 2010 and 2017: E110, E112, E114, E115, E116, E118, E119, 25000, 25002, 25010, 25012, 25020, 2522, 25030, 25040, 25040, 25042, 25050, 25052, 25060, 25062, 25070, 25072, 25080, 25082, 25090, 25092, and O241. The provider is the billing provider. Only providers with an NPI are considered. All types of claims except dental are considered. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.
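The grouping described above (identify diabetic patients by ICD code, then total patients, claims and dollars per billing provider) might look roughly like the following Python sketch. The claim records and their field names are hypothetical, only a handful of the listed ICD codes are included, and the service-date window and the dental exclusion are left out for brevity.

```python
from collections import defaultdict

# A few of the ICD codes listed in the description (not the full list).
DIABETES_CODES = {"E110", "E119", "25000", "25002", "O241"}

# Hypothetical claim records; the field names are assumptions.
claims = [
    {"npi": "1111111111", "patient": "p1", "icd": "E119", "amount": 120.0},
    {"npi": "1111111111", "patient": "p2", "icd": "25000", "amount": 80.0},
    {"npi": "1111111111", "patient": "p1", "icd": "J45", "amount": 50.0},
    {"npi": "", "patient": "p3", "icd": "E110", "amount": 200.0},
    {"npi": "2222222222", "patient": "p3", "icd": "O241", "amount": 60.0},
]


def summarize_by_provider(records):
    """Total patients, claims and dollars per provider for diabetic patients."""
    # Step 1: a patient counts as diabetic if any claim carries a listed code.
    diabetic = {r["patient"] for r in records if r["icd"] in DIABETES_CODES}

    # Step 2: aggregate all claims of those patients, skipping providers without an NPI.
    totals = defaultdict(lambda: {"patients": set(), "claims": 0, "amount": 0.0})
    for r in records:
        if r["patient"] not in diabetic or not r["npi"]:
            continue
        entry = totals[r["npi"]]
        entry["patients"].add(r["patient"])
        entry["claims"] += 1
        entry["amount"] += r["amount"]

    return {
        npi: {"patients": len(e["patients"]), "claims": e["claims"], "amount": e["amount"]}
        for npi, e in totals.items()
    }


result = summarize_by_provider(claims)
for npi, row in sorted(result.items()):
    print(npi, row)
```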
Find data on pediatric diabetes in Massachusetts. This dataset contains information on the number of cases and prevalence of Type 1 and Type 2 diabetes among students, grades K-8, in Massachusetts.
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that provides not only continuous glucose levels but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), M (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses made for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. The manufacturer, Abbott Diabetes Care, Inc. (Alameda, CA, USA), has conducted validation studies comparing the sensor measurements with YSI analyzer devices (Xylem Inc.), the gold standard, and reported that results fell within zones A and B of the consensus error grid 99.9% of the time. In addition, studies external to the company concluded that the accuracy of the measurements is adequate.
Moreover, the blood glucose level measurements of each patient in the Glucose_measurements.csv file were checked and found in most cases to be continuous (i.e. a sample at least every 15 minutes), as they should be.
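A continuity check of this kind could be sketched as follows. This is an illustration only, not the authors' validation code: the rows and the patient identifier are made up, while the column names and the date and time formats follow the Glucose_measurements.csv description above.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=15)  # "a sample at least every 15 minutes"


def parse_timestamp(row):
    """Combine Measurement_date (YYYY-MM-DD) and Measurement_time (HH:MM:SS)."""
    text = f"{row['Measurement_date']} {row['Measurement_time']}"
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def find_gaps(rows, max_gap=MAX_GAP):
    """Return (previous, current) timestamp pairs that are more than max_gap apart."""
    stamps = sorted(parse_timestamp(r) for r in rows)
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]


# Made-up measurements for a single hypothetical patient.
rows = [
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-01",
     "Measurement_time": "08:00:00", "Measurement": "110"},
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-01",
     "Measurement_time": "08:15:00", "Measurement": "118"},
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-01",
     "Measurement_time": "08:30:00", "Measurement": "125"},
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-01",
     "Measurement_time": "09:30:00", "Measurement": "140"},
]

gaps = find_gaps(rows)
for start, end in gaps:
    print("gap from", start, "to", end)
```

For the real file, the rows would first need to be grouped by Patient_ID so that gaps are only measured within one patient's series.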
Usage Notes
For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.
The files that compose the dataset are comma-delimited CSV files and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code, graphics and statistics that may help in understanding the dataset is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, removing a list of patients from a dataset file, and keeping only a list of patients in a dataset file.
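Helper functions of the kind mentioned above might look like the sketch below. The function names are hypothetical and are not taken from Graphs_and_stats.ipynb; the sketch assumes rows are dictionaries keyed by the column names documented earlier, and the patient identifiers are invented.

```python
from datetime import date


def patient_age(birth_year, on=None):
    """Approximate age in years; only Birth_year (YYYY) is available."""
    on = on or date.today()
    return on.year - int(birth_year)


def drop_patients(rows, patient_ids):
    """Remove every row that belongs to one of the given patients."""
    ids = set(patient_ids)
    return [r for r in rows if r["Patient_ID"] not in ids]


def keep_patients(rows, patient_ids):
    """Keep only the rows that belong to one of the given patients."""
    ids = set(patient_ids)
    return [r for r in rows if r["Patient_ID"] in ids]


# Made-up patient records for illustration.
patients = [
    {"Patient_ID": "LIB190001", "Sex": "F", "Birth_year": "1980"},
    {"Patient_ID": "LIB190002", "Sex": "M", "Birth_year": "1995"},
    {"Patient_ID": "LIB190003", "Sex": "F", "Birth_year": "2001"},
]

age = patient_age(patients[0]["Birth_year"], on=date(2022, 6, 1))
remaining = drop_patients(patients, ["LIB190002"])
selected = keep_patients(patients, ["LIB190002", "LIB190003"])
print(age, len(remaining), len(selected))
```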
Code Availability
The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed and the sex variable is recoded.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information and duplicated rows are removed, and the timestamp variable is split into two new variables, measurement date and measurement time.
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file with data from the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file with the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook implements the feature extraction process that uses the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of these files, and the last three add the extracted features to the resulting new file.
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PIMA Indians diabetes dataset classification result.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
DESCRIPTION
Real patient data to manipulate and to predict diabetes (yes or no) using demographic and lab variables.
SUMMARY
Several hundred rural African-American patients were included. The diabetes.csv file contains the raw data of all patients, including those with missing data. This can be used for descriptive statistics.
The Diabetes_Classification file was cleaned and manipulated. Any patient without a hemoglobin A1c value was excluded. Patients with a hemoglobin A1c of 6.5 or greater were labelled diabetes = yes. Sixty of the 390 patients were found to be diabetic. A code book of the variables is included in one of the tabs. The goal is to use machine learning (a classification algorithm) to predict diabetes from demographic and laboratory variables. What are the strongest predictors? If you exclude glucose, how strong is the prediction?
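The labelling rule described above can be sketched in a few lines. This is a minimal illustration, not the code used to build the file: the function name and the sample HbA1c values are invented, and only the 6.5 cut-off and the 60-of-390 count come from the description.

```python
DIABETES_THRESHOLD = 6.5  # hemoglobin A1c cut-off stated in the description


def label_patients(hba1c_values):
    """Label each patient "yes" or "no"; patients without HbA1c are excluded."""
    labels = []
    for value in hba1c_values:
        if value is None:
            continue  # excluded: no hemoglobin A1c measurement
        labels.append("yes" if value >= DIABETES_THRESHOLD else "no")
    return labels


# Invented sample values; None marks a missing measurement.
labels = label_patients([5.1, None, 6.5, 7.2, 4.9])

# Share of diabetic patients reported in the description: 60 of 390.
prevalence = 60 / 390
```

With 60 of 390 patients labelled diabetic, the positive class is about 15% of the data, so a classifier that always answers "no" would already be right about 85% of the time. Accuracy alone is therefore a weak measure of how strong the prediction is.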
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Excel sheet with the data from the original research article 'Evaluation of simple and cost-effective hematological inflammatory biomarkers in type 2 diabetes and their correlation with glycemic control'.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
South Africa ZA: Diabetes Prevalence: % of Population Aged 20-79 was reported at 5.520 % in 2017. The series is updated yearly and contains a single observation (2017), so its average and median are also 5.520 %. The data remains in active status in CEIC and is reported by the World Bank. It is categorized under Global Database’s South Africa – Table ZA.World Bank: Health Statistics. Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes. Source: International Diabetes Federation, Diabetes Atlas. Aggregation: weighted average.
https://www.scilifelab.se/data/restricted-access/

Cohort description:
Human carotid plaques used in this study were obtained from the Carotid Plaque Imaging Project biobank (Malmö, Sweden; ClinicalTrials.gov ID NCT05821894). These plaques were collected from patients undergoing carotid endarterectomy (CEA) at Skåne University Hospital's Vascular department in Malmö, Sweden. The indications for surgery were: plaques associated with ipsilateral symptoms (transitory ischemic attack, stroke, or amaurosis fugax) and a degree of stenosis greater than 70% (measured by duplex ultrasound), or plaques from asymptomatic patients with a degree of stenosis greater than 80%. The study has received ethical approval, and all patients provided oral and written informed consent. All work involving human subjects adheres to the principles of the 1975 Declaration of Helsinki.

Dataset description:
Bulk RNA sequencing was performed on carotid plaques from patients with type 2 diabetes (T2D). Total RNA was extracted with the Trizol method, and libraries were prepared using the ScriptSeq™ v2 Kit. Sequencing was performed on Illumina HiSeq2000 and NextSeq platforms. Reads were aligned to the GRCh38 genome with STAR and quantified using Salmon with GENCODE V27 annotations. Counts were normalized with edgeR and expressed as log2-transformed counts per million (CPM) after batch correction for sequencing platforms.

Single-cell transcriptomes were generated from human atherosclerotic plaque cells. Live cells were stained for CD45 and FACS sorted into 384-well plates, separating CD45+ and CD45- populations. RNA libraries were prepared and sequenced using the SmartSeq2 protocol at the SciLife Eukaryotic Genomics Facility. Cells with fewer than 10,000 raw reads, fewer than 500 detected genes, or more than 15% ERCC RNA Spike-In were excluded. Counts were log-normalized and scaled, and the top 2000 highly variable genes were used for dimensional reduction.

Spatial transcriptomics was performed on 10 µm carotid plaque sections using the Visium Spatial Gene Expression Slide & Reagent Kit (PN-1000184) and the standard protocols (CG000239 RevD, 10x Genomics, Pleasanton, CA, USA). Libraries were sequenced on the NextSeq 500/550 platform with the High Output Kit v2.5 (150 cycles) at a depth of 400 million read pairs per sample. FASTQ files and corresponding histological images were processed with Space Ranger v1.0.0, using STAR v2.5.1b for genome alignment against the hg38 reference genome.

Data availability:
Current European data regulations preclude the open sharing of sensitive data from living humans, including genetic and sequencing data. The sequencing data used in this study can be accessed by making a reasonable request to the corresponding author, provided the legal terms for access are met.

Terms for access:
- The human data complies with GDPR regulations and is available, upon request, to qualified academic investigators solely for the purpose of replicating the procedures and findings outlined in the article.
- Access to the dataset is contingent upon successful completion of a data sharing agreement with the principal investigator (thus ensuring compliance with GDPR) as well as written approvals from the ethical review board of Sweden, Region Skåne, and Lund University, respectively.
- The existing ethical permit restricts the sharing of raw individual data due to its sensitive character, allowing only the sharing of aggregated data. Should researchers seek access to raw individual data, a separate written ethical approval and other legal requirements must be provided in order for the request to be considered.
- Approved researchers must refrain from attempting to identify or contact individual study participants represented in the dataset. They may not generate information that could compromise participants' identities.
- Users are prohibited from using the datasets or any derivatives thereof for commercial purposes.
- Approved investigators are obligated to immediately report any unauthorised data sharing or breaches of data security on their behalf to the data access committee. Such reports should include comprehensive details to facilitate resolution and ensure data confidentiality.
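The single-cell exclusion criteria stated in the dataset description (fewer than 10,000 raw reads, fewer than 500 detected genes, or more than 15% ERCC RNA Spike-In) can be expressed as a simple filter. This is a hedged sketch, not the study's pipeline: the function name, the dictionary keys and the example cells are invented for illustration.

```python
MIN_RAW_READS = 10_000       # cells with fewer raw reads are excluded
MIN_DETECTED_GENES = 500     # cells with fewer detected genes are excluded
MAX_ERCC_FRACTION = 0.15     # cells with more ERCC spike-in are excluded


def passes_qc(raw_reads, detected_genes, ercc_fraction):
    """Return True when a cell meets none of the three exclusion criteria."""
    return (
        raw_reads >= MIN_RAW_READS
        and detected_genes >= MIN_DETECTED_GENES
        and ercc_fraction <= MAX_ERCC_FRACTION
    )


# Invented example cells, one per failure mode plus one that passes.
cells = [
    {"id": "cell_a", "raw_reads": 52_000, "detected_genes": 2_100, "ercc_fraction": 0.04},
    {"id": "cell_b", "raw_reads": 8_500, "detected_genes": 900, "ercc_fraction": 0.02},
    {"id": "cell_c", "raw_reads": 30_000, "detected_genes": 450, "ercc_fraction": 0.05},
    {"id": "cell_d", "raw_reads": 41_000, "detected_genes": 1_800, "ercc_fraction": 0.22},
]
kept_ids = [
    cell["id"]
    for cell in cells
    if passes_qc(cell["raw_reads"], cell["detected_genes"], cell["ercc_fraction"])
]
```

Cells that sit exactly on a threshold are kept, because the description excludes only cells strictly below the read and gene minimums or strictly above the ERCC maximum.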
Diabetes is a chronic disease that affects the way the body processes blood sugar, also known as glucose. Glucose is an important source of energy for the body's cells, and insulin, a hormone produced by the pancreas, helps to regulate glucose levels in the blood.
In people with diabetes, the body either doesn't produce enough insulin, or it can't effectively use the insulin it produces. This causes glucose to build up in the blood, leading to a range of health problems over time.
There are two main types of diabetes: type 1 and type 2. Type 1 diabetes, also known as juvenile diabetes, is usually diagnosed in children and young adults. It occurs when the body's immune system attacks and destroys the cells in the pancreas that produce insulin. People with type 1 diabetes need to take insulin injections or use an insulin pump to manage their blood sugar levels.
Type 2 diabetes is the most common form of diabetes, accounting for around 90% of all cases. It usually develops in adults, but can also occur in children and teenagers. In type 2 diabetes, the body becomes resistant to the effects of insulin, and the pancreas may not produce enough insulin to keep blood sugar levels in check. Lifestyle changes, such as a healthy diet and regular exercise, can help manage type 2 diabetes, and some people may also need medication or insulin therapy.
Both types of diabetes can lead to serious health complications over time, including heart disease, stroke, kidney disease, nerve damage, and eye problems. It's important for people with diabetes to work closely with their healthcare team to manage their condition and prevent these complications.
and diabetic macular edema for each image.