MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.
This dataset can be used for various analytical and machine learning purposes, such as:
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description: Welcome to the Diabetes Prediction Dataset, a valuable resource for researchers, data scientists, and medical professionals interested in the field of diabetes risk assessment and prediction. This dataset contains a diverse range of health-related attributes, meticulously collected to aid in the development of predictive models for identifying individuals at risk of diabetes. By sharing this dataset, we aim to foster collaboration and innovation within the data science community, leading to improved early diagnosis and personalized treatment strategies for diabetes.
Columns:
1. Id: Unique identifier for each data entry.
2. Pregnancies: Number of times pregnant.
3. Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
4. BloodPressure: Diastolic blood pressure (mm Hg).
5. SkinThickness: Triceps skinfold thickness (mm).
6. Insulin: 2-hour serum insulin (μU/mL).
7. BMI: Body mass index (weight in kg / (height in m)^2).
8. DiabetesPedigreeFunction: Diabetes pedigree function, a score of genetic diabetes risk based on family history.
9. Age: Age in years.
10. Outcome: Binary classification indicating the presence (1) or absence (0) of diabetes.
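A minimal first-pass check, sketched under assumptions: the description does not name the CSV file, so the example parses two made-up rows with the column names listed above. It counts the Outcome classes and, following a convention common for Pima-style data (assumed here, not stated in the description), treats a 0 in Glucose, BloodPressure, SkinThickness, Insulin or BMI as a missing value rather than a real measurement. The function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Columns where a 0 is physiologically implausible. Treating 0 as "missing"
# is an assumed convention for Pima-style data, not documented above.
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def load_rows(text):
    """Parse CSV text into dicts of floats, turning implausible zeros into None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, value in raw.items():
            number = float(value)
            row[key] = None if key in ZERO_MEANS_MISSING and number == 0 else number
        rows.append(row)
    return rows


def summarize(rows):
    """Return Outcome class counts and the number of missing values per column."""
    outcome_counts = Counter(int(r["Outcome"]) for r in rows)
    missing = {col: sum(r[col] is None for r in rows) for col in ZERO_MEANS_MISSING}
    return outcome_counts, missing


# Two made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """Id,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
1,2,120,70,30,0,28.5,0.40,35,1
2,0,95,0,0,0,24.0,0.20,29,0
"""

if __name__ == "__main__":
    counts, missing = summarize(load_rows(SAMPLE))
    print(counts)
    print(missing)
```

Note that Pregnancies is deliberately left out of the zero-as-missing list: zero pregnancies is a valid value.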
Utilize this dataset to explore the relationships between various health indicators and the likelihood of diabetes. You can apply machine learning techniques to develop predictive models, feature selection strategies, and data visualization to uncover insights that may contribute to more accurate risk assessments. As you embark on your journey with this dataset, remember that your discoveries could have a profound impact on diabetes prevention and management.
Please ensure that you adhere to ethical guidelines and respect the privacy of individuals represented in this dataset. Proper citation and recognition of this dataset's source are appreciated to promote collaboration and knowledge sharing.
Start your exploration of the Diabetes Prediction Dataset today and contribute to the ongoing efforts to combat diabetes through data-driven insights and innovations.
South Africa is experiencing a rapidly growing diabetes epidemic that threatens its healthcare system. Research on the determinants of diabetes in South Africa receives considerable attention due to the lifestyle changes accompanying South Africa’s rapid urbanization since the fall of Apartheid. However, few studies have investigated how segments of the Black South African population, who continue to endure Apartheid’s institutional discriminatory legacy, experience this transition. This paper explores the association between individual- and area-level socioeconomic status and diabetes prevalence, awareness, treatment, and control within a sample of Black South Africans aged 45 years or older in three municipalities in KwaZulu-Natal. Cross-sectional data were collected on 3,685 participants from February 2017 to February 2018. Individual-level socioeconomic status was assessed with employment status and educational attainment. Area-level deprivation was measured using the most recent South African Multidimensional Poverty Index scores. Covariates included age, sex, BMI, and hypertension diagnosis. The prevalence of diabetes was 23% (n = 830). Of those, 769 were aware of their diagnosis, 629 were receiving treatment, and 404 had their diabetes controlled. Compared to those with no formal education, Black South Africans with some high school education had higher diabetes prevalence, and those who had completed high school had a lower prevalence of treatment receipt. Employment status was negatively associated with diabetes prevalence. Black South Africans living in more deprived wards had lower diabetes prevalence, and those residing in wards that became more deprived from 2001 to 2011 had a higher prevalence of diabetes, as well as of diabetes control. Results from this study can help policymakers and practitioners identify modifiable risk factors for diabetes among Black South Africans on which to intervene.
Potential community-based interventions include those focused on patient empowerment and linkages to care. Such interventions should act in concert with policy changes, such as expanding the existing sugar-sweetened beverage tax.
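The abstract reports the diabetes care cascade as raw counts. The small calculation below converts them into percentages, assuming (as "Of those" suggests) that the aware, treated, and controlled groups are all subsets of the 830 participants with diabetes; the function name is hypothetical.

```python
# Counts reported in the abstract above.
PARTICIPANTS = 3685
CASCADE = {"diabetes": 830, "aware": 769, "treated": 629, "controlled": 404}


def cascade_percentages(participants, cascade):
    """Prevalence relative to all participants; later stages relative to those with diabetes."""
    with_diabetes = cascade["diabetes"]
    result = {"prevalence": 100 * with_diabetes / participants}
    for stage in ("aware", "treated", "controlled"):
        result[stage] = 100 * cascade[stage] / with_diabetes
    return result


if __name__ == "__main__":
    for stage, pct in cascade_percentages(PARTICIPANTS, CASCADE).items():
        print(f"{stage}: {pct:.1f}%")
```

Run as a script, this prints 22.5% prevalence (consistent with the rounded 23% above), 92.7% aware, 75.8% treated, and 48.7% controlled.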
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Diabetes Prediction dataset is used to predict whether a patient has diabetes based on health metrics.
• Key features include pregnancies, glucose levels, blood pressure, BMI, insulin levels, family history of diabetes, and age.
2) Data Utilization
(1) Characteristics of the Diabetes Prediction Dataset
• This dataset contains key health indicators for predicting diabetes, such as glucose level, BMI, and age, along with the number of pregnancies (a count) and the diabetes outcome (a binary variable).
• Continuous variables such as glucose and BMI provide detailed health information, while the binary outcome variable indicates whether a person has diabetes.
• Together, these variables support a comprehensive analysis of the factors contributing to diabetes risk.
(2) Applications of the Diabetes Prediction Dataset
• Early Diagnosis and Prevention: The dataset can be used to predict diabetes risk, helping healthcare providers make faster and more accurate diagnoses and intervene early to prevent the disease.
• Health Management and Personalized Plans: It can be used to create personalized health management plans or preventive measures tailored to each individual's health status.
• Research and Policy Development: Researchers can use the data to study diabetes prevention and to inform improved public health policies.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Diabetes is a widespread chronic disease affecting millions of Americans each year, imposing a substantial financial burden on the economy. It impairs the body's ability to regulate blood glucose levels, leading to a range of health issues such as heart disease, vision loss, limb amputation, and kidney disease. Diabetes occurs when the body either fails to produce sufficient insulin or cannot use the insulin produced effectively. Insulin is crucial for enabling cells to utilize sugars from the bloodstream for energy.
Though there is no cure for diabetes, lifestyle changes such as weight management, healthy eating, and regular physical activity, along with medical treatments, can help manage the disease. Early detection and intervention are vital, making predictive models for diabetes risk valuable tools for healthcare providers and public health officials.
As of 2018, the CDC reported that 34.2 million Americans have diabetes, with 88 million having prediabetes. Alarmingly, a significant portion of those affected are unaware of their condition. Type II diabetes, the most prevalent form, varies in prevalence based on age, education, income, location, race, and other social determinants of health. The economic impact is substantial, with diagnosed diabetes costing approximately $327 billion annually, and total costs, including undiagnosed cases and prediabetes, nearing $400 billion.
Content: The dataset originates from the Behavioral Risk Factor Surveillance System (BRFSS), an annual telephone survey conducted by the CDC since 1984 that collects data on health-related risk behaviors, chronic health conditions, and use of preventive services. For this project, the 2015 BRFSS dataset available on Kaggle was used, featuring responses from 441,455 individuals across 330 features.
The dataset includes three files:
diabetes_012_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features. The target variable, Diabetes_012, has 3 classes: 0 (no diabetes or only during pregnancy), 1 (prediabetes), and 2 (diabetes). This dataset is imbalanced.
diabetes_binary_5050split_health_indicators_BRFSS2015.csv: Contains 70,692 responses with 21 features, balanced 50-50 between individuals with no diabetes and those with prediabetes or diabetes. The target variable, Diabetes_binary, has 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes).
diabetes_binary_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features, with the target variable Diabetes_binary having 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes). This dataset is not balanced.
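Because Diabetes_binary groups prediabetes and diabetes together, a Diabetes_012 label can be collapsed into it as sketched below. Random undersampling of the majority class is shown as one way to obtain a 50-50 split; whether the 70,692-row balanced file was built this way is an assumption, not something the description states. Function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def to_binary(label_012):
    """Map Diabetes_012 (0 none, 1 prediabetes, 2 diabetes) to Diabetes_binary (0 or 1)."""
    return 0 if label_012 == 0 else 1


def balanced_undersample(labels, seed=0):
    """Return sorted row indices with equal numbers of 0 and 1 labels.

    Every row of the minority class is kept; the majority class is randomly
    sampled down to the same size.
    """
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    n = min(len(by_class[0]), len(by_class[1]))
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    chosen = rng.sample(by_class[0], n) + rng.sample(by_class[1], n)
    return sorted(chosen)


if __name__ == "__main__":
    labels_012 = [0, 0, 0, 0, 0, 1, 2, 0, 2, 0]  # toy labels, not real data
    binary = [to_binary(v) for v in labels_012]
    keep = balanced_undersample(binary)
    print(Counter(binary[i] for i in keep))
```

Undersampling discards majority-class rows; for model training on the imbalanced files, class weights are a common alternative that keeps all data.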
Research Questions:
- Can BRFSS survey questions accurately predict diabetes?
- What risk factors are most indicative of diabetes risk?
- Can a subset of risk factors effectively predict diabetes risk?
- Can a shorter questionnaire be developed from the BRFSS using feature selection to predict diabetes risk?
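One simple starting point for the subset and shorter-questionnaire questions is to rank features by their univariate association with Diabetes_binary. The sketch uses absolute Pearson correlation (point-biserial for a 0/1 target); this is a swapped-in screening heuristic, not the method of the cited study, and it ignores interactions between features. The column names HighBP, BMI, and Smoker are assumed from the Kaggle release (they are not listed above), and the values are toy numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this equals the point-biserial correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Sort feature names by absolute correlation with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


if __name__ == "__main__":
    # Toy values with assumed BRFSS-style column names; not real survey data.
    target = [0, 0, 1, 1, 0, 1]
    columns = {
        "HighBP": [0, 0, 1, 1, 0, 1],
        "BMI": [22, 25, 31, 35, 24, 29],
        "Smoker": [1, 0, 0, 1, 1, 0],
    }
    print(rank_features(columns, target))
```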
Acknowledgements: This dataset was not created by me; it is a cleaned and consolidated version of the BRFSS 2015 dataset available on Kaggle. The original dataset and the data cleaning notebook can be found here.
Inspiration: This work was inspired by Zidian Xie et al.'s study on building risk prediction models for Type 2 diabetes using machine learning techniques on the 2014 BRFSS dataset. The study can be found here.
This dataset contains the proportion and total number of adults diagnosed with diabetes, broken down by age and by sex (male and female), collected from the Behavioral Risk Factor Surveillance System (BRFSS), a system of health-related telephone surveys covering more than 400,000 adults in all 50 US states, the District of Columbia, and three US territories.
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices have been used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem in generating these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that provides not only continuous glucose levels but also patient demographic and clinical information. The dataset includes 257,780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset goes significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of the blood glucose level measurements and of the biochemical parameters, the number of biochemical parameters, and the number of diagnoses. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), M (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses made for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
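Because Biochemical_parameters.csv is in long format (one row per test), a common first step is to pull one value per patient. The sketch keeps the most recent value of a named parameter, using only the four documented columns; the function name and the sample rows (IDs follow the LIB19XXXX pattern but are made up) are hypothetical.

```python
import csv
import io


def latest_values(text, parameter):
    """Return {Patient_ID: (Reception_date, Value)} for the most recent test of `parameter`."""
    latest = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["Name"] != parameter:
            continue
        date = row["Reception_date"]  # YYYY-MM-DD, so string order equals date order
        patient = row["Patient_ID"]
        if patient not in latest or date > latest[patient][0]:
            latest[patient] = (date, float(row["Value"]))
    return latest


# Made-up rows in the documented layout; IDs and values are illustrative only.
SAMPLE = """Patient_ID,Reception_date,Name,Value
LIB190001,2019-01-10,Glycated hemoglobin (Ac),7.9
LIB190001,2020-03-02,Glycated hemoglobin (Ac),7.1
LIB190001,2020-03-02,Potassium,4.3
LIB190002,2018-11-20,Glycated hemoglobin (Ac),8.4
"""

if __name__ == "__main__":
    print(latest_values(SAMPLE, "Glycated hemoglobin (Ac)"))
```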
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: a subset of 594 ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: the long descriptions of the same subset of 594 ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements were collected using FreeStyle Libre devices, which are widely used in the care of patients with T1D. The manufacturer, Abbott Diabetes Care, Inc. (Alameda, CA, USA), has conducted validation studies comparing the sensor measurements with YSI analyzer devices (Xylem Inc.), the gold standard, and found that 99.9% of the readings fell within zones A and B of the consensus error grid. In addition, independent studies have concluded that the accuracy of the measurements is adequate.
It was also verified that, in most cases, each patient's blood glucose level measurements in the Glucose_measurements.csv file are continuous (i.e., at least one sample every 15 minutes), as expected.
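The authors do not describe how continuity was checked. A minimal sketch, assuming a gap means more than 15 minutes between consecutive readings of one patient, combines Measurement_date and Measurement_time into a timestamp and lists the gaps; parse_timestamp and find_gaps are hypothetical names.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=15)  # "at least one sample every 15 minutes"


def parse_timestamp(date_str, time_str):
    """Combine Measurement_date (YYYY-MM-DD) and Measurement_time (HH:MM:SS)."""
    return datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")


def find_gaps(timestamps, max_gap=MAX_GAP):
    """Return (start, end) pairs of consecutive samples more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


if __name__ == "__main__":
    stamps = [
        parse_timestamp("2020-01-01", "00:00:00"),
        parse_timestamp("2020-01-01", "00:15:00"),  # exactly 15 min: not a gap
        parse_timestamp("2020-01-01", "01:30:00"),  # 75 min after the previous sample
    ]
    for start, end in find_gaps(stamps):
        print(start, "->", end)
```

The check should be run per patient (grouping rows by Patient_ID first); mixing patients would hide gaps.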
Usage Notes
To download the data, users must be authenticated on the Zenodo platform, accept the Data Usage Agreement, and send a request specifying their full name, email, and justification for the data use. The request is processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada, and access to the dataset is then granted.
The files that make up the dataset are comma-delimited CSV files available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code, graphics, and statistics that may help users better understand the dataset is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs, and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it provides useful functions, such as calculating the patient's age, removing a list of patients from a dataset file, and keeping only a given list of patients in a dataset file.
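The notebook's own functions are not reproduced here. The hypothetical helpers below (patient_age, drop_patients, keep_patients) only sketch what those three utilities might do on rows read as dictionaries. Since Patient_info.csv records only Birth_year, the age is an approximation that can be off by one year.

```python
from datetime import date


def patient_age(birth_year, on_date):
    """Approximate age in years; only the birth year is recorded, so this may be off by one."""
    return on_date.year - birth_year


def drop_patients(rows, patient_ids):
    """Remove every row whose Patient_ID is in patient_ids."""
    excluded = set(patient_ids)
    return [row for row in rows if row["Patient_ID"] not in excluded]


def keep_patients(rows, patient_ids):
    """Keep only the rows whose Patient_ID is in patient_ids."""
    included = set(patient_ids)
    return [row for row in rows if row["Patient_ID"] in included]


if __name__ == "__main__":
    rows = [{"Patient_ID": "LIB190001"}, {"Patient_ID": "LIB190002"}]  # made-up IDs
    print(patient_age(1980, date.fromisoformat("2019-06-01")))
    print(drop_patients(rows, ["LIB190001"]))
    print(keep_patients(rows, ["LIB190001"]))
```

A natural reference date for the age is the patient's Initial_measurement_date.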
Code Availability
The dataset was generated using custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8 and was used for tasks such as data curation, data transformation, and variable extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original patient data file, mainly by removing irrelevant rows and columns and recoding the sex variable.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the patients' continuous glucose level measurements. Principally, rows without information and duplicated rows are removed, and the timestamp variable is split into two new variables, measurement date and measurement time.
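The layout of the original timestamp column is not documented, so this sketch assumes a "YYYY-MM-DD HH:MM:SS" string (RAW_FORMAT) and raw rows given as (patient ID, timestamp, value) tuples; curate is a hypothetical name. It mirrors the two steps described above: dropping rows without information or exact duplicates, then splitting the timestamp into Measurement_date and Measurement_time.

```python
from datetime import datetime

RAW_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the original timestamp column


def curate(raw_rows):
    """Drop empty and duplicated rows, then split each timestamp into date and time.

    raw_rows: iterable of (patient_id, timestamp_string, value_string) tuples.
    """
    seen = set()
    curated = []
    for patient_id, stamp, value in raw_rows:
        if not (patient_id and stamp and value):
            continue  # row without information
        key = (patient_id, stamp, value)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        moment = datetime.strptime(stamp, RAW_FORMAT)
        curated.append({
            "Patient_ID": patient_id,
            "Measurement_date": moment.strftime("%Y-%m-%d"),
            "Measurement_time": moment.strftime("%H:%M:%S"),
            "Measurement": float(value),  # mg/dL
        })
    return curated
```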
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file with data on the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file with data on the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook implements the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv, and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of these files, and the next three add the extracted features to the resulting new file.
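As an illustration of the glucose part of this extraction only, the sketch below derives the four Patient_info.csv fields that come from Glucose_measurements.csv. It is not the notebook's code; glucose_features is a hypothetical name, and rows are assumed to be dictionaries with the documented column names.

```python
from collections import defaultdict


def glucose_features(rows):
    """Summarize Glucose_measurements.csv rows per patient.

    rows: dicts with at least Patient_ID and Measurement_date (YYYY-MM-DD).
    Returns the Patient_info.csv fields derived from this file.
    """
    dates_by_patient = defaultdict(list)
    for row in rows:
        dates_by_patient[row["Patient_ID"]].append(row["Measurement_date"])
    features = {}
    for patient, dates in dates_by_patient.items():
        features[patient] = {
            "Initial_measurement_date": min(dates),  # ISO dates sort chronologically
            "Final_measurement_date": max(dates),
            "Number_of_days_with_measures": len(set(dates)),
            "Number_of_measurements": len(dates),
        }
    return features
```

The biochemical and diagnosis fields would follow the same group-by-patient pattern (min/max of Reception_date, row counts per Patient_ID).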
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Diabetes Health Indicators Dataset is a large health dataset that collects various health indicators and lifestyle information related to diabetes diagnosis, based on health surveys and medical records of the U.S. population.
2) Data Utilization
(1) Characteristics of the Diabetes Health Indicators Dataset
• The dataset consists of more than 250,000 samples and more than 20 health and demographic variables, including diabetes status (a binary or three-class label), age, gender, BMI, blood pressure, cholesterol, smoking and drinking habits, physical activity, mental health, income, and education level.
(2) Applications of the Diabetes Health Indicators Dataset
• Diabetes prediction model development: The dataset can be used to develop machine learning classification models that predict the risk of developing diabetes from health indicators and lifestyle data.
• Study of the correlation between lifestyle and diabetes: It can be used in epidemiological and public health studies to analyze the effects of various lifestyle and demographic variables, such as smoking, drinking, exercise, and eating habits, on diabetes incidence.
Diabetes is the fourth leading cause of death in the world and one of the most common endocrine disorders. According to studies, type 2 diabetes kills thousands of people around the world every year and imposes huge costs on societies in the form of surgeries and other treatment programs, as well as the management of complications and disability. Therefore, predicting and diagnosing this disease early can greatly help governments and patients.
This dataset is the output of a Chinese research study conducted in 2016. It includes 1304 samples of patients who tested positive for diabetes, and the age of the participants ranges from 21 to 99 years old. The dataset was collected according to the indicators and standards of the World Health Organization, making it a reliable source for building diabetes diagnosis models. Researchers and healthcare professionals can use this dataset to train and test machine learning models to predict and diagnose diabetes in patients.
Features of the dataset:
- Age
- Gender
- BMI
- SBP (Systolic Blood Pressure)
- DBP (Diastolic Blood Pressure)
- FPG (Fasting Plasma Glucose)
- FFPG (Final Fasting Plasma Glucose)
- Cholesterol
- Triglyceride
- HDL (High-Density Lipoprotein)
- LDL (Low-Density Lipoprotein)
- ALT (Alanine Aminotransferase)
- BUN (Blood Urea Nitrogen)
- CCR (Creatinine Clearance)
- Smoking Status (1: Current Smoker, 2: Ever Smoker, 3: Never Smoker)
- Drinking Status (1: Current Drinker, 2: Ever Drinker, 3: Never Drinker)
- Family History of Diabetes (1: Yes, 0: No)
- Diabetes
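Smoking, drinking, and family history are stored as numeric codes. The sketch below decodes them for tables and plots using exactly the code tables listed above; the record keys "smoking", "drinking", and "family_history" are hypothetical, since the actual column names in the file are not given here.

```python
# Code tables as listed in the feature description above.
SMOKING = {1: "Current Smoker", 2: "Ever Smoker", 3: "Never Smoker"}
DRINKING = {1: "Current Drinker", 2: "Ever Drinker", 3: "Never Drinker"}
FAMILY_HISTORY = {1: "Yes", 0: "No"}

# Hypothetical column names; adjust to the headers of the actual file.
CODE_TABLES = (("smoking", SMOKING), ("drinking", DRINKING), ("family_history", FAMILY_HISTORY))


def decode(record):
    """Return a copy of one record with the numeric status codes replaced by labels."""
    decoded = dict(record)
    for key, table in CODE_TABLES:
        if key in decoded:
            decoded[key] = table[int(decoded[key])]
    return decoded
```

For modeling, the numeric codes are usually kept (or one-hot encoded) rather than replaced by labels; decoding mainly helps readability.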
More details about the dataset: The original, uncleaned dataset is available at the following link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.ft8750v. The main article corresponding to the dataset can be found at: https://doi.org/10.11.../bmjopen-2018-021768
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This chart shows the rate of hospitalizations for short-term complications of diabetes for the most recent data year, by age range and county. It also shows the 2017 objective by age range. This chart is based on one of three datasets related to the Prevention Agenda Tracking Indicators county-level data posted on this site. Each dataset consists of county-level data for 68 health tracking indicators and sub-indicators for the Prevention Agenda 2013-2017: New York State’s Health Improvement Plan. A health tracking indicator is a metric through which progress on a certain area of health improvement can be assessed. The indicators are organized by the Priority Area of the Prevention Agenda as well as the Focus Area under each Priority Area. Each dataset includes tracking indicators for the five Priority Areas of the Prevention Agenda 2013-2017. The most recent year dataset includes the most recent county-level data for all indicators. The trend dataset includes the most recent county-level data and historical data, where available. Each dataset also includes the Prevention Agenda 2017 state targets for the indicators. Sub-indicators are included in these datasets to measure health disparities among socioeconomic groups. For more information, check out: http://www.health.ny.gov/prevention/prevention_agenda/2013-2017/ and https://www.health.ny.gov/PreventionAgendaDashboard. The "About" tab contains additional details concerning this dataset.
Population-based county-level estimates of the prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). The DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) ≥6.5%, or a diabetes diagnosis) but did not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1c levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1c status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of diabetes-related outcomes, including DC (16). The Environmental Quality Index (EQI) was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables representing the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce the variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural-urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).
Results are reported as prevalence rate differences (PRDs) with 95% confidence intervals (CIs), comparing the highest quintile (worst environmental quality) with the lowest quintile (best environmental quality) of each exposure metric. PRDs are representative of the entire period of interest, 2004-2012. Because of the limited availability of DC and covariate data, not all counties were captured; however, the majority (3,134 of 3,142) were used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
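To make the quintile comparison concrete, the sketch below assigns counties to EQI quintiles by rank (5 = highest EQI = worst environmental quality) and computes a crude, unadjusted difference in mean DC prevalence between the top and bottom quintiles. This is only an illustration: the published PRDs come from models with covariates, 95% CIs, and RUCC stratification, none of which is reproduced here. Function names and the toy county data are hypothetical.

```python
def assign_quintiles(scores):
    """Map each county to a quintile by rank: 1 = lowest score, 5 = highest score."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {county: i * 5 // n + 1 for i, county in enumerate(ordered)}


def crude_prd(scores, prevalence):
    """Unadjusted difference: mean prevalence in quintile 5 minus mean in quintile 1."""
    quintiles = assign_quintiles(scores)
    top = [prevalence[c] for c, q in quintiles.items() if q == 5]
    bottom = [prevalence[c] for c, q in quintiles.items() if q == 1]
    return sum(top) / len(top) - sum(bottom) / len(bottom)


if __name__ == "__main__":
    # Ten toy counties: EQI scores 0..9 and made-up DC prevalence values.
    eqi = {f"county{i}": i for i in range(10)}
    dc = {f"county{i}": 10 + i for i in range(10)}
    print(crude_prd(eqi, dc))
```

With these toy values, quintile 5 holds the two highest-scoring counties (mean 18.5) and quintile 1 the two lowest (mean 10.5), so the crude difference is 8.0.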
CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Mouse models of streptozotocin (STZ)-induced diabetes probably represent the most widely used systems for preclinical diabetes research, owing to the compound’s toxic effect on pancreatic β-cells. However, a comprehensive view of the distribution of pancreatic β-cell mass after STZ administration is lacking. Previous assessments have largely relied on extrapolation from stereological sections, which provide limited 3D spatial and quantitative information. This data descriptor presents multiple ex vivo tomographic optical image data sets of the full β-cell mass distribution in mice given a single high dose or multiple low doses of STZ, and in mice whose glycaemia had recovered. The data further include information about structural features, such as individual islet β-cell volumes, spatial coordinates, and shape, as well as signal intensities for both insulin and GLUT2. Together, they provide the most comprehensive anatomical record of the effects of STZ administration on the islets of Langerhans in mice. As such, this data descriptor may serve as reference material to facilitate the planning, use, and (re)interpretation of this widely used disease model.
Methods
Animals, STZ administration and organ isolation
The data presented in this data descriptor were acquired for Hahn et al., 2020, and all experiments were performed following the European Union guidelines for the care and use of animals in research. All procedures were approved by the Animal Review Board at the Court of Appeal of Northern Norrland and of Northern Stockholm. Streptozotocin (STZ, Sigma-Aldrich), freshly dissolved in 0.1 M sodium citrate buffer (pH 4.5), was administered to 8-week-old male C57BL/6J mice by intraperitoneal (i.p.) injection, either as a single high dose (SHD, 150 mg/kg) or as multiple low doses (MLD, 50 mg/kg daily for 5 consecutive days). To confirm data reliability, untreated control mice were compared with mice receiving an i.p. injection of the vehicle solution only (0.1 M sodium citrate, pH 4.5); no difference in β-cell mass (BCM), islet number, or blood glucose levels could be detected (see Hahn et al., 2020, suppl. Fig 2). Blood glucose was measured regularly from tail vein blood with OneTouch (LifeScan, USA) or Accu-Chek (Roche, Switzerland) glucometers until the animals were killed (see Tables 1 and 2). Animals were killed by cervical dislocation, and pancreata from the diabetic SHD and MLD groups and from healthy control groups were isolated 1, 2, and 3 weeks after STZ administration (n=3-5, n=3-5, and n=5, respectively). Harvested pancreata were fixed in 4% paraformaldehyde (PFA, Sigma-Aldrich) for 2 h, washed in 1x PBS, and divided into the splenic, gastric, and duodenal lobular compartments (see also Fig. 2) before processing for whole-mount immunohistochemistry and 3D imaging.
To delineate the long-term effects of hyperglycemia on GLUT2 expression, β-cell function, and islet size distribution, data were generated from islet transplantation experiments (see Fig. 2, Dataset 2) performed as previously described. In short, pancreatic islets of Langerhans were obtained by collagenase treatment from healthy (normoglycemic) mice with the same genetic background. SHD-treated animals with the highest blood glucose levels 4 days after STZ administration were then transplanted with 100-150 islets per animal into the anterior chamber of the eye under isoflurane anaesthesia to reverse hyperglycemia. Once the islet-transplanted cohort (SHD+Tx, n=4) had reached glucose levels comparable to those of the normoglycemic controls (n=7), organs of all cohorts, including the SHD-treated cohort (n=8), were harvested (28 days after STZ administration, see above).
Pancreas processing, whole mount immunohistochemistry and tissue clearing.
Tissue processing, staining, and preparation for OPT/LSFM imaging were performed as described. In brief, isolated and fixed pancreata were separated into the main lobes (splenic, gastric, and duodenal lobes: SL, GL, and DL; see Fig. 2), permeabilized by freeze/thaw cycles, bleached to reduce autofluorescence, stained with primary and secondary antibodies, mounted in a cylinder of low-melting-point agarose, dehydrated with methanol, and made transparent with a 1:2 mixture of benzyl alcohol and benzyl benzoate (BABB), which matches the refractive index of proteins, lipids, and other cellular components. All specimens were blinded and randomized after organ harvest for all downstream processes. The primary antibody was guinea pig anti-insulin (DAKO A0594, dilution 1:500), and the secondary antibody was goat Alexa 594 anti-guinea pig (Molecular Probes, A11076, dilution 1:500). For co-expression assessments of insulin and GLUT2 (see Fig. 1 and Fig. 2, Dataset 2), pancreata were additionally labelled with primary rabbit anti-GLUT2 (Millipore, 07-1402-l, dilution 1:500) and secondary IRDye 680RD goat anti-rabbit (Licor, 926-68071, dilution 1:500).
3D imaging: Optical projection tomography (OPT) and light sheet fluorescence microscopy (LSFM)
OPT scanning of pancreatic specimens (Dataset 1) was performed as described using a Bioptonics 3001 OPT scanner (SkyScan, Belgium), with varying exposure times (see folder “Metadata for all groups” for exposure times) for insulin staining (filter set “insulin”: Ex: 560/20 nm, Em: 610 nm LP) and autofluorescence (filter set “anatomy”: Ex: 425/20 nm, Em: 475 nm LP). The image data were generated using SkyScanner 3001 (v1.3.13, SkyScan). Samples from co-expression experiments (Dataset 2) were scanned in our custom-built near-infrared OPT (NIR-OPT) setup, using LabVIEW (v20.0f1) to retrieve image data.
For comparison of intensities in 3D, all images in Dataset 2 were acquired with the same exposure times for every sample: insulin staining (filter set: Ex: HQ 565/30 nm, Em: HQ 620/60 nm, exposure time 4000 ms), GLUT2 staining (filter set: Ex: HQ 665/45 nm, Em: HQ 725/50 nm, exposure time 8000 ms) and endogenous fluorescent anatomy (filter set: Ex: 425/60 nm, Em: LP 480 nm, exposure time 500 ms).
Additional high-resolution scans (Dataset 3) of volumes of interest from representative pancreata that had been OPT scanned (see above) were reimaged in a 2nd generation LaVision BioTec UltraMicroscope (LaVision BioTec GmbH, Germany) with a 1x Olympus objective (Olympus PLAPO 2XC) coupled to an Olympus MVX10 zoom body, providing between 0.36x and 6.3x magnification, with a lens-corrected dipping cap MVPLAPO 2x DC DBE objective. Samples mounted in low melting point SeaPlaque Agarose (39346-81-1, Lonza) were trimmed in BABB to fit the LSFM sample holder. Scans were acquired at 6.3x magnification, which gave a pixel size of 0.48 µm in x and y. Depending on the scan location, the exposure time was 120–300 ms and the light-sheet width was 10–20%, with a sheet thickness of 3.78 µm (NA 0.14) and a z-step size of 5 µm. Image data were acquired using ImSpectorPro (version 5.0.164, LaVision BioTec GmbH, Germany). Representative islets of Langerhans of different sizes and locations within the gland were chosen based on the 3D-rendered OPT datasets.
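For scale, the stated LSFM sampling (0.48 µm pixels in x and y, 5 µm z-steps) implies strongly anisotropic voxels. The arithmetic below converts between voxel counts and physical volume, assuming each voxel spans exactly one pixel laterally and one z-step axially; it applies to the Dataset 3 scans only, since the OPT datasets are sampled differently.

```latex
\begin{aligned}
V_{\text{voxel}} &= \Delta x\,\Delta y\,\Delta z = (0.48\,\mu\text{m})(0.48\,\mu\text{m})(5\,\mu\text{m}) = 1.152\,\mu\text{m}^3,\\
\frac{\Delta z}{\Delta x} &= \frac{5}{0.48} \approx 10.4,\\
N_{\text{voxels}}\!\left(1 \times 10^{6}\,\mu\text{m}^3\right) &= \frac{10^{6}\,\mu\text{m}^3}{1.152\,\mu\text{m}^3} \approx 8.68 \times 10^{5}.
\end{aligned}
```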
Image processing, reconstruction, and 3D volume rendering.
Insulin-based projection views retrieved from the Bioptonics 3001 scanner (Dataset 1) and projection views used for volumetric assessment of β-cell volumes from the custom-built NIR-OPT scanner (Dataset 2) were first processed with a contrast limited adaptive histogram equalization (CLAHE) algorithm with a tile size of 64 × 64 to increase the signal-to-noise ratio for downstream islet segmentation. Second, a discrete Fourier transform alignment (DFTA) was performed to align opposing projection images to the same axis of rotation of the sample. For combined assessments of insulin and GLUT2 expression (Dataset 2) and for analysis of the effect of STZ on GLUT2 staining intensity in β-cells, however, the CLAHE normalisation routine was not applied. Reconstruction of OPT projection views into tomographic sections was performed using a filtered back projection algorithm in the NRecon software (v1.6.9.18, Bruker microCT, Belgium) with ring artefact correction set to 4. The resulting tomographic sections (*.bmp and *.tif, Datasets 1 & 2, Record B) and raw z-sectional images (*.ome.tif, Dataset 3, Record B) generated with the Ultramicroscope II (LSFM) were converted channel by channel with Imaris Converter (Bitplane, UK), and the resulting *.ims files of all channels for each sample were merged into one Imaris file. Individual insulin-positive islet volumes and lobular anatomies were quantified using an automated surfacing algorithm in the Imaris software (version 9.3.1, Bitplane, UK). Surface segmentation was performed using the ‘background subtraction’ function in Imaris, with thresholds that varied between samples. Individual threshold values for islet volume segmentation are listed in Supplementary Table 1 and Supplementary Table 2 for Datasets 1 and 2, respectively.
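The idea behind aligning opposing projections can be illustrated with phase correlation, a standard DFT-based registration technique. This is a minimal sketch, not the DFTA implementation used for these datasets; it assumes that the projection acquired at 180° is the mirror image of the 0° projection about the rotation axis and that the misalignment is a pure integer-pixel horizontal shift. The function names `phase_correlation_shift` and `rotation_axis_offset` are hypothetical.

```python
import numpy as np


def phase_correlation_shift(a: np.ndarray, b: np.ndarray) -> tuple[int, int]:
    """Integer (dy, dx) such that np.roll(b, (dy, dx), axis=(0, 1)) best matches a."""
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross_power = fa * np.conj(fb)
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase only
    corr = np.fft.ifft2(cross_power).real       # peak sits at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)


def rotation_axis_offset(p0: np.ndarray, p180: np.ndarray) -> float:
    """Horizontal offset (pixels) of the rotation axis from the detector centre.

    The projection taken 180 degrees later is the mirror image of p0 about the
    rotation axis, so flipping it left-right and registering it to p0 gives a
    shift equal to twice the axis offset.
    """
    flipped = p180[:, ::-1]
    _, dx = phase_correlation_shift(p0, flipped)
    return dx / 2.0
```

Normalizing the cross-power spectrum to unit magnitude makes the correlation peak sharp. Real data usually also need sub-pixel refinement; scikit-image's `skimage.registration.phase_cross_correlation` offers this through its `upsample_factor` argument.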
Surfaced islet volumes were arbitrarily categorized into small (<1 × 10⁶ µm³), medium (1–5 × 10⁶ µm³) and large (>5 × 10⁶ µm³) islets of Langerhans as previously described. Objects of 10 voxels or fewer were filtered out of the quantification datasets (Datasets 1 and 2) to avoid including artefacts in the data analysis.
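The size categorization and artefact filter can be expressed as a small function. This is a sketch under stated assumptions: each segmented object is supplied as a hypothetical (voxel count, volume in µm³) pair, and volumes of exactly 1 × 10⁶ or 5 × 10⁶ µm³, which the stated ranges leave ambiguous, are assigned to the medium class.

```python
# Size-class boundaries from the text, in cubic micrometres.
SMALL_MAX_UM3 = 1e6   # small: < 1 x 10^6 um^3
LARGE_MIN_UM3 = 5e6   # large: > 5 x 10^6 um^3
MAX_ARTEFACT_VOXELS = 10  # objects of <= 10 voxels are discarded


def size_class(volume_um3: float) -> str:
    """Assign an islet volume to small/medium/large (boundaries go to medium)."""
    if volume_um3 < SMALL_MAX_UM3:
        return "small"
    if volume_um3 <= LARGE_MIN_UM3:
        return "medium"
    return "large"


def count_islets_by_size(objects):
    """objects: iterable of (voxel_count, volume_um3) pairs (hypothetical layout).
    Returns the number of islets per size class after removing artefacts."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for voxel_count, volume_um3 in objects:
        if voxel_count <= MAX_ARTEFACT_VOXELS:
            continue  # too small to be a real islet; treated as an artefact
        counts[size_class(volume_um3)] += 1
    return counts
```

For example, `count_islets_by_size([(5, 2e3), (50, 5e5), (400, 1e6)])` returns `{'small': 1, 'medium': 1, 'large': 0}`: the first object is dropped as an artefact.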
Terms of service: https://cubig.ai/store/terms-of-service
1) Data introduction
• The Easiest-diabetes dataset consists of more than 100 patient records and is intended for diabetes classification.
2) Data utilization
(1) Characteristics of the Easiest-diabetes data:
• Patients are classified as diabetic or not based on 10 variables, such as age, gender, and blood pressure.
(2) The Easiest-diabetes data can be used for:
• Predictive modeling: a simple example for developing and testing predictive models for diabetes classification, helping you understand the feature selection and model tuning process.
• Healthcare analytics: basic healthcare analyses to identify key factors associated with diabetes and to provide insight into risk factors and potential preventive measures.
The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified dataset contains discharge-level detail on patient characteristics, diagnoses, treatments, services, and charges. This data contains basic record-level detail regarding the discharge; however, the data does not contain protected health information (PHI) under the Health Insurance Portability and Accountability Act (HIPAA). The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, for direct identifiers that are dates, the day and month portion of the date has been removed. A downloadable file with this data is available at: https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/3m9u-ws8e. For more information, check out http://www.health.ny.gov/statistics/sparcs/ or go to the “About” tab.
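The date rule above (day and month removed, year kept) can be illustrated on a single record. This is a minimal sketch only: the dictionary record layout and the field names `admission_date` and `age_group` are hypothetical and are not the SPARCS schema.

```python
from datetime import date


def strip_day_month(record: dict, date_fields: set) -> dict:
    """Return a copy of `record` in which each listed date field keeps only its year."""
    cleaned = dict(record)  # leave the input record unchanged
    for field in date_fields:
        value = cleaned.get(field)
        if isinstance(value, date):  # also matches datetime.datetime
            cleaned[field] = value.year
    return cleaned


record = {"admission_date": date(2015, 7, 14), "age_group": "50 to 69"}
print(strip_day_month(record, {"admission_date"}))
# {'admission_date': 2015, 'age_group': '50 to 69'}
```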
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of diabetes mellitus in persons (aged 17+). Please read the information below to gain a full understanding of what the data shows and how it should be interpreted.
ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to diabetes mellitus in persons (aged 17+).
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
The percentage of each MSOA’s population (aged 17+) with diabetes mellitus was estimated by calculating a weighted average based on:
• the percentage of the MSOA area that was covered by each GP practice’s catchment area;
• of the GPs that covered part of that MSOA, the percentage of registered patients that have diabetes mellitus.
The estimated percentage of each MSOA’s population with diabetes mellitus was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs to estimate the number of people in each MSOA with diabetes mellitus, within the relevant age range.
Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have diabetes mellitus;
B) the NUMBER of people within that MSOA who are estimated to have diabetes mellitus.
An average of scores A and B was taken and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer the score is to 1, the greater both the number and percentage of the population in the MSOA that are estimated to have diabetes mellitus, compared to other MSOAs.
In other words, these are areas where a large number of people are estimated to have diabetes mellitus and where those people make up a large percentage of the population, indicating a real issue with diabetes mellitus within the population, so that investing resources to address that issue could have the greatest benefits.
LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID-19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.). Such discrepancies suggest erroneous data in one of those years (it was not feasible for this study to investigate this further), so data for these areas should be interpreted with caution. Note also that some rural areas (with little or no population) do not officially fall into any GP catchment area (although this will not affect the results of this analysis if no people live in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to determine from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness.
By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate the ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of diabetes mellitus, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of diabetes mellitus.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
• Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
• Levels of obesity, inactivity and associated illnesses (England): Missing data
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
DATA SOURCES
This dataset was produced using:
• Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
• GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
Data was cleaned by Ribble Rivers Trust before use.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
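The MSOA scoring described under ANALYSIS METHODOLOGY, and the year-on-year discrepancy rule mentioned under LIMITATIONS, can be sketched as follows. This is an illustration under assumptions, not Ribble Rivers Trust’s code: the weighted average is assumed to divide by the total catchment coverage (because catchments overlap, coverage percentages can sum to more than 100), the “relative score between 1 and 0” is assumed to be min–max rescaling, the discrepancy rule is read as “difference outside the mean ± 1 sample standard deviation of all GP differences”, and all function names and data layouts are hypothetical.

```python
import statistics


def weighted_prevalence(practices):
    """Area-weighted prevalence (%) for one MSOA.

    practices: list of (area_coverage_pct, prevalence_pct) pairs, one per GP
    practice whose catchment overlaps the MSOA. The weighted sum is divided by
    the total coverage, since overlapping catchments can exceed 100% (assumption).
    """
    total = sum(cov for cov, _ in practices)
    if total == 0:
        raise ValueError("MSOA is not covered by any GP catchment")
    return sum(cov * prev for cov, prev in practices) / total


def min_max(values):
    """Rescale to 0..1: highest value scores 1 (worst), lowest scores 0 (best)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def msoa_scores(msoas):
    """msoas: dict MSOA name -> (practices, population aged 17+).
    Returns dict name -> (prevalence_pct, estimated_cases, relative_score)."""
    names = list(msoas)
    pct = [weighted_prevalence(msoas[n][0]) for n in names]
    cases = [p / 100 * msoas[n][1] for p, n in zip(pct, names)]
    score_a = min_max(pct)    # A: PERCENTAGE with diabetes mellitus
    score_b = min_max(cases)  # B: NUMBER with diabetes mellitus
    combined = min_max([(a + b) / 2 for a, b in zip(score_a, score_b)])
    return {n: (p, c, s) for n, p, c, s in zip(names, pct, cases, combined)}


def flag_discrepancies(values_1819, values_1920):
    """Return GP codes whose 2018/19 -> 2019/20 change lies outside
    mean +/- 1 SD of all changes (only GPs reporting in both years)."""
    common = sorted(values_1819.keys() & values_1920.keys())
    diffs = [values_1920[gp] - values_1819[gp] for gp in common]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (assumption)
    return {gp for gp, d in zip(common, diffs) if abs(d - mean) > sd}
```

For instance, an MSOA covered 60% by a practice with 5% prevalence and 40% by one with 10% prevalence gets an estimated prevalence of (60 × 5 + 40 × 10) / 100 = 7%.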
This Obesity and Diabetes Related Indicators dataset provides a subset of data (40 indicators) for two topics: Obesity and Diabetes. The dataset includes percentages or rates for Cirrhosis/Diabetes and Obesity and Related Indicators, where available, for all counties, regions and the state.
New York State Community Health Indicator Reports (CHIRS) were developed in 2012 and are updated annually to provide data for over 300 health indicators, organized by 15 health topics. Data for all counties, regions and the state are presented in table format with links to trend graphs and maps (http://www.health.ny.gov/statistics/chac/indicators/).
The most recent county- and state-level data are provided. Data combined across multiple years provide stable estimates of the burden of, and risk factors for, these two health topics. For more information, check out: http://www.health.ny.gov/statistics/chac/indicators/ or go to the “About” tab.
Health, United States is an annual report on trends in health statistics. Find more information at http://www.cdc.gov/nchs/hus.htm.
Note: This dataset is historical only, and there are no corresponding datasets for more recent time periods. For more recent information, please visit the Chicago Health Atlas at https://chicagohealthatlas.org. This dataset contains the annual number of hospital discharges, crude hospitalization rates with corresponding 95% confidence intervals, and age-adjusted hospitalization rates with corresponding 95% confidence intervals, for the years 2000–2011, by Chicago U.S. Postal Service ZIP code or ZIP code aggregate. See the full description at http://bit.ly/Os5wnn.
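The computation method for these rates and intervals is not given in this description. A common approach, assumed here rather than taken from the source, is a crude rate with a normal approximation to the Poisson distribution for its 95% CI, and direct standardization to a standard population for the age-adjusted rate. The function names, the per-10,000 default, and the input layout are hypothetical; for small counts, exact or gamma-based intervals are usually preferred over the normal approximation.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def crude_rate(count, population, per=10_000):
    """Crude rate with a normal-approximation (Poisson) 95% CI."""
    rate = count / population * per
    half_width = Z95 * math.sqrt(count) / population * per
    return rate, max(rate - half_width, 0.0), rate + half_width


def age_adjusted_rate(counts, populations, std_weights, per=10_000):
    """Direct age standardization: weighted sum of age-specific rates.

    std_weights are standard-population proportions per age group (sum to 1).
    The variance treats each age-specific count as Poisson.
    """
    if not math.isclose(sum(std_weights), 1.0):
        raise ValueError("standard weights must sum to 1")
    groups = list(zip(counts, populations, std_weights))
    rate = sum(w * c / n for c, n, w in groups) * per
    variance = sum(w**2 * c / n**2 for c, n, w in groups)
    half_width = Z95 * math.sqrt(variance) * per
    return rate, max(rate - half_width, 0.0), rate + half_width
```

For example, 100 discharges in a population of 50,000 give a crude rate of 20 per 10,000 with a half-width of 2 × 1.96 ≈ 3.92.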
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
* Age-standardised incidence rates per 100,000 individuals per year, with 95% confidence intervals. † For cells labelled NA, 95% CIs could not be estimated because there was only one data point.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
"Explore detailed statistics on diabetes and obesity prevalence in U.S. states and counties, with a focus on both men and women. This dataset includes numeric data and percentages, shedding light on critical health indicators. The comprehensive insights derived from this dataset serve as a valuable resource for public health professionals, policymakers, and researchers to inform evidence-based interventions and strategies for addressing health disparities across regions."