Population-based county-level estimates for prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) of ≥6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16). The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).
Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality exposure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to availability of DC data and covariate data, not all counties were captured; however, the majority (3134 of 3142) were used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
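As a rough illustration of the quintile comparison described above, the following sketch assigns counties to exposure quintiles by rank and takes a crude difference in mean prevalence between the highest (worst) and lowest (best) quintile. It is only a sketch under stated assumptions: the function names and sample values are hypothetical, higher index values are assumed to mean worse environmental quality, and the covariate adjustment and 95% confidence intervals reported in the publication are not reproduced.

```python
from statistics import mean

def quintile_labels(values):
    """Assign each value a quintile label 1..5 by rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels

def crude_prd(eqi_scores, prevalence):
    """Crude difference: mean prevalence in quintile 5 minus quintile 1.

    Assumes at least five counties, so that every quintile is non-empty.
    """
    labels = quintile_labels(eqi_scores)
    worst = mean(p for p, q in zip(prevalence, labels) if q == 5)
    best = mean(p for p, q in zip(prevalence, labels) if q == 1)
    return worst - best

# Hypothetical county values, not taken from the dataset.
eqi = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
prev = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4, 1.3, 1.5, 1.6, 1.8]
prd = crude_prd(eqi, prev)
```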
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 267456 series, with data for years 2000 - 2000 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (not all combinations are available): Geography (199 items: Canada; Newfoundland and Labrador; Health and Community Services St. John's Region; Newfoundland and Labrador (Peer group H); Health and Community Services Eastern Region; Newfoundland and Labrador (Peer group D) ...), Age group (14 items: Total; 12 years and over; 12-19 years; 12-14 years; 15-19 years ...), Sex (3 items: Both sexes; Males; Females ...), Diabetes (4 items: Total population for the variable diabetes; Without diabetes; Diabetes, not stated; With diabetes ...), Characteristics (8 items: Number of persons; High 95% confidence interval - number of persons; Coefficient of variation for number of persons; Low 95% confidence interval - number of persons ...).
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices are being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that provides not only continuous glucose levels but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), masculine (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses made for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long descriptions (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. The manufacturer, Abbott Diabetes Care, Inc. (Alameda, CA, USA), has conducted validation studies of these devices, concluding that the measurements made by their sensors are comparable to those of YSI analyzer devices (Xylem Inc.), the gold standard, with results falling within zones A and B of the consensus error grid 99.9% of the time. In addition, studies external to the company concluded that the accuracy of the measurements is adequate.
It was also verified that, in most cases, the blood glucose level measurements per patient in the Glucose_measurements.csv file were continuous (i.e. a sample at least every 15 minutes), as they should be.
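A continuity check of this kind can be sketched as follows. This is not the authors' validation code: the function name and the sample rows are hypothetical, and the only facts taken from the description are the YYYY-MM-DD and HH:MM:SS formats and the 15-minute criterion.

```python
from datetime import datetime, timedelta

def find_gaps(rows, max_gap=timedelta(minutes=15)):
    """Return (previous, current) timestamp pairs more than max_gap apart.

    rows: iterable of (Measurement_date, Measurement_time) strings for ONE
    patient, in the formats described for Glucose_measurements.csv.
    """
    stamps = sorted(
        datetime.strptime(f"{d} {t}", "%Y-%m-%d %H:%M:%S") for d, t in rows
    )
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]

# Hypothetical sample values, not taken from the dataset.
sample = [
    ("2020-01-01", "00:00:00"),
    ("2020-01-01", "00:15:00"),
    ("2020-01-01", "01:00:00"),
]
gaps = find_gaps(sample)  # only the 00:15 -> 01:00 interval exceeds 15 minutes
```

An empty result means that every consecutive pair of samples for that patient is at most 15 minutes apart.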
Usage Notes
For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.
The files that compose the dataset are comma-delimited CSV files and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code, graphics and statistics that may help in understanding the dataset is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, removing a list of patients from a dataset file, and keeping only a list of patients in a dataset file.
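Helper functions of the kind mentioned above might look like the following minimal sketch. The function names, the sample rows and the patient identifiers are hypothetical and do not come from the notebook; rows are assumed to be dictionaries such as those produced by Python's csv.DictReader, and age is approximated from Birth_year alone because no full birth date is provided.

```python
def patient_age(birth_year, reference_year):
    """Approximate age in years; only the birth year is available."""
    return reference_year - int(birth_year)

def drop_patients(rows, patient_ids):
    """Return the rows whose Patient_ID is NOT in patient_ids."""
    excluded = set(patient_ids)
    return [r for r in rows if r["Patient_ID"] not in excluded]

def keep_patients(rows, patient_ids):
    """Return only the rows whose Patient_ID is in patient_ids."""
    wanted = set(patient_ids)
    return [r for r in rows if r["Patient_ID"] in wanted]

# Hypothetical rows; the identifiers only mimic the LIB19XXXX format.
rows = [
    {"Patient_ID": "LIB190001", "Birth_year": "1980"},
    {"Patient_ID": "LIB190002", "Birth_year": "1995"},
]
kept = keep_patients(rows, ["LIB190002"])
dropped = drop_patients(rows, ["LIB190002"])
age = patient_age(rows[0]["Birth_year"], 2022)
```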
Code Availability
The dataset was generated using custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variable extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information and duplicated rows are removed, and the timestamp variable is split into two new variables, measurement date and measurement time.
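The curation steps described for this notebook can be sketched as below. This is an illustration rather than the notebook's actual code: the layout of the raw input (a tuple of patient identifier, timestamp string and value) and the "YYYY-MM-DD HH:MM:SS" timestamp format are assumptions, and the function name is hypothetical. Only the output variable names follow Glucose_measurements.csv.

```python
from datetime import datetime

def curate_glucose(raw_rows):
    """Drop empty and duplicated rows, then split the timestamp in two."""
    seen = set()
    curated = []
    for patient_id, timestamp, value in raw_rows:
        # Rows without information are skipped.
        if not patient_id or not timestamp or value in ("", None):
            continue
        key = (patient_id, timestamp, value)
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)
        ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")  # assumed format
        curated.append({
            "Patient_ID": patient_id,
            "Measurement_date": ts.date().isoformat(),
            "Measurement_time": ts.time().isoformat(),
            "Measurement": value,
        })
    return curated

# Hypothetical raw rows, not taken from the dataset.
raw = [
    ("LIB190001", "2020-01-01 08:00:00", 110),
    ("LIB190001", "2020-01-01 08:00:00", 110),  # duplicate
    ("LIB190001", "", 120),                     # missing timestamp
]
clean = curate_glucose(raw)
```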
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file with data from the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file with data on the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook implements the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.
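For the glucose file, the extraction of the Patient_info.csv variables could look roughly like this. It is a sketch, not the notebook's code: the function name and sample rows are hypothetical, rows are assumed to be dictionaries keyed by the documented column names, and the ISO YYYY-MM-DD format is relied on so that string comparison orders dates correctly.

```python
from collections import defaultdict

def glucose_features(rows):
    """Per-patient summary variables derived from glucose measurement rows."""
    dates_by_patient = defaultdict(list)
    for row in rows:
        dates_by_patient[row["Patient_ID"]].append(row["Measurement_date"])
    features = {}
    for patient_id, dates in dates_by_patient.items():
        features[patient_id] = {
            "Initial_measurement_date": min(dates),
            "Final_measurement_date": max(dates),
            "Number_of_days_with_measures": len(set(dates)),
            "Number_of_measurements": len(dates),
        }
    return features

# Hypothetical rows, not taken from the dataset.
rows = [
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-02"},
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-01"},
    {"Patient_ID": "LIB190001", "Measurement_date": "2020-01-02"},
]
info = glucose_features(rows)["LIB190001"]
```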
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
This data set provides de-identified population data for diabetes and hypertension comorbidity prevalence in Allegheny County. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and 2016 calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016). Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as the proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain.
Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A.. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
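The definitions of diagnosed, undiagnosed and total diabetes given above can be expressed as a small piece of individual-level logic. This sketch only illustrates the definitions; the function and constant names are hypothetical, and the county-level estimates themselves come from the two-stage modelling described above, which is not reproduced here.

```python
FPG_THRESHOLD_MG_DL = 126.0  # high fasting plasma glucose: >= 126 mg/dL
HBA1C_THRESHOLD_PCT = 6.5    # high HbA1C: >= 6.5%

def classify(reported_diagnosis, fpg_mg_dl, hba1c_pct):
    """Apply the diagnosed / undiagnosed / total definitions to one adult."""
    high = fpg_mg_dl >= FPG_THRESHOLD_MG_DL or hba1c_pct >= HBA1C_THRESHOLD_PCT
    return {
        "diagnosed": reported_diagnosis,
        "undiagnosed": high and not reported_diagnosis,
        "total": reported_diagnosis or high,
    }

# Hypothetical respondents, not taken from any survey.
a = classify(reported_diagnosis=False, fpg_mg_dl=130.0, hba1c_pct=6.0)
b = classify(reported_diagnosis=True, fpg_mg_dl=100.0, hba1c_pct=5.5)
c = classify(reported_diagnosis=False, fpg_mg_dl=100.0, hba1c_pct=5.5)
```

Under these definitions every person counted as diagnosed or undiagnosed is also counted in the total, and nobody is counted as both.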
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Germany DE: Diabetes Prevalence: % of Population Aged 20-79 data was reported at 6.900 % in 2021. This records an increase from the previous number of 5.300 % for 2011. Germany DE: Diabetes Prevalence: % of Population Aged 20-79 data is updated yearly, averaging 6.100 % from Dec 2011 (Median) to 2021, with 2 observations. The data reached an all-time high of 6.900 % in 2021 and a record low of 5.300 % in 2011. Germany DE: Diabetes Prevalence: % of Population Aged 20-79 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database's Germany – Table DE.World Bank.WDI: Social: Health Statistics. Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes. It is calculated by adjusting to a standard population age-structure. Source: International Diabetes Federation, Diabetes Atlas. Aggregation method: Weighted average.
This chart shows the rate of hospitalizations for short-term complications of diabetes for the most recent data year by age range and county. It also shows the 2017 objective by age range. This chart is based on one of three datasets related to the Prevention Agenda Tracking Indicators county level data posted on this site. Each dataset consists of county level data for 68 health tracking indicators and sub-indicators for the Prevention Agenda 2013-2017: New York State's Health Improvement Plan. A health tracking indicator is a metric through which progress on a certain area of health improvement can be assessed. The indicators are organized by the Priority Area of the Prevention Agenda as well as the Focus Area under each Priority Area. Each dataset includes tracking indicators for the five Priority Areas of the Prevention Agenda 2013-2017. The most recent year dataset includes the most recent county level data for all indicators. The trend dataset includes the most recent county level data and historical data, where available. Each dataset also includes the Prevention Agenda 2017 state targets for the indicators. Sub-indicators are included in these datasets to measure health disparities among socioeconomic groups. For more information, check out: http://www.health.ny.gov/prevention/prevention_agenda/2013-2017/ and https://www.health.ny.gov/PreventionAgendaDashboard. The "About" tab contains additional details concerning this dataset.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Diabetes Clinical Dataset (100k rows) is a detailed dataset that contains health and demographic data for 100,000 people. It contains information on gender, age, location, race, high blood pressure, heart disease, smoking history, body mass index (BMI), glycated hemoglobin (HbA1c), blood sugar, and diabetes.
2) Data Utilization
(1) Characteristics of the Diabetes Clinical Dataset (100k rows):
• This dataset consists of 100,000 items, each of which represents an individual's health and demographic data related to diabetes research.
(2) The Diabetes Clinical Dataset (100k rows) can be used for:
• Predictive modeling: build a model to predict the likelihood of diabetes based on demographics and health-related features.
• Health analysis: analyze the correlation between diabetes and various health indicators (e.g., BMI, HbA1c levels).
• Demographic study: investigate the distribution of diabetes in various demographic groups and regions.
• Public health study: identify diabetes risk factors and target interventions at high-risk groups.
This Obesity and Diabetes Related Indicators dataset provides a subset of data (40 indicators) for the two topics: Obesity and Diabetes. The dataset includes percentage or rate for Cirrhosis/Diabetes and Obesity and Related Indicators, where available, for all counties, regions and state.
New York State Community Health Indicator Reports (CHIRS) were developed in 2012 and are updated annually to provide data for over 300 health indicators, organized by 15 health topics. Data for all counties, regions and the state are presented in table format with links to trend graphs and maps (http://www.health.ny.gov/statistics/chac/indicators/).
Most recent county and state level data are provided. Multiple year combined data offers stable estimates for the burden and risk factors for these two health topics. For more information, check out: http://www.health.ny.gov/statistics/chac/indicators/ or go to the "About" tab.
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of diabetes mellitus in persons (aged 17+). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.
ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to diabetes mellitus in persons (aged 17+).
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
The percentage of each MSOA's population (aged 17+) with diabetes mellitus was estimated. This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice's catchment area
- Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness
The estimated percentage of each MSOA's population with diabetes mellitus was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with diabetes mellitus, within the relevant age range.
Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have diabetes mellitus
B) the NUMBER of people within that MSOA who are estimated to have diabetes mellitus
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have diabetes mellitus, compared to other MSOAs. In other words, those are areas where it's estimated a large number of people suffer from diabetes mellitus, and where those people make up a large percentage of the population, indicating there is a real issue with diabetes mellitus within the population and the investment of resources to address that issue could have the greatest benefits.
LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the "Health and wellbeing statistics (GP-level, England): Missing data and potential outliers" dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the "Levels of obesity, inactivity and associated illnesses: Summary (England)" dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice's catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of diabetes mellitus, rather than interpreting the boundaries between areas as "hard" boundaries that mark definite divisions between areas with differing levels of diabetes mellitus.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
- Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
- Levels of obesity, inactivity and associated illnesses (England): Missing data
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the "Levels of obesity, inactivity and associated illnesses: Summary (England)" dataset.
DATA SOURCES
This dataset was produced using:
- Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
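The weighted-average and relative-score steps of the analysis methodology above can be sketched as follows. This is an illustration under stated assumptions, not the Trust's code: the function names and sample figures are hypothetical, coverage percentages are normalised by their sum (catchments overlap, so they need not add up to 100), and min-max rescaling is assumed for the "relative score between 1 and 0", which the description does not spell out.

```python
def weighted_prevalence(practices):
    """Weighted average prevalence (%) for one MSOA.

    practices: list of (coverage_pct, prevalence_pct) pairs, one per GP
    practice whose catchment covers part of the MSOA.
    """
    total_weight = sum(coverage for coverage, _ in practices)
    return sum(coverage * pct for coverage, pct in practices) / total_weight

def relative_scores(values):
    """Rescale values to the range 0 (best) to 1 (worst); assumed min-max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical MSOAs: (practices, population aged 17+).
msoas = [
    ([(60, 5.0), (40, 10.0)], 8000),
    ([(100, 4.0)], 5000),
    ([(50, 8.0), (50, 12.0)], 10000),
]
percentages = [weighted_prevalence(p) for p, _ in msoas]
numbers = [pct / 100 * pop for pct, (_, pop) in zip(percentages, msoas)]
score_a = relative_scores(percentages)  # A) percentage of the population
score_b = relative_scores(numbers)      # B) number of people
combined = relative_scores([(a + b) / 2 for a, b in zip(score_a, score_b)])
```

In this example the third MSOA has both the highest percentage and the highest estimated number of people, so it receives the combined score of 1.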
https://creativecommons.org/publicdomain/zero/1.0/
Description: This dataset is designed for predicting the risk of diabetes and contains 25,000 simulated observations, each with important medical indicators. It is ideal for training machine learning models, statistical exploration, or educational purposes in health analytics and prediction.
Included Features:
- age: Patient's age.
- gender: Patient's gender (Male/Female).
- bmi: Body Mass Index.
- bmi_category: BMI classification (Underweight, Normal, Overweight, Obesity).
- systolic & diastolic: Systolic and diastolic blood pressure.
- blood_pressure_category: Blood pressure classification (Normal, Hypertension, etc.).
- cholesterol: Cholesterol level.
- cholesterol_category: Cholesterol classification (Normal, High, etc.).
- glucose: Blood glucose level.
- diabetes_probability: Probability of developing diabetes (calculated based on multiple factors).
Potential Uses:
- Exploring relationships between factors like BMI, blood pressure, and diabetes.
- Building and testing logistic regression or classification models for diabetes prediction.
- Educational purposes in data analysis and predictive modeling.
Dataset Size:
- 10.000 rows
- 11 columns
Source: This dataset was artificially generated using statistical methods to reflect realistic values for medical indicators. It does not represent real patient data!
This subset of the community health indicator report data will not be updated. A dataset containing all of the community health indicators is now available. To view the latest community health obesity and diabetes related indicators, see the featured content section. This Obesity and Diabetes Related Indicators dataset provides a subset of data (40 indicators) for the two topics: Obesity and Diabetes. The dataset includes percentage or rate for Cirrhosis/Diabetes and Obesity and Related Indicators, where available, for all counties, regions and state.
New York State Community Health Indicator Reports (CHIRS) were developed in 2012 and are updated annually to provide data for over 300 health indicators, organized by 15 health topics. Data for all counties, regions and the state are presented in table format with links to trend graphs and maps.
Most recent county and state level data are provided. Multiple year combined data offers stable estimates for the burden and risk factors for these two health topics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aim: To report the point prevalence, deaths and disability-adjusted-life-years (DALYs) due to type 2 diabetes and its attributable risk factors in 204 countries and territories during the period 1990-2019.
Methods: We used the data of the Global Burden of Disease (GBD) Study 2019 to report number and age-standardised rates per 100 000 population of type 2 diabetes. Estimates were reported with 95% uncertainty intervals (UIs).
Results: In 2019, the global age-standardised point prevalence and death rates for type 2 diabetes were 5282.9 and 18.5 per 100 000, an increase of 49% and 10.8%, respectively, since 1990. Moreover, the global age-standardised DALY rate in 2019 was 801.5 per 100 000, an increase of 27.6% since 1990. In 2019, the global point prevalence of type 2 diabetes was slightly higher in males and increased with age up to the 75-79 age group, decreasing across the remaining age groups. American Samoa [19876.8] had the highest age-standardised point prevalence rates of type 2 diabetes in 2019. Generally, the burden of type 2 diabetes decreased with increasing SDI (Socio-demographic Index). Globally, high body mass index [51.9%], ambient particulate matter pollution [13.6%] and smoking [9.9%] had the three highest proportions of attributable DALYs.
Conclusion: Low and middle-income countries have the highest burden and greater investment in type 2 diabetes prevention is needed. In addition, accurate data on type 2 diabetes needs to be collected by the health systems of all countries to allow better monitoring and evaluation of population-level interventions.
https://creativecommons.org/publicdomain/zero/1.0/
The unprocessed dataset was acquired from the UCI Machine Learning Repository and originates from the National Institute of Diabetes and Digestive and Kidney Diseases; this version was preprocessed by me. The objective of the dataset is to accurately predict whether or not a patient has diabetes, based on multiple features included in the dataset. I achieved an accuracy of 92.86% with a Random Forest Classifier using this dataset, and I have also developed a web service, Diabetes Prediction System, using that trained model. You can explore the Exploratory Data Analysis notebook to better understand the data.
J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler and R. S. Johannes, "Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus" in Proc. of the Symposium on Computer Applications and Medical Care, pp. 261-265. IEEE Computer Society Press. 1988.
Multiple models were trained on the original dataset, but only the Random Forest Classifier was able to reach an accuracy of 78.57%. With this new preprocessed dataset, an accuracy of 92.86% was achieved. Can you build a machine learning model that accurately predicts whether a patient has diabetes? And can you achieve an accuracy higher than 92.86% without overfitting the model?
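The challenge above hinges on two quantities: held-out accuracy and the gap between training and held-out accuracy, which is a common (though not conclusive) warning sign of overfitting. The sketch below shows how both could be computed in plain Python. The labels and the training accuracy are hypothetical toy values, and the function names are assumptions; nothing here reproduces the author's model or test split.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def overfitting_gap(train_accuracy, test_accuracy):
    """Training accuracy minus held-out accuracy, in percentage points."""
    return train_accuracy - test_accuracy


# Toy held-out labels (1 = diabetes, 0 = no diabetes); hypothetical values.
y_test = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1]  # last prediction is wrong

test_acc = accuracy(y_test, y_pred)
train_acc = 100.0  # hypothetical training accuracy of the same model

print(f"held-out accuracy: {test_acc:.2f}%")
print(f"train/test gap: {overfitting_gap(train_acc, test_acc):.2f} points")
```

A large positive gap suggests the model has memorised the training data; comparing accuracy across several different held-out splits gives a more reliable picture than a single split.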
https://www.insight.hdrhub.org/
There are two data sets of eye scans available. The first is a set of fundus images, of which there are c. 7.0 million. The other is a set of OCT scans, of which there are c. 440,000.
This dataset contains routine clinical ophthalmology data for every patient who has been seen at Queen Elizabeth Hospital and the Birmingham, Solihull and Black Country Diabetic Retinopathy screening program at University Hospitals Birmingham NHS Foundation Trust, with longitudinal follow-up for 15 years. Key data included are:
• Total number of patients
• Demographic information (including age, sex and ethnicity)
• Past ocular history
• Intravitreal injections
• Length of time since eye diagnosis
• Visual acuity
• The national screening diabetic grade category (seven categories from R0M0 to R3M1)
• Reason for sight and severe sight impairment
Geography: University Hospitals Birmingham is set within the West Midlands and has a catchment population of circa 5.9 million. The region includes a diverse ethnic and socio-economic mix, with a higher than UK average of minority ethnic groups. It has a large number of elderly residents but is the youngest population in the UK. There are particularly high rates of diabetes, physical inactivity, obesity, and smoking.
Data source: Ophthalmology department at Queen Elizabeth Hospital, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom. The Birmingham, Solihull and Black Country Data Set, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom. They manage over 200,000 patients, with longitudinal follow-up up to 15 years, making this the largest urban diabetic screening scheme in Europe.
Pathway: The routine secondary care follow-up in the hospital eye services for all ophthalmic diseases at Queen Elizabeth Hospital. The Birmingham, Solihull and Black Country dataset is representative of the patient pathway for community screening and grading of diabetic eye disease.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 6720 series, with data for years 1994 - 1998 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (not all combinations are available): Geography (5 items: Territories; Northwest Territories; Northwest Territories including Nunavut; Yukon ...), Age group (14 items: Total, 12 years and over; 12-14 years; 12-19 years; 15-19 years ...), Sex (3 items: Both sexes; Females; Males ...), Diabetes (4 items: Total population for the variable diabetes; Without diabetes; With diabetes; Diabetes, not stated ...), Characteristics (8 items: Number of persons; Coefficient of variation for number of persons; High 95% confidence interval - number of persons; Low 95% confidence interval - number of persons ...).
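The series count quoted above is consistent with the full cross-product of the five dimensions. A quick check, using only the item counts stated in the description (the dictionary name is an illustrative assumption):

```python
from math import prod

# Number of items per dimension, as stated in the table description.
dimension_sizes = {
    "Geography": 5,
    "Age group": 14,
    "Sex": 3,
    "Diabetes": 4,
    "Characteristics": 8,
}

# One series per combination of dimension items.
total_series = prod(dimension_sizes.values())
print(f"{total_series} series")
```

This gives the upper bound on distinct series; as the description notes, not every combination necessarily carries data for every year.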
https://www.insight.hdrhub.org/
Background: Diabetes mellitus affects over 3.9 million people in the United Kingdom (UK), with over 2.6 million people in England alone. More than 1 million people living with diabetes are acutely admitted to hospital due to complications of their illness every year. Complications include diabetic emergencies such as diabetic coma, hypoglycaemia, diabetic ketoacidosis and diabetic hyperosmolar hyperglycaemic state. Diabetic retinopathy (DR) is a common microvascular complication of type 1 and type 2 diabetes and remains a major cause of vision loss and blindness in those of working age. This dataset includes all acute diabetic admissions to University Hospitals Birmingham NHS Trust from 2000 onwards, with linked eye data including the national screening diabetic grade category (seven categories from R0M0 to R3M1) from the Birmingham, Solihull and Black Country DR screening program (a member of the National Health Service (NHS) Diabetic Eye Screening Programme) and the University Hospitals Birmingham NHS Trust Ophthalmology clinic at Queen Elizabeth Hospital, Birmingham.
Geography: The West Midlands has a population of 5.9 million. The region includes a diverse ethnic, and socio-economic mix, with a higher than UK average of minority ethnic groups. It has a large number of elderly residents but is the youngest population in the UK. There are particularly high rates of diabetes, physical inactivity, obesity, and smoking.
Data sources:
1. The Birmingham, Solihull and Black Country Data Set, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom. They manage over 200,000 diabetic patients, with longitudinal follow-up up to 15 years, making this the largest urban diabetic eye screening scheme in Europe.
2. The Electronic Health Records held at University Hospitals Birmingham NHS Foundation Trust (UHB). UHB is one of the largest NHS Trusts in England, providing direct acute services and specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds and 100 ITU beds. UHB runs a fully electronic healthcare record for both systemic disease and ophthalmology.
Scope: All hospitalised patients admitted to UHB with a diabetes-related health concern from 2000 onwards. Longitudinal and individually linked with their diabetic eye care from primary screening data and secondary care ophthalmology data, including:
• Demographic information (including age, sex and ethnicity)
• Diabetes status
• Diabetes type
• Length of time since diagnosis of diabetes
• Visual acuity
• The national diabetic screening grade category (seven categories from R0M0 to R3M1)
• Diabetic eye clinical features
• Reason for sight and severe sight impairment
• ICD-10 and SNOMED-CT codes pertaining to diabetes
• Diagnosis for the acute/emergency admission
• Co-morbid conditions
• Medications
• Outcome
https://www.etalab.gouv.fr/licence-ouverte-open-licence
This synthetic dataset was created as part of the translation and implementation of the algorithm used by the CNAM to build the "Top Diabetes" (link to the description sheet of the algorithm).
The objective of this algorithm is to target people in care for diabetes in the main SNDS database in order to create the "Top Diabetes" of the pathology mapping created and maintained by the CNAM (version G8).
Patient identification is based on the targeting of specific medicines and/or ALD and/or hospitalisation in MCO. The mapping algorithms aim to maximize specificity (not sensitivity), i.e. to ensure the absence of non-diabetics among the targeted patients. Patients with fewer than 3 dispensings of specific drugs, who do not have ALD and who have not been hospitalized for diabetes within 5 years are not retained.
The algorithm identifies prevalent patients with diabetes in a given year (2019). It does not determine the exact date of onset of diabetes in the database. This programme does not include an analysis of the estimated items of expenditure reimbursed by Health Insurance.
The algorithm used by the CNAM to construct the "Top Diabetes" (CNAM source version, and the Python and SAS versions adapted by the HDH): https://www.health-data-hub.fr/library-open-algorithms-health/algorithm-to-build-the-top-diabete-of-mapping
The Python and SAS versions adapted by the HDH run on synthetic data for the years 2018-2019 but can be extended to other years. The CNAM source program was developed in SAS and runs on data from 2015 to 2019.
The implementation of the algorithm required the mobilization of synthetic (fictitious) tables and variables. This dataset was generated using the schema of the 2019 SNDS main database tables.
The programmes operate on the synthetic data of the HDH with some adaptations:
- the merging of annual tables into a single table for ER_PRS_F, ER_ETE_F, ER_PHA_F,
- the renaming of NUM_ENQ to BEN_NIR_PSA,
- the conversion of the date format to yymmdd10.
The implementation of the algorithm requires the mobilisation of the following tables and variables (the required history is indicated in the corresponding box):
https://gitlab.com/healthdatahub/boas/cnam/top-diabete/-/raw/main/Tables_et_variables_du_SNDS_n%C3%A9cessaires.png?ref_type=heads
The use of synthetic data, although useful for manipulating SNDS data, has limitations:
- the lack of medical consistency,
- the lack of updating of annual changes,
- an evolving table schema that can be incomplete and imperfect.
More information on the use of the database in the context of the Top Diabetes programmes (CNAM) is available on the GitLab repository of the programmes (link of the GitLab repository).
Target audience: Data/SNDS community.
Contact point: dir.donnees-SNDS@health-data-hub.fr, or on GitLab (open a ticket or merge request).
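The retention rule stated in the Top Diabetes description (patients with fewer than 3 dispensings of specific drugs, no ALD, and no hospitalisation for diabetes within 5 years are not retained) can be sketched as a simple predicate. This is only an illustration of that one sentence, not the CNAM or HDH code: the class, field, and function names are hypothetical and are not SNDS variable names, and the sketch assumes the three criteria have already been computed per patient.

```python
from dataclasses import dataclass

# Threshold taken from the description: fewer than 3 dispensings is insufficient.
MIN_DISPENSINGS = 3


@dataclass
class PatientSummary:
    """Per-patient criteria; all field names are hypothetical."""
    specific_drug_dispensings: int       # dispensings of diabetes-specific drugs
    has_ald: bool                        # patient has ALD
    hospitalised_for_diabetes_5y: bool   # hospitalised for diabetes within 5 years


def is_retained(patient: PatientSummary) -> bool:
    """Return False only when all three exclusion conditions hold together."""
    excluded = (
        patient.specific_drug_dispensings < MIN_DISPENSINGS
        and not patient.has_ald
        and not patient.hospitalised_for_diabetes_5y
    )
    return not excluded


# Hypothetical synthetic patients.
patients = [
    PatientSummary(2, False, False),  # meets no criterion: not retained
    PatientSummary(3, False, False),  # enough dispensings: retained
    PatientSummary(0, True, False),   # ALD alone: retained
    PatientSummary(1, False, True),   # hospitalisation alone: retained
]
print([is_retained(p) for p in patients])
```

Requiring at least one positive criterion, rather than any single weak signal such as one dispensing, matches the stated aim of favouring specificity over sensitivity.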
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Type 2 diabetes (T2D) is a multifactorial complex chronic disease with a high prevalence worldwide, and Type 2 diabetes patients with different comorbidities often present multiple phenotypes in the clinic. Thus, there is a pressing need to improve understanding of the complexity of the clinical Type 2 diabetes population to help identify more accurate disease subtypes for personalized treatment.
Methods: Here, utilizing the traditional Chinese medicine (TCM) clinical electronic medical records (EMRs) of 2137 Type 2 diabetes inpatients, we followed a heterogeneous medical record network (HEMnet) framework to construct heterogeneous medical record networks by integrating the clinical features from the electronic medical records, molecular interaction networks and domain knowledge.
Results: Of the 2137 Type 2 diabetes patients, 1347 were male (63.03%), and 790 were female (36.97%). Using the HEMnet method, we obtained eight non-overlapping patient subgroups. For example, in H3, Poria, Astragali Radix, Glycyrrhizae Radix et Rhizoma, Cinnamomi Ramulus, and Liriopes Radix were identified as significant botanical drugs. Cardiovascular diseases (CVDs) were found to be significant comorbidities. Furthermore, enrichment analysis showed that there were six overlapping pathways and eight overlapping Gene Ontology terms among the herbs, comorbidities, and Type 2 diabetes in H3.
Discussion: Our results demonstrate that identification of the Type 2 diabetes subgroup based on the HEMnet method can provide important guidance for the clinical use of herbal prescriptions and that this method can be used for other complex diseases.
High-quality real-world datasets are essential for advancing data-driven approaches in type 1 diabetes (T1D) management, including personalized therapy design, digital twin systems, and glucose prediction models. However, progress in this area has been limited by the scarcity of publicly available datasets that offer detailed and comprehensive patient data. To address this gap, we present AZT1D, a dataset containing data collected from 25 individuals with T1D on automated insulin delivery (AID) systems. AZT1D includes continuous glucose monitoring (CGM) data, insulin pump and insulin administration data, carbohydrate intake, and device mode (regular, sleep, and exercise) obtained over 6 to 8 weeks for each patient. Notably, the dataset provides granular details on bolus insulin delivery (i.e., total dose, bolus type, correction-specific amounts) features that are rarely found in existing datasets. By offering rich, naturalistic data, AZT1D supports a wide range of artificial intelligence and machine learning applications aimed at improving clinical decision making and individualized care in T1D.