39 datasets found

Diabetes
catalog.data.gov
data.wprdc.org
Updated Mar 14, 2023
Cite
Allegheny County (2023). Diabetes [Dataset]. https://catalog.data.gov/dataset/diabetes
Dataset updated
Mar 14, 2023
Dataset provided by
Allegheny County
Description
These datasets provide de-identified insurance data for diabetes. The data are provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represent their insured population for the 2015 and 2016 calendar years. Disclaimer: Users should be cautious about using administrative claims data as a measure of disease prevalence and about interpreting trends over time, as the data were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, and exclusion of individuals who did not seek care in the past two years and of those who are uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Diabetes Dataset
kaggle.com
Updated Feb 22, 2024
Cite
Hosam Mhmd Ali (2024). Diabetes Dataset [Dataset]. https://www.kaggle.com/datasets/hosammhmdali/diabetes-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 22, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hosam Mhmd Ali
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.

Content Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1)
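As a minimal sketch, the nine columns listed above could be mapped onto a typed record in Python. The `PimaRecord` class, the `parse_row` helper, and the sample row are hypothetical illustrations and are not taken from the dataset; the header names are assumed to match the variable names in the description.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PimaRecord:
    pregnancies: int
    glucose: float
    blood_pressure: float      # diastolic, mm Hg
    skin_thickness: float      # triceps skin fold, mm
    insulin: float             # 2-hour serum insulin, mu U/ml
    bmi: float                 # kg / m^2
    diabetes_pedigree_function: float
    age: int                   # years
    outcome: int               # class variable, 0 or 1


# Made-up row for illustration only (assumption, not real data).
SAMPLE_CSV = (
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,30,80,28.5,0.45,33,0\n"
)


def parse_row(row: dict) -> PimaRecord:
    """Convert one CSV row (strings) into a typed record."""
    record = PimaRecord(
        pregnancies=int(row["Pregnancies"]),
        glucose=float(row["Glucose"]),
        blood_pressure=float(row["BloodPressure"]),
        skin_thickness=float(row["SkinThickness"]),
        insulin=float(row["Insulin"]),
        bmi=float(row["BMI"]),
        diabetes_pedigree_function=float(row["DiabetesPedigreeFunction"]),
        age=int(row["Age"]),
        outcome=int(row["Outcome"]),
    )
    if record.outcome not in (0, 1):
        raise ValueError("Outcome must be 0 or 1")
    return record


records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(records[0])
```

Validating `Outcome` at parse time keeps malformed labels from reaching a downstream classifier.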

Inspiration Can you build a model (machine learning or deep learning) that accurately predicts whether the patients in the dataset have diabetes?
Diabetes control is associated with environmental quality in the U.S.
catalog.data.gov
s.cnmilf.com
Updated Jul 21, 2022
Cite
U.S. EPA Office of Research and Development (ORD) (2022). Diabetes control is associated with environmental quality in the U.S. [Dataset]. https://catalog.data.gov/dataset/diabetes-control-is-associated-with-environmental-quality-in-the-u-s
Dataset updated
Jul 21, 2022
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
United States
Description
Population-based county-level estimates for the prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). The DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) ≥6.5%, or a diabetes diagnosis) but did not currently have high fasting plasma glucose or HbA1c during the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16). The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables representing the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce the variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).
Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality exposure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to the availability of DC data and covariate data, not all counties were captured; however, the majority (3134 of 3142) were used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
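The two-level PCA construction described above (one PCA per domain, then a second PCA across the domain indices) can be sketched roughly as follows. This is an illustration under stated assumptions, not the EPA's implementation: the data are synthetic, each domain is given three made-up variables, only the first principal component is kept, the component is found by simple power iteration, and the RUCC stratification is omitted.

```python
import random


def standardize(cols):
    """Z-score each variable (cols is a list of columns)."""
    out = []
    for col in cols:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((x - mean) ** 2 for x in col) / (n - 1)) ** 0.5
        out.append([(x - mean) / sd for x in col])
    return out


def first_pc_scores(cols, iters=500):
    """Scores on the first principal component, via power iteration."""
    z = standardize(cols)
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized variables.
    corr = [
        [sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    w = [1.0] * p  # starting vector for the power iteration
    for _ in range(iters):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    # Project every observation (county) onto the leading component.
    return [sum(w[i] * z[i][k] for i in range(p)) for k in range(n)]


random.seed(0)
DOMAINS = ["air", "water", "built", "land", "sociodemographic"]
N_COUNTIES = 50  # synthetic; the source covers all US counties

# Stage 1: one index per domain from that domain's (synthetic) variables.
domain_index = {}
for name in DOMAINS:
    latent = [random.gauss(0, 1) for _ in range(N_COUNTIES)]
    variables = [[v + random.gauss(0, 0.5) for v in latent] for _ in range(3)]
    domain_index[name] = first_pc_scores(variables)

# Stage 2: overall index from a second PCA over the five domain indices.
overall_eqi = first_pc_scores([domain_index[d] for d in DOMAINS])
print(len(overall_eqi))
```

Because every input is standardized before projection, each index is centered on zero, so only the ordering and spread of counties carry information.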
Adults with Diabetes Per 100 (LGHC Indicator)
data.chhs.ca.gov
data.ca.gov
chart, csv, zip
Updated Dec 10, 2024
Cite
California Department of Public Health (2024). Adults with Diabetes Per 100 (LGHC Indicator) [Dataset]. https://data.chhs.ca.gov/dataset/adults-with-diabetes-per-100-lghc-indicator-23
Explore at:
Available download formats: csv (8574), zip, chart
Dataset updated
Dec 10, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. This table displays the prevalence of diabetes in California. It contains data for California only. The data are from the California Behavioral Risk Factor Surveillance Survey (BRFSS). The California BRFSS is an annual cross-sectional health-related telephone survey that collects data about California residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. The BRFSS is conducted by the Public Health Survey Research Program of California State University, Sacramento, under contract from CDPH. This prevalence rate does not include pre-diabetes or gestational diabetes. It is based on the question: "Has a doctor, or nurse or other health professional ever told you that you have diabetes?" The sample size for 2014 was 8,832. NOTE: Denominator data and weighting were taken from the California Department of Finance, not the U.S. Census. Values may therefore differ from what has been published in the national BRFSS data tables by the Centers for Disease Control and Prevention (CDC) or other federal agencies.
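To illustrate what a weighted "per 100" prevalence means, here is a minimal sketch. The function name, the respondent tuples, and the weights are all hypothetical; the actual CDPH weighting procedure is not described in the source and is not reproduced here.

```python
def prevalence_per_100(responses):
    """Weighted prevalence per 100 adults.

    responses: iterable of (has_diabetes, weight) pairs, where the weight is
    the number of adults a respondent is assumed to represent.
    """
    responses = list(responses)
    total_weight = sum(weight for _, weight in responses)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    case_weight = sum(weight for has_diabetes, weight in responses if has_diabetes)
    return 100.0 * case_weight / total_weight


# Hypothetical respondents: one "yes" with weight 2, two "no" with weight 1.
sample = [(True, 2.0), (False, 1.0), (False, 1.0)]
rate = prevalence_per_100(sample)
print(rate)  # 50.0
```

The same respondents give a different rate under different weights, which is why the note above warns that these values may differ from the national BRFSS tables.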
Diabetes Prediction dataset
kaggle.com
zip
Updated Dec 12, 2024
Cite
Yendoh Derek (2024). Diabetes Prediction dataset [Dataset]. https://www.kaggle.com/datasets/yendohderek19/diabetes-prediction-dataset
Explore at:
Available download formats: zip (9128 bytes)
Dataset updated
Dec 12, 2024
Authors
Yendoh Derek
Description
Dataset

This dataset was created by Yendoh Derek

Released under Other (specified in description)

Diabetes + Hypertension (comorbidity)
catalog.data.gov
data.wprdc.org
Updated Mar 14, 2023
Cite
Diabetes + Hypertension (comorbidity) [Dataset]. https://catalog.data.gov/dataset/diabetes-hypertension-comorbidity
Dataset updated
Mar 14, 2023
Dataset provided by
Allegheny County
Description
This dataset provides de-identified population data for diabetes and hypertension comorbidity prevalence in Allegheny County. The data are provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represent their insured population for the 2015 and 2016 calendar years. Disclaimer: Users should be cautious about using administrative claims data as a measure of disease prevalence and about interpreting trends over time, as the data were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, and exclusion of individuals who did not seek care in the past two years and of those who are uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
diabetes dataset
kaggle.com
Updated Oct 4, 2024
Cite
Omar Belfeki (2024). diabetes dataset [Dataset]. https://www.kaggle.com/datasets/omaebelfeki/diabetes-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 4, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Omar Belfeki
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset

This dataset was created by Omar Belfeki

Released under MIT

‘Diabetes.csv and arff’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Diabetes.csv and arff’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-diabetes-csv-and-arff-7417/bc8b6575/?iid=006-820&v=presentation
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Diabetes.csv and arff’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/amrikkatoch308/diabetescsv-and-arff on 14 February 2022.

--- No further description of dataset provided by original source ---

--- Original source retains full ownership of the source dataset ---
Type 2 diabetes and Psychosocial determinants of achieving clinical targets
data.mendeley.com
search.datacite.org
Updated Oct 21, 2020
Cite
Cameron Hurst (2020). Type 2 diabetes and Psychosocial determinants of achieving clinical targets [Dataset]. http://doi.org/10.17632/xhmjcfp2ym.1
Explore at:
Unique identifier
https://doi.org/10.17632/xhmjcfp2ym.1
Dataset updated
Oct 21, 2020
Authors
Cameron Hurst
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is from a project investigating the role that diabetes self-management, knowledge, and management self-efficacy play in achieving clinical targets among Type 2 diabetes patients in Thailand. Data have been de-identified. The patient data are in the file MontiFinal.csv, and a description of the variables contained therein is provided in DataDictionary.xls.
Flagship Dataset of Type 2 Diabetes from the AI-READI Project
fairhub.io
Updated May 3, 2024
Cite
AI-READI Consortium (2024). Flagship Dataset of Type 2 Diabetes from the AI-READI Project [Dataset]. https://fairhub.io/datasets/1
Dataset updated
May 3, 2024
Dataset provided by
fairhub
Authors
AI-READI Consortium
Dataset funded by
National Institutes of Health
Description
This dataset contains data from 204 participants from the pilot period of the AI-READI project (July 19, 2023 to November 30, 2023). Data from multiple modalities are included. The data in this dataset contain no protected health information (PHI). Information related to the sex and race/ethnicity of the participants, as well as medication used, has also been removed. A detailed description of the dataset is available in the AI-READI documentation for v1.0.0 of the dataset at https://docs.aireadi.org
Data from: T1GDUJA: Glucose dataset of a patient with type 1 diabetes...
zenodo.org
investigacion.ujaen.es
Updated Jul 5, 2024
Cite
Juan Francisco Gaitán Guerrero; José Luis López Ruiz; Carmen Martínez Cruz; Macarena Espinilla Estévez (2024). T1GDUJA: Glucose dataset of a patient with type 1 diabetes mellitus [Dataset]. http://doi.org/10.5281/zenodo.11284018
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.11284018
Dataset updated
Jul 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan Francisco Gaitán Guerrero; José Luis López Ruiz; Carmen Martínez Cruz; Macarena Espinilla Estévez
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 21, 2023
Description
Background

Diabetes is a chronic disease that must be constantly monitored, especially in cases of type 1 and type 2 diabetes mellitus. Technology now offers innovative health solutions through sensors and smart devices. In this field, continuous glucose sensors are a major advance for developing artificial intelligence algorithms that predict glucose values or extract other information relevant to improving patients' health. Unfortunately, few datasets exist in this area. This study therefore aims to provide the scientific community with a dataset from a patient with type 1 diabetes covering the period from 2023/09/10 to 2024/05/13 (226 days with data).

Data Records

The data are recorded in a single file named glucose_data.csv, in Comma Separated Values (CSV) format.

The following characteristics can be found in each row of the dataset:

date: the moment at which the glucose level was measured, formatted as "YYYY-MM-DD HH:MM:SS.ssss" (UTC time zone).

sgv: glucose levels measured in mg/dL.

utcOffset: offset in minutes from the time zone where data were collected (Madrid GMT+1 and GMT+2).

The dataset contains a total of 41702 samples with an average of 185 samples per day. A summary of the samples per day can be found in the attached image.
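A minimal sketch of reading rows in the layout described above. The sample rows are made up, the presence of a header row with these exact column names is an assumption, and `parse_row` is a hypothetical helper; `utcOffset` is assumed to be a positive number of minutes east of UTC (60 or 120 for Madrid).

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Made-up rows for illustration; the real file is glucose_data.csv.
SAMPLE = """date,sgv,utcOffset
2023-09-10 08:00:00.0000,110,120
2023-12-21 08:05:00.0000,145,60
"""


def parse_row(row):
    """Parse one record into UTC time, local time, and glucose in mg/dL."""
    utc = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S.%f")
    utc = utc.replace(tzinfo=timezone.utc)
    # Assumption: utcOffset is minutes to add to UTC to get local time.
    local_zone = timezone(timedelta(minutes=int(row["utcOffset"])))
    return {
        "utc": utc,
        "local": utc.astimezone(local_zone),
        "sgv_mg_dl": int(row["sgv"]),
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    print(r["local"].isoformat(), r["sgv_mg_dl"])

# Cross-check of the stated average: 41702 samples over 226 days with data.
average_per_day = 41702 / 226
print(round(average_per_day))  # 185
```

Keeping both the UTC and the local timestamp makes it possible to group readings by local time of day without losing the original ordering across daylight-saving changes.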
Diabetes Prediction Dataset
kaggle.com
zip
Updated Nov 18, 2024
Cite
Dayam Nadeem (2024). Diabetes Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/dayamnadeem/diabetes-prediction-dataset
Explore at:
Available download formats: zip (751272 bytes)
Dataset updated
Nov 18, 2024
Authors
Dayam Nadeem
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset

This dataset was created by Dayam Nadeem

Released under MIT

Diabetes Dataset - Pima Indians
kaggle.com
Updated Jul 16, 2022
Cite
Ms. Nancy Al Aswad (2022). Diabetes Dataset - Pima Indians [Dataset]. https://www.kaggle.com/nancyalaswad90/review/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 16, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ms. Nancy Al Aswad
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
What is Diabetes Dataset - Pima Indians Dataset?

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. The .csv file contains several independent variables (medical predictor variables) and a single dependent target variable (Outcome).


Acknowledgments

When using this dataset in research, credit the authors as follows:

License: CC0: Public Domain.

Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press. It is published for reuse in the Google research dataset.

The main reason for uploading this dataset is to practice data analysis with my students: I work at a college and want my students to apply what we study to a large dataset. It may not be up to date, and I mention the collection years, but it is a good resource for practice.
Diabetes self-management and social cognitive factors.csv
figshare.com
txt
Updated Aug 31, 2019
Cite
Chimwemwe Kwanjo Banda (2019). Diabetes self-management and social cognitive factors.csv [Dataset]. http://doi.org/10.6084/m9.figshare.9757076.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9757076.v1
Dataset updated
Aug 31, 2019
Dataset provided by
Figshare (http://figshare.com/)
Authors
Chimwemwe Kwanjo Banda
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
These are raw data from a cross-sectional study of 510 people living with diabetes attending the Queen Elizabeth Central Hospital diabetes clinic. Ethical approval for the study was granted by the College of Medicine Research and Ethics Committee (Ref: P.08/17/229). The data were collected between November 2017 and May 2018 using an interviewer-administered questionnaire that solicited data on participants' demographic and clinical characteristics, five social cognitive theory factors (self-efficacy, outcome expectations, knowledge, social support, and barriers to self-management), and self-management (diet, exercise, foot care, medication, self-monitoring of blood glucose, and smoking). The data were entered into a Microsoft Access database and then exported into Stata version 14.0 for cleaning and analysis.
Dataset Of Diabetes
kaggle.com
zip
Updated Dec 25, 2024
Cite
Niels B0hr (2024). Dataset Of Diabetes [Dataset]. https://www.kaggle.com/datasets/nielsb0hr/dataset-of-diabetes
Explore at:
Available download formats: zip (15992 bytes)
Dataset updated
Dec 25, 2024
Authors
Niels B0hr
Description
Dataset

This dataset was created by Niels B0hr

The association between environmental quality and diabetes in the U.S.
s.cnmilf.com
catalog.data.gov
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). The association between environmental quality and diabetes in the U.S. [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/the-association-between-environmental-quality-and-diabetes-in-the-u-s
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
United States
Description
Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016). Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain.
Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
Diabetes Health Indicators
kaggle.com
Updated Mar 7, 2025
Cite
Siamak Tahmasbi (2025). Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/siamaktahmasbi/diabetes-health-indicators
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 7, 2025
Dataset provided by
Kaggle
Authors
Siamak Tahmasbi
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.

Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.

The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.

Content The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey conducted annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the 2023 dataset, available on the CDC website, was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants or calculated variables based on individual participant responses.

I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
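The point about duplicate entries can be illustrated with a small sketch: once respondents are reduced to a handful of coded features, distinct people can produce identical rows. The rows below are made up, and the three-feature layout is a hypothetical stand-in for the 20 selected features.

```python
from collections import Counter

# Made-up coded responses (one tuple per respondent); not real BRFSS data.
rows = [
    (1, 0, 1),
    (0, 0, 1),
    (1, 0, 1),
    (1, 1, 0),
    (1, 0, 1),
]

counts = Counter(rows)

# Rows beyond the first occurrence of each distinct pattern.
duplicate_rows = sum(c - 1 for c in counts.values() if c > 1)

print(len(counts), "distinct patterns,", duplicate_rows, "duplicate rows")
```

Dropping the duplicate rows here would shrink five respondents to three patterns and change the observed frequency of each pattern, which is the concern raised in the description.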

Explore some of the following research questions:
- Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
- What risk factors are most predictive of diabetes risk?
- Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
- Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?

Acknowledgements It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.

Inspiration Zidian Xie et al., for "Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques" using the 2014 BRFSS, and Alex Teboul, for building the Diabetes Health Indicators dataset based on BRFSS 2015, were the inspiration for creating this dataset and exploring the BRFSS in general.
Data from: T1GDUJA: Glucose dataset of a patient with type 1 diabetes...
zenodo.org
Updated Jul 7, 2024
Cite
Juan Francisco Gaitán Guerrero; José Luis López Ruiz; Carmen Martínez Cruz; Macarena Espinilla Estévez (2024). T1GDUJA: Glucose dataset of a patient with type 1 diabetes mellitus [Dataset]. http://doi.org/10.5281/zenodo.10713570
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.10713570
Dataset updated
Jul 7, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan Francisco Gaitán Guerrero; José Luis López Ruiz; Carmen Martínez Cruz; Macarena Espinilla Estévez
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 21, 2023
Description
Background

Diabetes is a chronic disease that must be constantly monitored, especially in cases of type 1 and type 2 diabetes mellitus. Technology now offers innovative health solutions through sensors and smart devices. In this field, continuous glucose sensors are a major advance for developing artificial intelligence algorithms that predict glucose values or extract other information relevant to improving patients' health. Unfortunately, few datasets exist in this area. This study therefore aims to provide the scientific community with a dataset from a patient with type 1 diabetes covering the period from 2023/09/10 to 2024/02/26 (149 days with data).

Data Records

The data are recorded in a single file named glucose_data.csv, in Comma Separated Values (CSV) format.

The following characteristics can be found in each row of the dataset:

date: the moment at which the glucose level was measured, formatted as "YYYY-MM-DD HH:MM:SS.ssss" (UTC time zone).

sgv: glucose levels measured in mg/dL.

utcOffset: offset in minutes between UTC and the time zone where the data were collected (Madrid, GMT+1 and GMT+2).

The dataset contains a total of 29137 samples with an average of 191 samples per day. A summary of the samples per day can be found in the attached image.
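
As a minimal sketch of how rows in the layout described above might be read, the Python below parses the three fields and derives a local timestamp from utcOffset. It relies on assumptions not confirmed by the description: that the file has a header row naming the columns date, sgv, and utcOffset, that the timestamp follows the "YYYY-MM-DD HH:MM:SS.ssss" pattern exactly, and that utcOffset is positive for Madrid (60 or 120 minutes). The two sample rows are invented for illustration and are not values from the dataset.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Two invented rows in the documented layout (not real values from the dataset).
# The header row and column order are assumptions.
SAMPLE = """date,sgv,utcOffset
2023-09-10 08:00:00.0000,110,120
2023-12-21 08:05:00.0000,145,60
"""

def parse_rows(text):
    """Yield (utc_time, local_time, sgv_mg_dl) for each CSV row."""
    for row in csv.DictReader(io.StringIO(text)):
        # The date field is documented as UTC; attach that time zone explicitly.
        utc = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S.%f")
        utc = utc.replace(tzinfo=timezone.utc)
        # Assumed sign convention: positive minutes east of UTC.
        offset = timedelta(minutes=int(row["utcOffset"]))
        local = utc.astimezone(timezone(offset))
        yield utc, local, int(row["sgv"])

rows = list(parse_rows(SAMPLE))
for utc, local, sgv in rows:
    print(utc.isoformat(), local.isoformat(), sgv)
```

To read the real file, the same function could be given the contents of glucose_data.csv instead of SAMPLE, after checking the actual header names.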
diabetes_csv
kaggle.com
zip
Updated Jun 7, 2024
Cite
Rajesh Sahani (2024). diabetes_csv [Dataset]. https://www.kaggle.com/datasets/rajesh83288/diabetes-csv/suggestions
Explore at:
Available download formats: zip (9128 bytes)
Dataset updated
Jun 7, 2024
Authors
Rajesh Sahani
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Rajesh Sahani

Released under Apache 2.0

Contents
Retrospective cohort study of a community-based primary care program's...
data.niaid.nih.gov
search.dataone.org
+1more
zip
Updated Oct 16, 2023
+ more versions
Cite
John Deaver (2023). Retrospective cohort study of a community-based primary care program's effects on pharmacotherapy quality in low-income Peruvians with type 2 diabetes and hypertension [Dataset]. http://doi.org/10.5061/dryad.76hdr7t1n
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.76hdr7t1n
Dataset updated
Oct 16, 2023
Dataset provided by
Asociacion Siempre Salud
Authors
John Deaver
License
https://spdx.org/licenses/CC0-1.0.html
Description
A door-to-door survey was conducted to enumerate all household members by age and sex in a low-income community in Peru. Of 856 adults aged 35 years and older who were eligible for type 2 diabetes and hypertension screening, 709 (83%) participated. Of those screened, 130 (18.3%) were diagnosed with hypertension and/or type 2 diabetes, of whom 109 (84%) participated at program onset; 22 more were added later from earlier non-participants in screening or program onset, forming a cohort of 131 patients with diabetes and/or hypertension. The primary care program had components of the Chronic Care Model, community health workers, and freely accessible visits and medications. It operated between September 2011 and May 2014 and consisted of two care periods separated by a six-month hiatus: first a 10-month home-care period, then a 17-month clinic-care period. The dataset comprises two files corresponding to two exposures: the 27-month program overall (post- versus pre-; N=262 observations, 131 pairs with patients as self-controls) and care period (clinic versus home; N=211, that is, 109 home and 102 clinic observations, which exceeds 131 because 80 patients participated in both care periods). Exposures were evaluated for their effects on guidelines-based pharmacotherapy standards: hypoglycemic and antihypertensive medications, low-dose aspirin, and first-line angiotensin converting enzyme inhibitor (ACEi) treatment of diabetes with elevated blood pressure.

Methods

From 2011 to 2014, data were collected prospectively during weekly home visits or monthly clinic visits on paper encounter forms, which were entered into Microsoft Excel as part of the standard operation of the community-based program. In January 2020, the University of Arizona institutional review board approved the use of the de-identified data for a study of the program's effects on clinical outcomes.
Time-series data (fasting glucose and blood pressure) were collapsed on the median of monthly average fasting glucose and blood pressure values during the program (27 months) and the respective care periods, home (10 months) and clinic (17 months). Antihypertensive and hypoglycemic agents were collapsed on the highest dose ever received, and angiotensin-converting enzyme inhibitors (ACEi) and aspirin on whether any dose was ever received, by treatment-eligible group and within program and care period time intervals. Retention in care was obtained by counting visits and elapsed months (from first to last patient encounter) during the program and care periods. Treatment-eligible groups were: low-dose aspirin candidates (10-year cardiovascular disease (CVD) risk >=10% by the Framingham alternate model, which uses clinical factors only and no laboratory factors); blood pressure (BP) treatment candidates (BP >=130/80 mm Hg if diabetic or >=140/90 mm Hg if non-diabetic); hypoglycemic agent candidates (patients with diabetes); and diabetic ACEi candidates (diabetes with BP >=130/80 mm Hg).

The data have been transformed into two files corresponding to two exposures: 1) program, post- versus pre- (referent), N=262 observations; and 2) care period, clinic versus home (referent), N=211 observations. Both data files are in comma-delimited text format. Pre-post....csv contains the 262 observations for the program exposure. Care period....csv contains the 211 care period observations. Each file has a data dictionary, also in comma-delimited format. "Pre-post data dict....csv" describes the variables in the program exposure study. "Care period data dict....csv" describes the variables in the care period exposure study.
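
One possible reading of the collapsing step for the time-series data is sketched below in Python: readings are first averaged within each month for each patient, and the median of those monthly averages is then taken per patient. This is an illustration of that interpretation only, not the study's actual code; the tuple layout, the patient identifier, and the glucose values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical readings: (patient_id, month_index, fasting_glucose_mg_dl).
readings = [
    ("p1", 1, 100), ("p1", 1, 120),   # month 1 average: 110
    ("p1", 2, 130),                   # month 2 average: 130
    ("p1", 3, 150), ("p1", 3, 170),   # month 3 average: 160
]

def collapse_on_median_of_monthly_means(readings):
    """Return {patient_id: median of that patient's monthly average values}."""
    # Step 1: group raw values by patient and month.
    by_month = defaultdict(list)
    for patient_id, month, value in readings:
        by_month[(patient_id, month)].append(value)
    # Step 2: average within each month, collecting the averages per patient.
    monthly_means = defaultdict(list)
    for (patient_id, _month), values in by_month.items():
        monthly_means[patient_id].append(mean(values))
    # Step 3: take the median of the monthly averages for each patient.
    return {pid: median(avgs) for pid, avgs in monthly_means.items()}

summary = collapse_on_median_of_monthly_means(readings)
print(summary)
```

Restricting the input to the months of one care period (home or clinic) before calling the function would give the corresponding per-period value under the same assumptions.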

Cite

Allegheny County (2023). Diabetes [Dataset]. https://catalog.data.gov/dataset/diabetes

Diabetes

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Mar 14, 2023

Dataset provided by

Allegheny County

Description

These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.

Diabetes

Diabetes Dataset

Diabetes control is associated with environmental quality in the U.S.

Adults with Diabetes Per 100 (LGHC Indicator)

Diabetes Prediction dataset

Dataset

Contents

Diabetes + Hypertension (comorbidity)

diabetes dataset

Dataset

Contents

‘Diabetes.csv and arff’ analyzed by Analyst-2

Type 2 diabetes and Psychosocial determinants of achieving clinical targets

Flagship Dataset of Type 2 Diabetes from the AI-READI Project

Data from: T1GDUJA: Glucose dataset of a patient with type 1 diabetes...

Background

Diabetes Prediction Dataset

Dataset

Contents

Diabetes Dataset - Pima Indians

What is Diabetes Dataset - Pima Indians Dataset?

Acknowledgments

I uploaded this dataset mainly so that my college students can practice data analysis and apply what we study to a large dataset. It may not be up to date, and I note the collection years, but it is a good resource for practice.

Diabetes self-management and social cognitive factors.csv

Dataset Of Diabetes

Dataset

Contents

The association between environmental quality and diabetes in the U.S.

Diabetes Health Indicators

Data from: T1GDUJA: Glucose dataset of a patient with type 1 diabetes...

Background

diabetes_csv

Dataset

Contents

Retrospective cohort study of a community-based primary care program's...

Diabetes

Diabetes