2 datasets found

Diabetes
kaggle.com
zip
Updated Oct 8, 2023
Cite
Mohamadreza Momeni (2023). Diabetes [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/diabetes
Available download formats: zip (11477 bytes)
Dataset updated
Oct 8, 2023
Authors
Mohamadreza Momeni
Description
Dataset Description: Several hundred rural African-American patients were included. The diabetes.csv file contains the raw data for all patients, including those with missing values, and can be used for descriptive statistics. The data dictionary explaining the columns is linked from the dataset page.

The Diabetes_Classification file was cleaned and manipulated. Any patient without a hemoglobin A1c value was excluded. Patients whose hemoglobin A1c was 6.5 or greater were labelled diabetes = yes [column = "glyhb"]. Sixty of the 390 patients were found to be diabetic. A code book of the variables is included in one of the tabs. The goal is to use machine learning (a classification algorithm) to predict diabetes from demographic and laboratory variables. What are the strongest predictors? If you exclude glucose, how strong is the prediction?
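The labelling rule and class balance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the "glyhb" column holds the hemoglobin A1c value, the helper name label_diabetes is hypothetical, and the sample rows are made up rather than taken from the dataset.

```python
from typing import Optional

A1C_THRESHOLD = 6.5  # cutoff stated in the dataset description


def label_diabetes(glyhb: Optional[float]) -> Optional[str]:
    """Return 'yes' or 'no' per the stated rule, or None when A1c is missing."""
    if glyhb is None:
        return None  # patients without an A1c value are excluded from the cleaned file
    return "yes" if glyhb >= A1C_THRESHOLD else "no"


# Made-up illustrative rows (NOT real records from the dataset).
patients = [
    {"id": 1, "glyhb": 4.3},
    {"id": 2, "glyhb": 6.5},
    {"id": 3, "glyhb": None},
    {"id": 4, "glyhb": 9.1},
]

# Drop rows with a missing A1c value, then attach the label.
cleaned = [
    {**p, "diabetes": label_diabetes(p["glyhb"])}
    for p in patients
    if p["glyhb"] is not None
]

# Class balance reported in the description: 60 diabetic patients out of 390.
prevalence = 60 / 390
majority_baseline_accuracy = 1 - prevalence  # accuracy of always predicting "no"
```

With 60 of 390 patients labelled diabetic, a model that always predicts "no" would already be about 85% accurate. When comparing classifiers with and without glucose, a metric such as recall or ROC AUC is therefore likely to be more informative than raw accuracy.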
Diabetes
kaggle.com
zip
Updated Oct 22, 2025
Cite
Amirhossein Jafarnezhad (2025). Diabetes [Dataset]. https://www.kaggle.com/datasets/amirjdai/diabetes
Available download formats: zip (11477 bytes)
Dataset updated
Oct 22, 2025
Authors
Amirhossein Jafarnezhad
Description
Dataset Description: Several hundred rural African-American patients were included. The diabetes.csv file contains the raw data for all patients, including those with missing values, and can be used for descriptive statistics. The data dictionary explaining the columns is linked from the dataset page.

The Diabetes_Classification file was cleaned and manipulated. Any patient without a hemoglobin A1c value was excluded. Patients whose hemoglobin A1c was 6.5 or greater were labelled diabetes = yes [column = "glyhb"]. Sixty of the 390 patients were found to be diabetic. A code book of the variables is included in one of the tabs. The goal is to use machine learning (a classification algorithm) to predict diabetes from demographic and laboratory variables. What are the strongest predictors? If you exclude glucose, how strong is the prediction?
Diabetes

Real patient data to manipulate and to predict diabetes.

4 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (11477 bytes)

Dataset updated

Oct 8, 2023

Authors

Mohamadreza Momeni
