2 datasets found
  1. Diabetes

    • kaggle.com
    zip
    Updated Oct 8, 2023
    Mohamadreza Momeni (2023). Diabetes [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/diabetes
    Available download formats: zip (11477 bytes)
    Dataset updated: Oct 8, 2023
    Authors: Mohamadreza Momeni
    Description

    Dataset Description: Several hundred rural African-American patients were included. The diabetes.csv file contains the raw data of all patients, including those with missing data, and can be used for descriptive statistics. A data dictionary explaining the columns is linked from the dataset page.

    The Diabetes_Classification file was cleaned and manipulated. Any patient without a hemoglobin A1c value was excluded. Patients whose hemoglobin A1c (column "glyhb") was 6.5 or greater were labelled diabetes = yes. Sixty of the 390 patients were found to be diabetic. A code book of the variables is included in one of the tabs. The goal is to use machine learning (a classification algorithm) to predict diabetes from demographic and laboratory variables. What are the strongest predictors? If you exclude glucose, how strong is the prediction?

  2. Diabetes

    • kaggle.com
    zip
    Updated Oct 22, 2025
    Amirhossein Jafarnezhad (2025). Diabetes [Dataset]. https://www.kaggle.com/datasets/amirjdai/diabetes
    Available download formats: zip (11477 bytes)
    Dataset updated: Oct 22, 2025
    Authors: Amirhossein Jafarnezhad
    Description

    Dataset Description: Several hundred rural African-American patients were included. The diabetes.csv file contains the raw data of all patients, including those with missing data, and can be used for descriptive statistics. A data dictionary explaining the columns is linked from the dataset page.

    The Diabetes_Classification file was cleaned and manipulated. Any patient without a hemoglobin A1c value was excluded. Patients whose hemoglobin A1c (column "glyhb") was 6.5 or greater were labelled diabetes = yes. Sixty of the 390 patients were found to be diabetic. A code book of the variables is included in one of the tabs. The goal is to use machine learning (a classification algorithm) to predict diabetes from demographic and laboratory variables. What are the strongest predictors? If you exclude glucose, how strong is the prediction?


Mohamadreza Momeni (2023). Diabetes [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/diabetes
Diabetes

Real patient data to manipulate and to predict diabetes.

4 scholarly articles cite this dataset (View in Google Scholar)
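The labelling rule in the description (patients without a hemoglobin A1c value are excluded, and a value of 6.5 or greater means diabetes = yes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset author's actual cleaning code: the function names and the sample values are hypothetical, and only the 6.5 threshold and the 60-of-390 count come from the description.

```python
from typing import List, Optional

# Cut-off stated in the dataset description (hemoglobin A1c, column "glyhb").
A1C_THRESHOLD = 6.5


def label_diabetes(glyhb: Optional[float],
                   threshold: float = A1C_THRESHOLD) -> Optional[str]:
    """Return "yes" or "no", or None when the A1c value is missing.

    Hypothetical helper: a None result marks a patient who would be
    excluded from the cleaned classification file.
    """
    if glyhb is None:
        return None
    return "yes" if glyhb >= threshold else "no"


def diabetic_share(labels: List[Optional[str]]) -> float:
    """Fraction of labelled patients (missing ones dropped) marked "yes"."""
    kept = [label for label in labels if label is not None]
    if not kept:
        raise ValueError("no labelled patients")
    return sum(label == "yes" for label in kept) / len(kept)


# Made-up A1c readings for illustration only; None stands for a missing value.
sample_glyhb = [4.3, 6.5, None, 7.2, 5.0]
sample_labels = [label_diabetes(value) for value in sample_glyhb]
sample_share = diabetic_share(sample_labels)

# Class balance reported in the description: 60 diabetic patients out of 390.
reported_share = 60 / 390  # roughly 0.154
```

With about 15% of patients in the positive class, the classes are imbalanced, so a classifier's accuracy alone can look good while missing most diabetic patients; metrics such as recall or precision on the "yes" class are likely more informative here.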
