60 datasets found

Pima Indians Diabetes Dataset
cubig.ai
Updated Jun 22, 2025
Cite
CUBIG (2025). Pima Indians Diabetes Dataset [Dataset]. https://cubig.ai/store/products/488/pima-indians-diabetes-dataset
Dataset updated
Jun 22, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction • The Pima Indians Diabetes Dataset is a tabular medical dataset for predicting diabetes (0: non-diabetic, 1: diabetic) based on health examination data of Pima Indian women in the United States.

2) Data Utilization (1) Characteristics of the Pima Indians Diabetes Dataset: • Each row contains eight health indicators (number of pregnancies, blood glucose, diastolic blood pressure, triceps skinfold thickness, two-hour blood insulin, BMI, family-history-based diabetes risk, and age) plus a binary outcome (with or without diabetes). • The data contains no personally identifying information and is widely used for medical diagnosis support and for practicing binary classification algorithms.

(2) Applications of the Pima Indians Diabetes Dataset: • Developing diabetes prediction models: the health indicator data can be used to build machine-learning models such as logistic regression, decision trees, and neural networks. • Medical data interpretation and variable importance analysis: interpretation techniques such as SHAP can be applied to analyze each health variable's contribution to the diabetes prediction and its clinical significance.
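As a minimal sketch of how the row layout described above might be read in Python: the column names below are an assumption (they follow the header commonly used for CSV copies of this dataset, while the listing itself names the indicators only in words), and the two sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Assumed column names; the listing only describes the indicators in prose.
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


@dataclass
class Record:
    features: List[float]  # the eight health indicators, in COLUMNS order
    outcome: int           # 0: non-diabetic, 1: diabetic


def load_records(text: str) -> List[Record]:
    """Parse CSV text with a header row into Record objects."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        features = [float(row[name]) for name in COLUMNS[:-1]]
        records.append(Record(features, int(row["Outcome"])))
    return records


# Two made-up rows for illustration only (not real patient data).
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,90,28.5,0.35,33,0
5,165,80,30,150,34.1,0.62,45,1
"""

records = load_records(SAMPLE)
print(len(records), "records;", len(records[0].features), "features each")
```

Keeping the outcome separate from the feature list mirrors the "eight indicators plus a binary outcome" description, so the same records can be passed to any binary classifier.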
‘Pima Indians Diabetes Database’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
+ more versions
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Pima Indians Diabetes Database’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-pima-indians-diabetes-database-607a/d2070de9/?iid=003-553&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Pima Indians Diabetes Database’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/uciml/pima-indians-diabetes-database on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Content

The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

Acknowledgements

Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.

Inspiration

Can you build a machine learning model to accurately predict whether or not the patients in the dataset have diabetes?

--- Original source retains full ownership of the source dataset ---
Pima Diabetes Database
kaggle.com
Updated Jan 12, 2020
Cite
Rishabh Malhotra (2020). Pima Diabetes Database [Dataset]. https://www.kaggle.com/rishabhm76/pima-diabetes-database/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 12, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Rishabh Malhotra
Description
Sources:
(a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases
(b) Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu), Research Center, RMI Group Leader, Applied Physics Laboratory, The Johns Hopkins University, Johns Hopkins Road, Laurel, MD 20707, (301) 953-6231
(c) Date received: 9 May 1990

Past Usage:

Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.

The diagnostic, binary-valued variable investigated is whether the patient shows signs of diabetes according to World Health Organization criteria (i.e., if the 2 hour post-load plasma glucose was at least 200 mg/dl at any survey examination or if found during routine medical care). The population lives near Phoenix, Arizona, USA.

Results: Their ADAP algorithm makes a real-valued prediction between 0 and 1. This was transformed into a binary decision using a cutoff of 0.448. Using 576 training instances, the sensitivity and specificity of their algorithm were both 76% on the remaining 192 instances.
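The thresholding step and the two reported metrics can be sketched as follows. This is not the ADAP algorithm itself; it only illustrates how real-valued scores become binary decisions at the 0.448 cutoff and how sensitivity and specificity are then computed. Treating a score exactly equal to the cutoff as positive is an assumption, and the scores and labels are made-up values.

```python
TRAIN_INSTANCES = 576
TEST_INSTANCES = 192
TOTAL_INSTANCES = TRAIN_INSTANCES + TEST_INSTANCES  # 768 rows overall


def confusion_counts(scores, labels, cutoff=0.448):
    """Return (tp, fp, tn, fn) after thresholding scores at the cutoff."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        # Assumption: a score equal to the cutoff is classified as positive.
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(scores, labels, cutoff=0.448):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores and true labels, for illustration only.
scores = [0.90, 0.50, 0.30, 0.448, 0.10, 0.60]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Raising the cutoff trades sensitivity for specificity, so reporting the same value for both suggests the cutoff was picked at the point where the two coincide.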

Relevant Information: Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. ADAP is an adaptive learning routine that generates and executes digital analogs of perceptron-like devices. It is a unique algorithm; see the paper for details.
Pima Indian Diabetes - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
+ more versions
Cite
(2024). Pima Indian Diabetes - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/pima-indian-diabetes
Dataset updated
Dec 16, 2024
Description
The dataset used in the paper is a medical dataset for diabetes detection.
Data from: Pima
huggingface.co
Updated Sep 25, 2023
Cite
Pima [Dataset]. https://huggingface.co/datasets/Genius-Society/Pima
Dataset updated
Sep 25, 2023
Dataset authored and provided by
Genius Society
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for Pima

The Pima dataset is a well-known data repository in the field of healthcare and machine learning. The dataset contains demographic, clinical and diagnostic characteristics of Pima Indian women and is primarily used to predict the onset of diabetes based on these attributes. Each data point includes information such as age, number of pregnancies, body mass index, blood pressure, and glucose concentration. Researchers and data scientists use the Pima dataset to… See the full description on the dataset page: https://huggingface.co/datasets/Genius-Society/Pima.
pima-indians-diabetes-database-partitions
huggingface.co
Updated May 28, 2025
+ more versions
Cite
Khoa Nguyen (2025). pima-indians-diabetes-database-partitions [Dataset]. https://huggingface.co/datasets/khoaguin/pima-indians-diabetes-database-partitions
Dataset updated
May 28, 2025
Authors
Khoa Nguyen
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Pima Indians Diabetes Dataset Split

This directory contains a dataset split for Pima Indians Diabetes Database.

Mock Data

The mock data is a smaller dataset (10 rows) that is used to test the model components.

Private Data

The private data is the remaining data that is used to train the model.
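A split of this shape can be sketched in a few lines. The dataset card does not say how the 10 mock rows were chosen, so taking the first rows is an assumption, as are the function name and the placeholder row ids.

```python
def split_mock_private(rows, mock_size=10):
    """Split rows into a small mock set and the remaining private set.

    Assumption: the mock set is simply the first `mock_size` rows.
    """
    if mock_size > len(rows):
        raise ValueError("mock_size exceeds the number of rows")
    return rows[:mock_size], rows[mock_size:]


# Placeholder row ids standing in for real records.
rows = list(range(25))
mock, private = split_mock_private(rows)
print(len(mock), "mock rows,", len(private), "private rows")
```

The two parts are disjoint and together cover every row, so model components tested on the mock set never see the private rows.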
[Global Dataset] Pima Indians Diabetes
kaggle.com
zip
Updated Apr 30, 2021
Cite
Manas Garg (2021). [Global Dataset] Pima Indians Diabetes [Dataset]. https://www.kaggle.com/gargmanas/pima-indians-diabetes
Explore at:
Available download formats: zip (9001 bytes)
Dataset updated
Apr 30, 2021
Authors
Manas Garg
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

Share key insights, awesome visualizations, or simply discuss advantages of data, any observed or known properties, challenges, problems, corrections, and any other helpful comments! Post and discuss recent published works that utilize this dataset (including your own). Any and all feedback is welcome and encouraged.
Pima Indians Diabetes Database
kaggle.com
Updated Jun 28, 2023
Cite
Darshil06Shah (2023). Pima Indians Diabetes Database [Dataset]. https://www.kaggle.com/datasets/darshil06shah/pima-indians-diabetes-database
Dataset updated
Jun 28, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Darshil06Shah
Description
Dataset

This dataset was created by Darshil06Shah

Contents
Parameter values of the best fits of the obesity-related diabetes model to...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Boya Yang; Jiaxu Li; Michael J. Haller; Desmond A. Schatz; Libin Rong (2023). Parameter values of the best fits of the obesity-related diabetes model to the glucose data of Pima Indian #1-#11. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010914.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1010914.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Boya Yang; Jiaxu Li; Michael J. Haller; Desmond A. Schatz; Libin Rong
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Parameter values of the best fits of the obesity-related diabetes model to the glucose data of Pima Indian #1-#11.
pima-indians-diabetes-database
kaggle.com
zip
Updated Nov 6, 2020
Cite
Angel Torres del Alamo (2020). pima-indians-diabetes-database [Dataset]. https://www.kaggle.com/angeltorresdelalamo/pimaindiansdiabetesdatabase
Explore at:
Available download formats: zip (9128 bytes)
Dataset updated
Nov 6, 2020
Authors
Angel Torres del Alamo
Description
Dataset

This dataset was created by Angel Torres del Alamo

Contents

It contains the following files:
Pima-Indians-diabetes
kaggle.com
zip
Updated Sep 19, 2021
+ more versions
Cite
SandeepN (2021). Pima-Indians-diabetes [Dataset]. https://www.kaggle.com/sandeep2812/pimaindiansdiabetes
Explore at:
Available download formats: zip (9003 bytes)
Dataset updated
Sep 19, 2021
Authors
SandeepN
Description
Dataset

This dataset was created by SandeepN

Contents
Description of the PIMA Indian diabetes dataset.
plos.figshare.com
xls
Updated Jul 2, 2024
Cite
Xiaobo Qi; Yachen Lu; Ying Shi; Hui Qi; Lifang Ren (2024). Description of the PIMA Indian diabetes dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0306090.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0306090.t004
Dataset updated
Jul 2, 2024
Dataset provided by
PLOS ONE
Authors
Xiaobo Qi; Yachen Lu; Ying Shi; Hui Qi; Lifang Ren
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Diabetes is a chronic disease, which is characterized by abnormally high blood sugar levels. It may affect various organs and tissues, and even lead to life-threatening complications. Accurate prediction of diabetes can significantly reduce its incidence. However, the current prediction methods struggle to accurately capture the essential characteristics of nonlinear data, and the black-box nature of these methods hampers its clinical application. To address these challenges, we propose KCCAM_DNN, a diabetes prediction method that integrates Kendall’s correlation coefficient and an attention mechanism within a deep neural network. In the KCCAM_DNN, Kendall’s correlation coefficient is initially employed for feature selection, which effectively filters out key features influencing diabetes prediction. For missing values in the data, polynomial regression is utilized for imputation, ensuring data completeness. Subsequently, we construct a deep neural network (KCCAM_DNN) based on the self-attention mechanism, which assigns greater weight to crucial features affecting diabetes and enhances the model’s predictive performance. Finally, we employ the SHAP model to analyze the impact of each feature on diabetes prediction, augmenting the model’s interpretability. Experimental results show that KCCAM_DNN exhibits superior performance on both PIMA Indian and LMCH diabetes datasets, achieving test accuracies of 99.090% and 99.333%, respectively, approximately 2% higher than the best existing method. These results suggest that KCCAM_DNN is proficient in diabetes prediction, providing a foundation for informed decision-making in the diagnosis and prevention of diabetes.
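The feature-selection step named above (ranking features by Kendall's correlation coefficient) can be illustrated with a small sketch. It is not the KCCAM_DNN implementation: the description does not say which variant of the coefficient or which selection threshold the authors used, so the tau-a variant, the ranking by absolute value, and the toy columns below are all assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y contribute to neither count.
    return (concordant - discordant) / (n * (n - 1) / 2)


def rank_features(columns, target):
    """Order feature names by absolute tau with the target, strongest first."""
    scores = {name: kendall_tau_a(values, target)
              for name, values in columns.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


# Made-up toy columns, for illustration only.
columns = {
    "glucose": [90, 110, 150, 170],
    "noise": [3, 1, 4, 2],
}
target = [0, 0, 1, 1]
ranking = rank_features(columns, target)
print(ranking)
```

Because the coefficient depends only on the ordering of values, it captures monotonic but nonlinear relationships, which fits the abstract's concern with nonlinear data.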
Confusion matrix.
plos.figshare.com
figshare.com
xls
Updated Jan 18, 2024
Cite
Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh (2024). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0292100.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0292100.t002
Dataset updated
Jan 18, 2024
Dataset provided by
PLOS ONE
Authors
Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Diabetes prediction is an ongoing research topic in which medical specialists are attempting to forecast the condition with greater precision. Diabetes typically stays dormant, and if patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart problem, it can cause metabolic problems and various complications in the body. Various ensemble learning procedures, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with the K-fold cross-validation approach, was used to balance the classes and validate the findings. The Pima Indian Diabetes (PID) dataset from the UCI Machine Learning (UCI ML) repository was chosen for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model’s first phase, while SMO classes are used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model was created using a combined SMOTE and SMO strategy, which achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier’s performance in spotting illness early.
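The core idea behind the SMOTE preprocessing step mentioned above can be sketched briefly: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its minority-class neighbours. This is a simplified illustration, not the authors' pipeline; it uses only the single nearest neighbour (standard SMOTE picks among k neighbours), and the function names and toy points are assumptions.

```python
import random


def nearest_neighbor(point, others):
    """Return the element of `others` closest to `point` (squared Euclidean)."""
    return min(others,
               key=lambda o: sum((a - b) ** 2 for a, b in zip(point, o)))


def smote_sample(point, neighbor, rng):
    """Interpolate a new sample at a random position between two points."""
    gap = rng.random()  # uniform in [0, 1)
    return [p + gap * (q - p) for p, q in zip(point, neighbor)]


def oversample(minority, n_new, seed=0):
    """Generate n_new synthetic minority samples (simplified: k = 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        synthetic.append(smote_sample(base, nearest_neighbor(base, others), rng))
    return synthetic


# Made-up minority-class points, for illustration only.
minority = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
synthetic = oversample(minority, n_new=5)
print(synthetic)
```

When oversampling is combined with K-fold cross-validation, synthetic samples are usually generated inside each training fold only, so that no interpolated copy of a validation sample leaks into training.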
Pima Indian Diabetes Data
kaggle.com
Updated Oct 4, 2017
Cite
Daniel Silion (2017). Pima Indian Diabetes Data [Dataset]. https://www.kaggle.com/danielsilion/pimadata/code
Dataset updated
Oct 4, 2017
Dataset provided by
Kaggle http://kaggle.com/
Authors
Daniel Silion
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

There's a story behind every dataset and here's your opportunity to share yours.

Content

Pima Indian Diabetes Data

Acknowledgements

Jerry Kurata

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
PIMA Indians diabetes dataset classification result.
plos.figshare.com
xls
Updated May 8, 2024
Cite
Nur Farahaina Idris; Mohd Arfian Ismail; Mohd Izham Mohd Jaya; Ashraf Osman Ibrahim; Anas W. Abulfaraj; Faisal Binzagr (2024). PIMA Indians diabetes dataset classification result. [Dataset]. http://doi.org/10.1371/journal.pone.0302595.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302595.t001
Dataset updated
May 8, 2024
Dataset provided by
PLOS http://plos.org/
Authors
Nur Farahaina Idris; Mohd Arfian Ismail; Mohd Izham Mohd Jaya; Ashraf Osman Ibrahim; Anas W. Abulfaraj; Faisal Binzagr
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PIMA Indians diabetes dataset classification result.
‘Diabetics prediction using logistic regression’ analyzed by Analyst-2
analyst-2.ai
Updated Mar 24, 2018
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Diabetics prediction using logistic regression’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-diabetics-prediction-using-logistic-regression-7c04/latest
Dataset updated
Mar 24, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Diabetics prediction using logistic regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandij/diabetes-dataset on 13 February 2022.

--- Dataset description provided by original source is as follows ---

The data was collected and made available by “National Institute of Diabetes and Digestive and Kidney Diseases” as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here belong to the Pima Indian heritage (subgroup of Native Americans), and are females of ages 21 and above.

We’ll be using Python and some of its popular data science packages. First, we will import pandas to read our data from a CSV file and manipulate it for further use. We will also use numpy to convert our data into a format suitable for feeding our classification model. We’ll use seaborn and matplotlib for visualizations. We will then import the Logistic Regression algorithm from sklearn, which will help us build our classification model. Lastly, we will use joblib, available in sklearn, to save our model for future use.

--- Original source retains full ownership of the source dataset ---
Pima Indian Diabetes
kaggle.com
Updated Sep 14, 2024
Cite
Jauhar Maknun (2024). Pima Indian Diabetes [Dataset]. https://www.kaggle.com/datasets/jojohar/pima-indian-diabetes/code
Dataset updated
Sep 14, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Jauhar Maknun
Description
Dataset

This dataset was created by Jauhar Maknun

Released under Other (specified in description)

Contents
Predict Diabetes Dataset
cubig.ai
Updated May 20, 2025
Cite
CUBIG (2025). Predict Diabetes Dataset [Dataset]. https://cubig.ai/store/products/245/predict-diabetes-dataset
Dataset updated
May 20, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction • The Predict Diabetes dataset is based on data from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). It includes data exclusively from Pima Indian women aged 21 and older, and consists of 8 input variables (health-related measurements) and a target variable (Outcome) indicating the presence of diabetes.

2) Data Utilization (1) Characteristics of the Predict Diabetes Dataset: • The dataset contains key medical indicators closely related to diabetes diagnosis, such as glucose level, blood pressure, BMI, and insulin level, and is formatted in a clean and structured manner suitable for predictive modeling. • The target variable (Outcome) is binary, where 1 indicates the presence of diabetes and 0 indicates its absence.

(2) Applications of the Predict Diabetes Dataset: • Disease Prediction Model Development: The dataset can be used to build classification models that predict the presence of diabetes based on various health measurements. • Medical Data Analysis Practice: Suitable for educational use in medical AI and healthcare-related tasks focused on basic diagnostic prediction.
Performance measure.
plos.figshare.com
xls
Updated Jan 18, 2024
Cite
Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh (2024). Performance measure. [Dataset]. http://doi.org/10.1371/journal.pone.0292100.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0292100.t003
Dataset updated
Jan 18, 2024
Dataset provided by
PLOS ONE
Authors
Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Diabetes prediction is an ongoing research topic in which medical specialists are attempting to forecast the condition with greater precision. Diabetes typically stays dormant, and if patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart problem, it can cause metabolic problems and various complications in the body. Various ensemble learning procedures, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with the K-fold cross-validation approach, was used to balance the classes and validate the findings. The Pima Indian Diabetes (PID) dataset from the UCI Machine Learning (UCI ML) repository was chosen for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model’s first phase, while SMO classes are used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model was created using a combined SMOTE and SMO strategy, which achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier’s performance in spotting illness early.
pima-indians-diabetes
kaggle.com
Updated Nov 28, 2019
Cite
Nayan Kapri (2019). pima-indians-diabetes [Dataset]. https://www.kaggle.com/nrkapri/pimaindiansdiabetes/discussion
Dataset updated
Nov 28, 2019
Dataset provided by
Kaggle http://kaggle.com/
Authors
Nayan Kapri
Description
Dataset

This dataset was created by Nayan Kapri

Contents
