60 datasets found
  1. Pima Indians Diabetes Dataset

    • cubig.ai
    Updated Jun 22, 2025
    Cite
    CUBIG (2025). Pima Indians Diabetes Dataset [Dataset]. https://cubig.ai/store/products/488/pima-indians-diabetes-dataset
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Pima Indians Diabetes Dataset is a tabular medical dataset for predicting diabetes (0: non-diabetic, 1: diabetic) from health examination data of Pima Indian women in the United States.

    2) Data Utilization
    (1) Characteristics of the Pima Indians Diabetes Dataset:
    • Each row contains eight health indicators (number of pregnancies, blood sugar, diastolic blood pressure, triceps skinfold thickness, two-hour insulin, BMI, a family-history-based diabetes risk score, and age) plus a binary outcome (with or without diabetes).
    • The data contains no personally identifying information and is widely used for medical diagnosis support and for practicing binary classification algorithms.
    (2) Applications of the Pima Indians Diabetes Dataset:
    • Developing diabetes prediction models: the health indicators can be used to build machine-learning models such as logistic regression, decision trees, and neural networks.
    • Medical data interpretation and variable importance analysis: interpretation techniques such as SHAP can be applied to analyze each health variable's contribution to the prediction and its clinical significance.
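
The row layout described above (eight indicators plus a binary outcome) can be sketched as a small typed record. This is only an illustration: the column names follow the widely circulated Kaggle CSV and are an assumption that may not match the CUBIG release, and the two sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Record:
    pregnancies: int
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    pedigree: float  # family-history-based risk score
    age: int
    outcome: int  # 0: non-diabetic, 1: diabetic


def parse_records(text):
    """Parse CSV text with a header row into Record objects."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        outcome = int(row["Outcome"])
        if outcome not in (0, 1):
            raise ValueError(f"Outcome must be 0 or 1, got {outcome}")
        records.append(Record(
            pregnancies=int(row["Pregnancies"]),
            glucose=float(row["Glucose"]),
            blood_pressure=float(row["BloodPressure"]),
            skin_thickness=float(row["SkinThickness"]),
            insulin=float(row["Insulin"]),
            bmi=float(row["BMI"]),
            pedigree=float(row["DiabetesPedigreeFunction"]),
            age=int(row["Age"]),
            outcome=outcome,
        ))
    return records


# Two made-up rows for illustration only (not taken from the dataset).
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.35,30,0
5,165,85,33,150,35.1,0.62,48,1
"""

records = parse_records(SAMPLE)
positive_rate = sum(r.outcome for r in records) / len(records)
print(len(records), positive_rate)
```

Validating the outcome column at load time keeps a mislabeled row from silently entering a binary classifier.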

  2. ‘Pima Indians Diabetes Database’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Pima Indians Diabetes Database’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-pima-indians-diabetes-database-607a/d2070de9/?iid=003-553&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Pima Indians Diabetes Database’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/uciml/pima-indians-diabetes-database on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

    Content

    The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

    Acknowledgements

    Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.

    Inspiration

    Can you build a machine learning model to accurately predict whether or not the patients in the dataset have diabetes?

    --- Original source retains full ownership of the source dataset ---

  3. Pima Diabetes Database

    • kaggle.com
    Updated Jan 12, 2020
    Cite
    Rishabh Malhotra (2020). Pima Diabetes Database [Dataset]. https://www.kaggle.com/rishabhm76/pima-diabetes-database/code
    Dataset updated
    Jan 12, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rishabh Malhotra
    Description

    Sources:
    (a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases
    (b) Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu), Research Center, RMI Group Leader, Applied Physics Laboratory, The Johns Hopkins University, Johns Hopkins Road, Laurel, MD 20707, (301) 953-6231
    (c) Date received: 9 May 1990

    Past Usage:

    Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C., & Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.

    The diagnostic, binary-valued variable investigated is whether the patient shows signs of diabetes according to World Health Organization criteria (i.e., if the 2 hour post-load plasma glucose was at least 200 mg/dl at any survey examination or if found during routine medical care). The population lives near Phoenix, Arizona, USA.

    Results: Their ADAP algorithm makes a real-valued prediction between 0 and 1, which was transformed into a binary decision using a cutoff of 0.448. Trained on 576 instances, the algorithm achieved a sensitivity and a specificity of 76% each on the remaining 192 instances.
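
The thresholding and the two reported metrics can be illustrated with a short sketch. The 0.448 cutoff comes from the description above; whether a score exactly at the cutoff counts as positive is an assumption here, and the scores and labels are made-up values, not ADAP outputs.

```python
CUTOFF = 0.448  # cutoff reported in the description


def to_binary(scores, cutoff=CUTOFF):
    """Map real-valued predictions in [0, 1] to 0/1 decisions.

    Treating a score equal to the cutoff as positive is an assumption.
    """
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores and labels for illustration only.
scores = [0.91, 0.30, 0.52, 0.10, 0.47, 0.44, 0.80, 0.05]
y_true = [1, 1, 1, 0, 0, 0, 1, 0]

y_pred = to_binary(scores)
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(y_pred, sensitivity, specificity)
```

Sensitivity is the fraction of true positives that are detected and specificity is the fraction of true negatives that are correctly rejected, so moving the cutoff trades one against the other.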

    Relevant Information: Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. ADAP is an adaptive learning routine that generates and executes digital analogs of perceptron-like devices. It is a unique algorithm; see the paper for details.

  4. Pima Indian Diabetes - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Pima Indian Diabetes - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/pima-indian-diabetes
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in the paper is a medical dataset for diabetes detection.

  5. Data from: Pima

    • huggingface.co
    Updated Sep 25, 2023
    Cite
    Pima [Dataset]. https://huggingface.co/datasets/Genius-Society/Pima
    Dataset updated
    Sep 25, 2023
    Dataset authored and provided by
    Genius Society
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Pima

    The Pima dataset is a well-known data repository in the field of healthcare and machine learning. The dataset contains demographic, clinical and diagnostic characteristics of Pima Indian women and is primarily used to predict the onset of diabetes based on these attributes. Each data point includes information such as age, number of pregnancies, body mass index, blood pressure, and glucose concentration. Researchers and data scientists use the Pima dataset to… See the full description on the dataset page: https://huggingface.co/datasets/Genius-Society/Pima.

  6. pima-indians-diabetes-database-partitions

    • huggingface.co
    Updated May 28, 2025
    Cite
    Khoa Nguyen (2025). pima-indians-diabetes-database-partitions [Dataset]. https://huggingface.co/datasets/khoaguin/pima-indians-diabetes-database-partitions
    Dataset updated
    May 28, 2025
    Authors
    Khoa Nguyen
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Pima Indians Diabetes Dataset Split

    This directory contains a dataset split for Pima Indians Diabetes Database.

    Mock Data

    The mock data is a smaller dataset (10 rows) that is used to test the model components.

    Private Data

    The private data is the remaining data that is used to train the model.
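
A mock/private split of this kind might look like the sketch below. The card does not say how the 10 mock rows are chosen, so the random sampling, the seed, and the helper name `split_mock_private` are all assumptions; the 768-row total matches the 576 + 192 instances cited for the original database.

```python
import random


def split_mock_private(rows, mock_size=10, seed=0):
    """Split rows into a small mock set and the remaining private set.

    Random selection with a fixed seed is an assumption; the dataset card
    only states the mock size and that the private data is the remainder.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    mock_idx = set(indices[:mock_size])
    mock = [r for i, r in enumerate(rows) if i in mock_idx]
    private = [r for i, r in enumerate(rows) if i not in mock_idx]
    return mock, private


# Placeholder rows standing in for the real records.
rows = [{"id": i} for i in range(768)]
mock, private = split_mock_private(rows)
print(len(mock), len(private))
```

Keeping the two sets disjoint means code exercised on the mock rows never sees the rows reserved for training.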

  7. [Global Dataset] Pima Indians Diabetes

    • kaggle.com
    zip
    Updated Apr 30, 2021
    Cite
    Manas Garg (2021). [Global Dataset] Pima Indians Diabetes [Dataset]. https://www.kaggle.com/gargmanas/pima-indians-diabetes
    Available download formats: zip (9001 bytes)
    Dataset updated
    Apr 30, 2021
    Authors
    Manas Garg
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    Share key insights, awesome visualizations, or simply discuss advantages of data, any observed or known properties, challenges, problems, corrections, and any other helpful comments! Post and discuss recent published works that utilize this dataset (including your own). Any and all feedback is welcome and encouraged.

  8. Pima Indians Diabetes Database

    • kaggle.com
    Updated Jun 28, 2023
    Cite
    Darshil06Shah (2023). Pima Indians Diabetes Database [Dataset]. https://www.kaggle.com/datasets/darshil06shah/pima-indians-diabetes-database
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Darshil06Shah
    Description

    Dataset

    This dataset was created by Darshil06Shah

    Contents

  9. Parameter values of the best fits of the obesity-related diabetes model to...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Boya Yang; Jiaxu Li; Michael J. Haller; Desmond A. Schatz; Libin Rong (2023). Parameter values of the best fits of the obesity-related diabetes model to the glucose data of Pima Indian #1-#11. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010914.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Boya Yang; Jiaxu Li; Michael J. Haller; Desmond A. Schatz; Libin Rong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameter values of the best fits of the obesity-related diabetes model to the glucose data of Pima Indian #1-#11.

  10. pima-indians-diabetes-database

    • kaggle.com
    zip
    Updated Nov 6, 2020
    Cite
    Angel Torres del Alamo (2020). pima-indians-diabetes-database [Dataset]. https://www.kaggle.com/angeltorresdelalamo/pimaindiansdiabetesdatabase
    Available download formats: zip (9128 bytes)
    Dataset updated
    Nov 6, 2020
    Authors
    Angel Torres del Alamo
    Description

    Dataset

    This dataset was created by Angel Torres del Alamo

    Contents

    It contains the following files:

  11. Pima-Indians-diabetes

    • kaggle.com
    zip
    Updated Sep 19, 2021
    Cite
    SandeepN (2021). Pima-Indians-diabetes [Dataset]. https://www.kaggle.com/sandeep2812/pimaindiansdiabetes
    Available download formats: zip (9003 bytes)
    Dataset updated
    Sep 19, 2021
    Authors
    SandeepN
    Description

    Dataset

    This dataset was created by SandeepN

    Contents

  12. Description of the PIMA Indian diabetes dataset.

    • plos.figshare.com
    xls
    Updated Jul 2, 2024
    Cite
    Xiaobo Qi; Yachen Lu; Ying Shi; Hui Qi; Lifang Ren (2024). Description of the PIMA Indian diabetes dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0306090.t004
    Available download formats: xls
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Xiaobo Qi; Yachen Lu; Ying Shi; Hui Qi; Lifang Ren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Diabetes is a chronic disease, which is characterized by abnormally high blood sugar levels. It may affect various organs and tissues, and even lead to life-threatening complications. Accurate prediction of diabetes can significantly reduce its incidence. However, the current prediction methods struggle to accurately capture the essential characteristics of nonlinear data, and the black-box nature of these methods hampers their clinical application. To address these challenges, we propose KCCAM_DNN, a diabetes prediction method that integrates Kendall’s correlation coefficient and an attention mechanism within a deep neural network. In the KCCAM_DNN, Kendall’s correlation coefficient is initially employed for feature selection, which effectively filters out key features influencing diabetes prediction. For missing values in the data, polynomial regression is utilized for imputation, ensuring data completeness. Subsequently, we construct a deep neural network (KCCAM_DNN) based on the self-attention mechanism, which assigns greater weight to crucial features affecting diabetes and enhances the model’s predictive performance. Finally, we employ the SHAP model to analyze the impact of each feature on diabetes prediction, augmenting the model’s interpretability. Experimental results show that KCCAM_DNN exhibits superior performance on both PIMA Indian and LMCH diabetes datasets, achieving test accuracies of 99.090% and 99.333%, respectively, approximately 2% higher than the best existing method. These results suggest that KCCAM_DNN is proficient in diabetes prediction, providing a foundation for informed decision-making in the diagnosis and prevention of diabetes.
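
Feature selection with Kendall's correlation coefficient, as mentioned in the abstract above, can be sketched in a few lines. This is not the authors' implementation: it uses the simple tau-a variant (no tie correction), and the threshold, feature names, and values are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def select_features(features, target, threshold):
    """Keep features whose absolute tau with the target meets the threshold."""
    return [name for name, values in features.items()
            if abs(kendall_tau(values, target)) >= threshold]


# Made-up values for illustration only.
features = {
    "glucose": [85, 90, 110, 130, 150, 170],
    "noise": [3, 1, 4, 1, 5, 2],
}
target = [0, 0, 0, 1, 1, 1]

tau_glucose = kendall_tau(features["glucose"], target)
selected = select_features(features, target, threshold=0.3)
print(tau_glucose, selected)
```

Because the measure is rank-based, it captures monotonic but nonlinear relationships that a linear correlation coefficient can understate.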

  13. Confusion matrix.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jan 18, 2024
    Cite
    Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh (2024). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0292100.t002
    Available download formats: xls
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Diabetes prediction is an ongoing research topic in which medical specialists attempt to forecast the condition with greater precision. Diabetes often remains dormant, and when patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart condition, it can cause metabolic problems and various complications in the body. Various ensemble learning techniques, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with K-fold cross-validation, was used to balance the classes and validate the findings. The Pima Indian Diabetes (PID) dataset from the UCI Machine Learning (UCI ML) repository was chosen for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model’s first phase, while SMO classes are used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model created using the combined SMOTE and SMO strategy achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier’s performance in detecting illness early.
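
The oversampling step named in the abstract above can be illustrated with a minimal SMOTE-style sketch. It shows only the core interpolation idea under simplifying assumptions (Euclidean distance, a tiny made-up minority set, hypothetical parameter values) and is not the authors' pipeline, which also involves SMO and cross-validation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Made-up two-feature minority samples for illustration only.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Each synthetic point lies on the segment between two real minority samples, so the minority class grows without exact duplicates.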

  14. Pima Indian Diabetes Data

    • kaggle.com
    Updated Oct 4, 2017
    Cite
    Daniel Silion (2017). Pima Indian Diabetes Data [Dataset]. https://www.kaggle.com/danielsilion/pimadata/code
    Dataset updated
    Oct 4, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Daniel Silion
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    There's a story behind every dataset and here's your opportunity to share yours.

    Content

    Pima Indian Diabetes Data

    Acknowledgements

    Jerry Kurata

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  15. PIMA Indians diabetes dataset classification result.

    • plos.figshare.com
    xls
    Updated May 8, 2024
    Cite
    Nur Farahaina Idris; Mohd Arfian Ismail; Mohd Izham Mohd Jaya; Ashraf Osman Ibrahim; Anas W. Abulfaraj; Faisal Binzagr (2024). PIMA Indians diabetes dataset classification result. [Dataset]. http://doi.org/10.1371/journal.pone.0302595.t001
    Available download formats: xls
    Dataset updated
    May 8, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nur Farahaina Idris; Mohd Arfian Ismail; Mohd Izham Mohd Jaya; Ashraf Osman Ibrahim; Anas W. Abulfaraj; Faisal Binzagr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PIMA Indians diabetes dataset classification result.

  16. ‘Diabetics prediction using logistic regression’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 24, 2018
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Diabetics prediction using logistic regression’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-diabetics-prediction-using-logistic-regression-7c04/latest
    Dataset updated
    Mar 24, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Diabetics prediction using logistic regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandij/diabetes-dataset on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    The data was collected and made available by “National Institute of Diabetes and Digestive and Kidney Diseases” as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here belong to the Pima Indian heritage (subgroup of Native Americans), and are females of ages 21 and above.

    We’ll be using Python and some of its popular data-science packages. First, we will import pandas to read our data from a CSV file and manipulate it for further use. We will also use numpy to convert our data into a format suitable for our classification model. We’ll use seaborn and matplotlib for visualizations. We will then import the Logistic Regression algorithm from sklearn, which will help us build our classification model. Lastly, we will use joblib, available in sklearn, to save our model for future use.
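
The fit-and-save workflow described above might look roughly like the sketch below. It is an illustration under stated assumptions: the features are random synthetic values rather than the real CSV, the file name is hypothetical, and joblib is imported as a standalone package, which is how current scikit-learn versions expect it to be used.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the eight predictor columns (not real patient data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Hypothetical labelling rule so the toy problem is learnable.
y = (X[:, 1] + 0.5 * X[:, 5] > 0).astype(int)

# Fit a logistic regression classifier.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
train_accuracy = model.score(X, y)

# Save the fitted model and load it back.
path = os.path.join(tempfile.mkdtemp(), "diabetes_model.joblib")
joblib.dump(model, path)
restored = joblib.load(path)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(train_accuracy, same_predictions)
```

With the real data, the accuracy should be measured on a held-out split rather than on the training rows used here.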

    --- Original source retains full ownership of the source dataset ---

  17. Pima Indian Diabetes

    • kaggle.com
    Updated Sep 14, 2024
    Cite
    Jauhar Maknun (2024). Pima Indian Diabetes [Dataset]. https://www.kaggle.com/datasets/jojohar/pima-indian-diabetes/code
    Dataset updated
    Sep 14, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jauhar Maknun
    Description

    Dataset

    This dataset was created by Jauhar Maknun

    Released under Other (specified in description)

    Contents

  18. Predict Diabetes Dataset

    • cubig.ai
    Updated May 20, 2025
    Cite
    CUBIG (2025). Predict Diabetes Dataset [Dataset]. https://cubig.ai/store/products/245/predict-diabetes-dataset
    Dataset updated
    May 20, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Predict Diabetes dataset is based on data from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). It includes data exclusively from Pima Indian women aged 21 and older, and consists of 8 input variables (health-related measurements) and a target variable (Outcome) indicating the presence of diabetes.

    2) Data Utilization (1) Characteristics of the Predict Diabetes Dataset: • The dataset contains key medical indicators closely related to diabetes diagnosis, such as glucose level, blood pressure, BMI, and insulin level, and is formatted in a clean and structured manner suitable for predictive modeling. • The target variable (Outcome) is binary, where 1 indicates the presence of diabetes and 0 indicates its absence.

    (2) Applications of the Predict Diabetes Dataset: • Disease Prediction Model Development: The dataset can be used to build classification models that predict the presence of diabetes based on various health measurements. • Medical Data Analysis Practice: Suitable for educational use in medical AI and healthcare-related tasks focused on basic diagnostic prediction.

  19. Performance measure.

    • plos.figshare.com
    xls
    Updated Jan 18, 2024
    Cite
    Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh (2024). Performance measure. [Dataset]. http://doi.org/10.1371/journal.pone.0292100.t003
    Available download formats: xls
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Diabetes prediction is an ongoing research topic in which medical specialists attempt to forecast the condition with greater precision. Diabetes often remains dormant, and when patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart condition, it can cause metabolic problems and various complications in the body. Various ensemble learning techniques, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with K-fold cross-validation, was used to balance the classes and validate the findings. The Pima Indian Diabetes (PID) dataset from the UCI Machine Learning (UCI ML) repository was chosen for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model’s first phase, while SMO classes are used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model created using the combined SMOTE and SMO strategy achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier’s performance in detecting illness early.
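
The performance measures listed above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are made up for illustration and are not taken from the paper's tables.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_error": (fp + fn) / total,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only.
m = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(m)
```

Reporting several of these together matters on imbalanced data, where accuracy alone can look high even when the minority class is poorly detected.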

  20. pima-indians-diabetes

    • kaggle.com
    Updated Nov 28, 2019
    Cite
    Nayan Kapri (2019). pima-indians-diabetes [Dataset]. https://www.kaggle.com/nrkapri/pimaindiansdiabetes/discussion
    Dataset updated
    Nov 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nayan Kapri
    Description

    Dataset

    This dataset was created by Nayan Kapri

    Contents
