https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Pima Indians Diabetes Dataset is a tabular medical dataset for predicting diabetes (0: non-diabetic, 1: diabetic) from health examination data of Pima Indian women in the United States.
2) Data Utilization
(1) Characteristics of the Pima Indians Diabetes Dataset:
• Each row contains eight health indicators (number of pregnancies, blood glucose, diastolic blood pressure, triceps skinfold thickness, two-hour serum insulin, BMI, a family-history-based diabetes risk score, and age) plus a binary outcome (diabetes present or absent).
• The data contains no personally identifying information and is widely used for medical diagnosis support and for practicing binary classification algorithms.
(2) Applications of the Pima Indians Diabetes Dataset:
• Developing diabetes prediction models: the health indicators can be used to build machine learning models for diabetes prediction, such as logistic regression, decision trees, and neural networks.
• Medical data interpretation and variable importance analysis: interpretation techniques such as SHAP can be applied to analyze how much each health variable contributes to the prediction and what its clinical significance is.
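The row layout described above (eight indicators plus a binary outcome) could be read into typed records roughly as follows. This is a minimal sketch: the column names are assumed to follow the commonly distributed CSV header, the `Record` class and `load_records` helper are hypothetical, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Record:
    pregnancies: int
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    diabetes_pedigree: float
    age: int
    outcome: int  # 0: non-diabetic, 1: diabetic


# Made-up rows for illustration only; header names are an assumption.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,100,28.5,0.45,33,0
5,165,80,35,150,34.2,0.80,47,1
"""


def load_records(text):
    """Parse CSV text into a list of Record objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Record(
            pregnancies=int(row["Pregnancies"]),
            glucose=float(row["Glucose"]),
            blood_pressure=float(row["BloodPressure"]),
            skin_thickness=float(row["SkinThickness"]),
            insulin=float(row["Insulin"]),
            bmi=float(row["BMI"]),
            diabetes_pedigree=float(row["DiabetesPedigreeFunction"]),
            age=int(row["Age"]),
            outcome=int(row["Outcome"]),
        )
        for row in reader
    ]


records = load_records(SAMPLE_CSV)
```

Typed records make it easy to separate the eight predictors from `outcome` before handing the data to a classifier.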
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Pima Indians Diabetes Database’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/uciml/pima-indians-diabetes-database on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.
Can you build a machine learning model to accurately predict whether or not the patients in the dataset have diabetes?
--- Original source retains full ownership of the source dataset ---
Sources:
(a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases
(b) Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu), Research Center, RMI Group Leader, Applied Physics Laboratory, The Johns Hopkins University, Johns Hopkins Road, Laurel, MD 20707, (301) 953-6231
(c) Date received: 9 May 1990
Past Usage:
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.
The diagnostic, binary-valued variable investigated is whether the patient shows signs of diabetes according to World Health Organization criteria (i.e., if the 2 hour post-load plasma glucose was at least 200 mg/dl at any survey examination or if found during routine medical care). The population lives near Phoenix, Arizona, USA.
Results: Their ADAP algorithm makes a real-valued prediction between 0 and 1. This was transformed into a binary decision using a cutoff of 0.448. Using 576 training instances, the sensitivity and specificity of their algorithm were both 76% on the remaining 192 instances.
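The thresholding and evaluation step described in these results can be sketched as follows. The scores and labels are illustrative values, not data from the paper, and treating a score exactly equal to the cutoff as positive is an assumption, since the description does not say how ties were handled.

```python
def binarize(scores, cutoff=0.448):
    """Turn real-valued predictions in [0, 1] into 0/1 decisions.

    Assumption: a score equal to the cutoff counts as positive.
    """
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative values only (not taken from the paper).
scores = [0.91, 0.30, 0.45, 0.60, 0.10, 0.448, 0.20, 0.70]
y_true = [1, 0, 1, 1, 0, 0, 0, 1]

y_pred = binarize(scores)
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of diabetic cases that are flagged, and specificity is the fraction of non-diabetic cases that are correctly left unflagged. Moving the cutoff trades one against the other.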
Relevant Information: Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. ADAP is an adaptive learning routine that generates and executes digital analogs of perceptron-like devices. It is a unique algorithm; see the paper for details.
The dataset used in the paper is a medical dataset for diabetes detection.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Pima
The Pima dataset is a well-known data repository in the field of healthcare and machine learning. The dataset contains demographic, clinical and diagnostic characteristics of Pima Indian women and is primarily used to predict the onset of diabetes based on these attributes. Each data point includes information such as age, number of pregnancies, body mass index, blood pressure, and glucose concentration. Researchers and data scientists use the Pima dataset to… See the full description on the dataset page: https://huggingface.co/datasets/Genius-Society/Pima.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Pima Indians Diabetes Dataset Split
This directory contains a dataset split for Pima Indians Diabetes Database.
Mock Data
The mock data is a smaller dataset (10 rows) that is used to test the model components.
Private Data
The private data is the remaining data that is used to train the model.
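A split of this kind could be produced along the following lines. This is a sketch under stated assumptions: the directory does not say how the 10 mock rows were chosen, so the seeded random selection, the `split_mock_private` helper, and the placeholder rows are all hypothetical. The total of 768 rows follows from the 576 training and 192 test instances reported for the original database.

```python
import random


def split_mock_private(rows, mock_size=10, seed=0):
    """Pick `mock_size` rows at random as mock data; the rest is private data."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    mock_idx = set(indices[:mock_size])
    mock = [row for i, row in enumerate(rows) if i in mock_idx]
    private = [row for i, row in enumerate(rows) if i not in mock_idx]
    return mock, private


# Placeholder rows standing in for the 768 records of the database.
rows = [{"row_id": i} for i in range(768)]
mock, private = split_mock_private(rows)
```

Fixing the seed keeps the mock set reproducible, so component tests always see the same 10 rows.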
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This dataset was created by Darshil06Shah
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameter values of the best fits of the obesity-related diabetes model to the glucose data of Pima Indian #1-#11.
This dataset was created by Angel Torres del Alamo
This dataset was created by SandeepN
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Diabetes is a chronic disease characterized by abnormally high blood sugar levels. It may affect various organs and tissues, and can even lead to life-threatening complications. Accurate prediction of diabetes can significantly reduce its incidence. However, current prediction methods struggle to accurately capture the essential characteristics of nonlinear data, and the black-box nature of these methods hampers their clinical application. To address these challenges, we propose KCCAM_DNN, a diabetes prediction method that integrates Kendall’s correlation coefficient and an attention mechanism within a deep neural network. In KCCAM_DNN, Kendall’s correlation coefficient is first employed for feature selection, which effectively filters out key features influencing diabetes prediction. For missing values in the data, polynomial regression is used for imputation, ensuring data completeness. Subsequently, we construct a deep neural network (KCCAM_DNN) based on the self-attention mechanism, which assigns greater weight to crucial features affecting diabetes and enhances the model’s predictive performance. Finally, we employ the SHAP model to analyze the impact of each feature on diabetes prediction, augmenting the model’s interpretability. Experimental results show that KCCAM_DNN exhibits superior performance on both the PIMA Indian and LMCH diabetes datasets, achieving test accuracies of 99.090% and 99.333%, respectively, approximately 2% higher than the best existing method. These results suggest that KCCAM_DNN is proficient in diabetes prediction, providing a foundation for informed decision-making in the diagnosis and prevention of diabetes.
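The feature-selection step mentioned in this abstract, ranking features by Kendall's correlation with the outcome, might look roughly like the sketch below. It is not the authors' code: the tau-b variant, the 0.3 threshold, the `select_features` helper, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (handles ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def select_features(features, outcome, threshold=0.3):
    """Keep features whose |tau-b| with the outcome reaches the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(kendall_tau_b(values, outcome)) >= threshold
    ]


# Toy data for illustration only.
features = {
    "glucose": [85, 90, 110, 130, 150, 170],
    "noise": [3, 1, 2, 3, 1, 2],
}
outcome = [0, 0, 0, 1, 1, 1]

selected = select_features(features, outcome)
```

Because Kendall's coefficient is rank-based, it captures monotonic but nonlinear relationships that a linear correlation measure can understate.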
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Diabetes prediction is an ongoing research topic in which medical specialists are attempting to forecast the condition with greater precision. Diabetes often remains dormant, and if patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart condition, it can cause metabolic problems and various complications in the body. Various ensemble learning techniques, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with the K-fold cross-validation approach, was used to balance the classes and validate the findings. The Pima Indian Diabetes (PID) dataset from the UCI Machine Learning (UCI ML) repository was chosen for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model’s first phase, while SMO is used for classification in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model created using the combined SMOTE and SMO strategy achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier’s performance in detecting the illness early.
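The SMOTE preprocessing phase named in this abstract rests on a simple idea: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. The plain-Python sketch below illustrates that interpolation only. It is not the study's implementation, and the `smote` helper, the choice of `k=3`, and the toy points are assumptions.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority-class points for illustration only.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)
```

Oversampling of this kind should be applied only to the training folds during cross-validation. Generating synthetic samples before splitting lets information from the test fold leak into training and can inflate reported accuracy.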
https://creativecommons.org/publicdomain/zero/1.0/
Pima Indian Diabetes Data
Jerry Kurata
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PIMA Indians diabetes dataset classification result.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Diabetics prediction using logistic regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandij/diabetes-dataset on 13 February 2022.
--- Dataset description provided by original source is as follows ---
The data was collected and made available by “National Institute of Diabetes and Digestive and Kidney Diseases” as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here belong to the Pima Indian heritage (subgroup of Native Americans), and are females of ages 21 and above.
We’ll be using Python and some of its popular data science packages. First, we will import pandas to read our data from a CSV file and manipulate it for further use. We will also use numpy to convert our data into a format suitable for feeding into our classification model. We’ll use seaborn and matplotlib for visualizations. We will then import the Logistic Regression algorithm from sklearn, which will help us build our classification model. Lastly, we will use joblib, available in sklearn, to save our model for future use.
--- Original source retains full ownership of the source dataset ---
This dataset was created by Jauhar Maknun
Released under Other (specified in description)
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Predict Diabetes dataset is based on data from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). It includes data exclusively from Pima Indian women aged 21 and older, and consists of 8 input variables (health-related measurements) and a target variable (Outcome) indicating the presence of diabetes.
2) Data Utilization (1) Characteristics of the Predict Diabetes Dataset: • The dataset contains key medical indicators closely related to diabetes diagnosis, such as glucose level, blood pressure, BMI, and insulin level, and is formatted in a clean and structured manner suitable for predictive modeling. • The target variable (Outcome) is binary, where 1 indicates the presence of diabetes and 0 indicates its absence.
(2) Applications of the Predict Diabetes Dataset: • Disease Prediction Model Development: The dataset can be used to build classification models that predict the presence of diabetes based on various health measurements. • Medical Data Analysis Practice: Suitable for educational use in medical AI and healthcare-related tasks focused on basic diagnostic prediction.
This dataset was created by Nayan Kapri