https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Pima Indians Diabetes Dataset is a tabular medical dataset for predicting diabetes (0: non-diabetic, 1: diabetic) from health examination data of Pima Indian women in the United States.
2) Data Utilization
(1) Characteristics of the Pima Indians Diabetes Dataset:
• Each row contains eight health indicators (number of pregnancies, blood glucose, diastolic blood pressure, triceps skinfold thickness, two-hour serum insulin, BMI, a family-history-based diabetes risk score, and age) plus a binary outcome (diabetes or no diabetes).
• The data contains no personally identifying information and is widely used for medical diagnosis support and for practicing binary classification algorithms.
(2) Uses of the Pima Indians Diabetes Dataset:
• Developing diabetes prediction models: the health indicators can be used to build machine learning models for diabetes prediction, such as logistic regression, decision trees, and neural networks.
• Medical data interpretation and variable importance analysis: interpretation techniques such as SHAP can be applied to analyze each health variable's contribution to the prediction and its clinical significance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Pima Indians Diabetes Database’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/uciml/pima-indians-diabetes-database on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.
Can you build a machine learning model to accurately predict whether or not the patients in the dataset have diabetes?
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Pima Indians Diabetes Dataset Split
This directory contains a dataset split for Pima Indians Diabetes Database.
Mock Data
The mock data is a smaller dataset (10 rows) that is used to test the model components.
Private Data
The private data is the remaining data that is used to train the model.
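The split described above can be sketched as follows. This is a minimal illustration, not the directory's actual procedure: the source does not say how the 10 mock rows are chosen, so random selection with a fixed seed is an assumption, and the function name `split_mock_private` and the placeholder rows are hypothetical.

```python
import random


def split_mock_private(rows, mock_size=10, seed=0):
    """Split rows into a small mock set and the remaining private set.

    Assumption: mock rows are picked at random with a fixed seed; the
    source only states the mock size (10 rows), not the selection rule.
    """
    if mock_size > len(rows):
        raise ValueError("mock_size cannot exceed the number of rows")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    mock_indices = set(indices[:mock_size])
    # Preserve the original row order inside each part.
    mock = [row for i, row in enumerate(rows) if i in mock_indices]
    private = [row for i, row in enumerate(rows) if i not in mock_indices]
    return mock, private


# Placeholder rows standing in for dataset records (not real data).
example_rows = [{"row_id": i} for i in range(50)]
mock_rows, private_rows = split_mock_private(example_rows)
```

The mock part is small enough to exercise model components quickly, while every remaining row stays available for training.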
This dataset was created by Darshil06Shah
This dataset was created by Angel Torres del Alamo
It contains the following files:
This dataset was created by SandeepN
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Pima
The Pima dataset is a well-known data repository in the field of healthcare and machine learning. The dataset contains demographic, clinical and diagnostic characteristics of Pima Indian women and is primarily used to predict the onset of diabetes based on these attributes. Each data point includes information such as age, number of pregnancies, body mass index, blood pressure, and glucose concentration. Researchers and data scientists use the Pima dataset to… See the full description on the dataset page: https://huggingface.co/datasets/Genius-Society/Pima.
https://creativecommons.org/publicdomain/zero/1.0/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PIMA Indians diabetes dataset classification result.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Diabetics prediction using logistic regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandij/diabetes-dataset on 13 February 2022.
--- Dataset description provided by original source is as follows ---
The data was collected and made available by the “National Institute of Diabetes and Digestive and Kidney Diseases” as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are of Pima Indian heritage (a subgroup of Native Americans) and are females aged 21 and above.
We’ll be using Python and some of its popular data science packages. First, we will import pandas to read our data from a CSV file and manipulate it for further use. We will also use numpy to convert our data into a format suitable for our classification model. We’ll use seaborn and matplotlib for visualizations. We will then import the Logistic Regression algorithm from sklearn, which will help us build our classification model. Lastly, we will use joblib, available in sklearn, to save our model for future use.
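The workflow just described might look roughly like the sketch below. It is an illustration under stated assumptions, not the original tutorial's code: the data is synthetic (generated at random, with an Outcome rule invented for the example), the feature scaling step and the helper name `make_synthetic_frame` are additions, and the plotting step is omitted. Recent scikit-learn releases no longer bundle joblib under `sklearn.externals`, so joblib is imported directly here.

```python
import io
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_frame(n_rows=200, seed=0):
    """Build a synthetic stand-in with the dataset's column names (not real data)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame({
        "Pregnancies": rng.integers(0, 10, n_rows),
        "Glucose": rng.normal(120, 30, n_rows),
        "BloodPressure": rng.normal(70, 12, n_rows),
        "SkinThickness": rng.normal(25, 8, n_rows),
        "Insulin": rng.normal(100, 40, n_rows),
        "BMI": rng.normal(32, 6, n_rows),
        "DiabetesPedigreeFunction": rng.uniform(0.1, 1.5, n_rows),
        "Age": rng.integers(21, 70, n_rows),
    })
    # Invented rule so the example has a learnable signal.
    noisy_glucose = frame["Glucose"] + rng.normal(0, 15, n_rows)
    frame["Outcome"] = (noisy_glucose > 130).astype(int)
    return frame


# Round-trip through CSV text to mirror "read our data from a CSV file".
csv_text = make_synthetic_frame().to_csv(index=False)
frame = pd.read_csv(io.StringIO(csv_text))

# Convert to numpy arrays for the classifier.
X = frame.drop(columns="Outcome").to_numpy()
y = frame["Outcome"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling is an added assumption; it helps logistic regression converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Save the fitted model and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "diabetes_model.joblib")
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)
```

With the real CSV, the `pd.read_csv` call would point at the downloaded file instead of the in-memory text.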
--- Original source retains full ownership of the source dataset ---
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.
Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg / (height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1)
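One way to carry these nine columns in code is a typed record, sketched below together with the BMI formula given above. The class name `PimaRecord`, the helper names, and the two sample rows are hypothetical; the sample values are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PimaRecord:
    pregnancies: int
    glucose: float
    blood_pressure: float          # diastolic, mm Hg
    skin_thickness: float          # triceps skin fold, mm
    insulin: float                 # 2-hour serum insulin, mu U/ml
    bmi: float                     # kg / m^2
    diabetes_pedigree_function: float
    age: int                       # years
    outcome: int                   # class variable, 0 or 1


def body_mass_index(weight_kg, height_m):
    """BMI as defined in the column list: weight in kg / (height in m)^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def parse_records(csv_text):
    """Parse CSV text with the dataset's header names into PimaRecord objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        PimaRecord(
            pregnancies=int(row["Pregnancies"]),
            glucose=float(row["Glucose"]),
            blood_pressure=float(row["BloodPressure"]),
            skin_thickness=float(row["SkinThickness"]),
            insulin=float(row["Insulin"]),
            bmi=float(row["BMI"]),
            diabetes_pedigree_function=float(row["DiabetesPedigreeFunction"]),
            age=int(row["Age"]),
            outcome=int(row["Outcome"]),
        )
        for row in reader
    ]


# Made-up sample rows for illustration only.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,110,70,25,90,28.5,0.35,30,0
5,160,80,32,180,35.1,0.80,45,1
"""
records = parse_records(SAMPLE_CSV)
```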
This dataset was created by Abhishek Kumar
Released under Other (specified in description)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Diabetes prediction dataset classification result.
This dataset was created by Vivek Prasad Kushwaha
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pima Indians Diabetes (PID).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparing the average time performance, in seconds, of the GLocal-LS-SVM model to the global LS-SVM model, Glocal-SVM, and standard SVM applied to the Pima Indians diabetes dataset.
https://creativecommons.org/publicdomain/zero/1.0/
The unprocessed dataset was acquired from the UCI Machine Learning organisation. It is originally from the National Institute of Diabetes and Digestive and Kidney Diseases, and I preprocessed it. The objective of the dataset is to accurately predict whether or not a patient has diabetes, based on multiple features included in the dataset. I achieved an accuracy of 92.86 % with a Random Forest Classifier using this dataset, and I developed a web service, Diabetes Prediction System, using that trained model. You can explore the Exploratory Data Analysis notebook to better understand the data.
J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler and R. S. Johannes, "Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus" in Proc. of the Symposium on Computer Applications and Medical Care, pp. 261-265. IEEE Computer Society Press. 1988.
Multiple models were trained on the original dataset, but only the Random Forest Classifier reached an accuracy of 78.57 %. With this new preprocessed dataset, an accuracy of 92.86 % was achieved. Can you build a machine learning model that accurately predicts whether a patient has diabetes? And can you achieve an accuracy even higher than 92.86 % without overfitting the model?
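A common way to check for overfitting is to compare accuracy on the training rows with accuracy on held-out rows and with cross-validated accuracy. The sketch below shows that comparison with a random forest. It is not the author's pipeline: the data is synthetic with an invented label rule, and the split size, forest size, and fold count are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: 300 rows, 8 features, invented label rule (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 1] + 0.5 * rng.normal(size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on rows the model has seen versus rows it has not.
train_accuracy = forest.score(X_train, y_train)
test_accuracy = forest.score(X_test, y_test)

# 5-fold cross-validated accuracy on the training portion.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X_train, y_train, cv=5, scoring="accuracy",
)

print(f"train accuracy: {train_accuracy:.4f}")
print(f"test accuracy:  {test_accuracy:.4f}")
print(f"cv accuracy:    {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")
```

A large gap between training accuracy and the held-out or cross-validated figures suggests the model is memorizing the training rows rather than generalizing.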
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparing the average error performance of the GLocal-LS-SVM and LS-SVM applied to the Pima Indians Diabetes dataset.