2 datasets found
  1. Pima Indians Diabetes Dataset

    • kaggle.com
    Updated May 13, 2020
    Cite
    Muhammad Jamal Tariq (2020). Pima Indians Diabetes Dataset [Dataset]. https://www.kaggle.com/jamaltariqcheema/pima-indians-diabetes-dataset/code
    Dataset updated
    May 13, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Jamal Tariq
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The unprocessed dataset was acquired from the UCI Machine Learning organisation and originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases; this version was preprocessed by me. The objective is to predict accurately whether or not a patient has diabetes, based on the features included in the dataset. I achieved an accuracy of 92.86 % with a Random Forest classifier on this dataset, and I developed a Diabetes Prediction System web service using that trained model. You can explore the Exploratory Data Analysis notebook to better understand the data.

    Attributes Normal Value Range

    • Glucose: Glucose (< 140) = Normal, Glucose (140-200) = Pre-Diabetic, Glucose (> 200) = Diabetic
    • BloodPressure: B.P (< 60) = Below Normal, B.P (60-80) = Normal, B.P (80-90) = Stage 1 Hypertension, B.P (90-120) = Stage 2 Hypertension, B.P (> 120) = Hypertensive Crisis
    • SkinThickness: SkinThickness (< 10) = Below Normal, SkinThickness (10-30) = Normal, SkinThickness (> 30) = Above Normal
    • Insulin: Insulin (< 200) = Normal, Insulin (> 200) = Above Normal
    • BMI: BMI (< 18.5) = Underweight, BMI (18.5-25) = Normal, BMI (25-30) = Overweight, BMI (> 30) = Obese
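The ranges above can be read as a lookup from a measured value to a label. The following is a minimal sketch of that lookup, not the author's preprocessing code. The names `BANDS` and `categorize` are hypothetical. The list does not say which band a value exactly on a boundary belongs to, so the sketch assumes that a boundary value falls into the higher band, except at the topmost boundary, which the list marks with a strict ">".

```python
from bisect import bisect_right

# Band edges and labels transcribed from the list above.
# BANDS and categorize are illustrative names, not part of the dataset.
BANDS = {
    "Glucose": ([140, 200], ["Normal", "Pre-Diabetic", "Diabetic"]),
    "BloodPressure": (
        [60, 80, 90, 120],
        ["Below Normal", "Normal", "Stage 1 Hypertension",
         "Stage 2 Hypertension", "Hypertensive Crisis"],
    ),
    "SkinThickness": ([10, 30], ["Below Normal", "Normal", "Above Normal"]),
    "Insulin": ([200], ["Normal", "Above Normal"]),
    "BMI": ([18.5, 25, 30], ["Underweight", "Normal", "Overweight", "Obese"]),
}


def categorize(attribute, value):
    """Return the label of the band that `value` falls into."""
    edges, labels = BANDS[attribute]
    # Assumption: the top band is strict (">"), so a value equal to the
    # last edge stays in the band just below it.
    if value == edges[-1]:
        return labels[-2]
    # Assumption: a value equal to any other edge goes to the higher band.
    return labels[bisect_right(edges, value)]


print(categorize("Glucose", 148))       # Pre-Diabetic
print(categorize("BloodPressure", 72))  # Normal
print(categorize("BMI", 33.6))          # Obese
```

Keeping the edges in one table makes the boundary assumption easy to change in a single place if the intended convention turns out to be different.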

    Acknowledgements

    J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler and R. S. Johannes, "Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus" in Proc. of the Symposium on Computer Applications and Medical Care, pp. 261-265. IEEE Computer Society Press. 1988.

    Inspiration

    Multiple models were trained on the original dataset, but only the Random Forest classifier was able to reach an accuracy of 78.57 %. With this new preprocessed dataset, an accuracy of 92.86 % was achieved. Can you build a machine learning model that accurately predicts whether a patient has diabetes? And can you achieve an accuracy even higher than 92.86 % without overfitting the model?
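For reference, accuracy is the fraction of patients whose predicted label matches the true label. The sketch below shows that arithmetic in plain Python. The labels are invented for illustration and are not taken from the dataset or from the author's test split; `accuracy` and `accuracy_gap` are hypothetical helper names. Comparing training accuracy with held-out accuracy is one common, rough check for overfitting, and it is an assumption here that this is the kind of check the challenge has in mind.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def accuracy_gap(train_acc, test_acc):
    """Training accuracy minus held-out accuracy; a large positive gap
    is a rough sign of overfitting."""
    return train_acc - test_acc


# Invented labels (1 = diabetic, 0 = not diabetic), for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0]

test_acc = accuracy(y_true, y_pred)  # 13 of 14 predictions are correct
print(f"{test_acc:.2%}")             # 92.86%
```

A model that scores far higher on its training rows than on rows it has never seen would show a large `accuracy_gap`, which is why a held-out split matters when comparing against the figures quoted above.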

  2. life expectancy dataset

    • kaggle.com
    Updated May 25, 2022
    Cite
    Kiran Shahi (2022). life expectancy dataset [Dataset]. http://doi.org/10.34740/kaggle/ds/1980580
    Dataset updated
    May 25, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kiran Shahi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    These datasets were collected to fulfil the requirement of University coursework.

    The complete source code and paper are available on GitHub.

    About Dataset

    These datasets contain World Development Indicators (WDI) data provided by the World Bank, together with the non-communicable disease mortality rate, the suicide rate and health workforce data from the World Health Organization (WHO).

    • World Development Indicators: This dataset contains the data of 1444 development indicators for 2666 countries and country groups between the years 1960 and 2020. It was downloaded from the World Bank's data hub.
    • Health workforce: This dataset contains health workforce information such as medical doctors (per 10000 population), number of medical doctors, number of generalist medical practitioners, etc.
    • Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%): This dataset contains information on mortality caused by various non-communicable diseases such as cardiovascular disease (CVD), cancer, diabetes, etc. Two files are used for this dataset, one for males and one for females. It was downloaded from the World Bank's DataBank.
    • Suicide mortality rate (per 100,000 population): This dataset contains information on the suicide mortality rate per 100,000 population. Two files are used for this dataset, one for males and one for females. It was downloaded from the World Bank's DataBank.

    Implementation


Pima Indians Diabetes Dataset

Predict the chances of diabetes based on different features
