6 datasets found
  1. A comparative analysis of earlier studies.

    • plos.figshare.com
    xls
    Updated Jan 18, 2024
    Cite
    Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh (2024). A comparative analysis of earlier studies. [Dataset]. http://doi.org/10.1371/journal.pone.0292100.t001
    Explore at:
    5 scholarly articles cite this dataset (View in Google Scholar)
    Available download formats: xls
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Diabetes prediction is an ongoing research topic in which medical specialists are attempting to forecast the condition with greater precision. Diabetes typically remains latent, and if patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart condition, it can cause metabolic problems and various complications in the body. Several ensemble learning techniques, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with the K-fold cross-validation approach, was used to balance the classes and validate the findings. The Pima Indian Diabetes (PID) dataset was obtained from the UCI Machine Learning (UCI ML) repository for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess the data in the model's first phase, while the SMO classifier is used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model was created using a combined SMOTE and SMO strategy, which achieved 99.07% correct classification with a runtime of 0.1 ms. The proposed system aims to enhance the classifier's performance in detecting the illness early.
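
    The two-phase pipeline described above can be approximated with off-the-shelf tools. The sketch below is a minimal illustration, not the authors' implementation: it pairs SMOTE with scikit-learn's SVC (whose libsvm backend uses an SMO-type solver) inside an imbalanced-learn pipeline, adds the bagged decision tree comparison, and scores both with stratified K-fold cross-validation. The file name "pima_diabetes.csv" and the label column "Outcome" are assumptions about how the PID data is stored locally.

    # Minimal sketch (assumed setup): SMOTE + SMO-style SVM vs. bagged decision trees
    import pandas as pd
    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.svm import SVC                      # libsvm's solver is SMO-based
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    df = pd.read_csv("pima_diabetes.csv")            # hypothetical local copy of PID
    X, y = df.drop(columns="Outcome"), df["Outcome"] # "Outcome" is an assumed label name

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    # Phase 1: SMOTE rebalances only the training folds (imblearn applies it
    # inside cross-validation); Phase 2: the SVM classifier.
    smote_svm = Pipeline([("smote", SMOTE(random_state=42)),
                          ("svm", SVC(kernel="rbf"))])

    # Comparison model from the abstract: bagging over decision trees.
    bagged_trees = Pipeline([("smote", SMOTE(random_state=42)),
                             ("bag", BaggingClassifier(DecisionTreeClassifier(),
                                                       n_estimators=100,
                                                       random_state=42))])

    for name, model in [("SMOTE + SMO-style SVM", smote_svm),
                        ("SMOTE + bagged trees", bagged_trees)]:
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        print(f"{name}: mean CV accuracy {scores.mean():.3f}")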

  2. Comparison of bagging ML methods for oversampled dataset.

    • plos.figshare.com
    xls
    Updated Jun 26, 2025
    Cite
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Comparison of bagging ML methods for oversampled dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 26, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of bagging ML methods for oversampled dataset.

  3. Comparison of bagging ML methods for undersampled dataset.

    • plos.figshare.com
    xls
    Updated Jun 26, 2025
    Cite
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Comparison of bagging ML methods for undersampled dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 26, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of bagging ML methods for undersampled dataset.

  4. Highest accuracy of ML methods compared to previous works.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 26, 2025
    Cite
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Highest accuracy of ML methods compared to previous works. [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 26, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Highest accuracy of ML methods compared to previous works.

  5. Dataset variables described (in Raw Form).

    • plos.figshare.com
    xls
    Updated Jun 26, 2025
    Cite
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Dataset variables described (in Raw Form). [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 26, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. The study utilizes a dataset comprising demographic information of 615 individuals suspected of having HCV infection. Additionally, the research employs oversampling and undersampling techniques to address class imbalances in the dataset and conducts feature reduction using the F-test in one-way analysis of variance. Ensemble ML methods, including Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensemble methods is evaluated using metrics such as accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV), area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model demonstrated superior performance under oversampling conditions, achieving 98.37% accuracy, 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, 97.79% F1 score, 98.06% balanced accuracy, 98.05% G-mean, a 1.63% error rate, 0.98 AUC, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches in improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
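
    As a rough illustration of the pipeline the abstract outlines (F-test feature reduction, resampling, and a bagged base learner such as k-NN), the sketch below uses imbalanced-learn and scikit-learn. It is an assumption-laden stand-in, not the authors' code: the file name "hcv_data.csv", the label column "Category", the choice of RandomOverSampler (the study may use a different resampler), and the k = 8 selected features are all placeholders.

    # Minimal sketch (assumed setup): F-test selection + oversampling + bagged k-NN
    import pandas as pd
    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import RandomOverSampler
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_validate

    df = pd.read_csv("hcv_data.csv").dropna()        # hypothetical local copy; drop incomplete rows
    X = pd.get_dummies(df.drop(columns="Category"))  # encode categorical/demographic columns
    y = df["Category"]                               # assumed label column

    model = Pipeline([
        ("select", SelectKBest(f_classif, k=8)),            # F-test from one-way ANOVA
        ("oversample", RandomOverSampler(random_state=0)),   # address class imbalance
        ("bag_knn", BaggingClassifier(KNeighborsClassifier(n_neighbors=5),
                                      n_estimators=50, random_state=0)),
    ])

    scores = cross_validate(model, X, y, cv=10,
                            scoring=["accuracy", "balanced_accuracy", "f1_macro"])
    print({k: round(v.mean(), 3) for k, v in scores.items() if k.startswith("test_")})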

  6. Data_Sheet_1_Predicting Obesity in Adults Using Machine Learning Techniques: An Analysis of Indonesian Basic Health Research 2018.pdf

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Sri Astuti Thamrin; Dian Sidik Arsyad; Hedi Kuswanto; Armin Lawi; Sudirman Nasir (2023). Data_Sheet_1_Predicting Obesity in Adults Using Machine Learning Techniques: An Analysis of Indonesian Basic Health Research 2018.pdf [Dataset]. http://doi.org/10.3389/fnut.2021.669155.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Sri Astuti Thamrin; Dian Sidik Arsyad; Hedi Kuswanto; Armin Lawi; Sudirman Nasir
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Obesity is strongly associated with multiple risk factors and contributes significantly to an increased risk of chronic disease morbidity and mortality worldwide. There are various challenges to better understanding the association between risk factors and the occurrence of obesity. The traditional regression approach limits analysis to a small number of predictors and imposes assumptions of independence and linearity. Machine Learning (ML) methods are an alternative that offers a different approach to the data analysis stage for obesity. This study aims to assess the ability of ML methods, namely Logistic Regression, Classification and Regression Trees (CART), and Naïve Bayes, to identify the presence of obesity using publicly available health data, going beyond traditional prediction models, and to compare the performance of the three methods. The main objective is to establish a set of risk factors for obesity in adults among the available study variables. Furthermore, we address data imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to predict obesity status based on the risk factors available in the dataset. This study indicates that the Logistic Regression method shows the highest performance; nevertheless, kappa coefficients show only moderate concordance between predicted and measured obesity. Location, marital status, age group, education, sweet drinks, fatty/oily foods, grilled foods, preserved foods, seasoning powders, soft/carbonated drinks, alcoholic drinks, mental-emotional disorders, diagnosed hypertension, physical activity, smoking, and fruit and vegetable consumption are significant in predicting obesity status in adults. Identifying these risk factors could inform health authorities in designing or modifying existing policies for better control of chronic diseases, especially in relation to risk factors associated with obesity. Moreover, applying ML methods to publicly available health data, such as the Indonesian Basic Health Research (RISKESDAS), is a promising strategy toward a more robust understanding of how multiple risk factors jointly predict health outcomes.
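
    A minimal sketch of the comparison described above, assuming the survey extract is available locally: it trains Logistic Regression, CART (via scikit-learn's DecisionTreeClassifier), and Gaussian Naïve Bayes inside a SMOTE pipeline and reports accuracy alongside Cohen's kappa, the concordance measure the abstract mentions. The file name "riskesdas_2018.csv" and the binary label column "obese" are hypothetical placeholders, not part of the published dataset.

    # Minimal sketch (assumed setup): LR vs. CART vs. Naive Bayes with SMOTE
    import pandas as pd
    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier        # CART-style trees
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_validate
    from sklearn.metrics import make_scorer, cohen_kappa_score

    df = pd.read_csv("riskesdas_2018.csv")          # hypothetical file name
    X = pd.get_dummies(df.drop(columns="obese"))    # encode categorical risk factors
    y = df["obese"]                                 # assumed binary label

    scoring = {"accuracy": "accuracy", "kappa": make_scorer(cohen_kappa_score)}
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "CART": DecisionTreeClassifier(random_state=0),
        "Naive Bayes": GaussianNB(),
    }

    for name, clf in models.items():
        pipe = Pipeline([("smote", SMOTE(random_state=0)), ("clf", clf)])
        scores = cross_validate(pipe, X, y, cv=10, scoring=scoring)
        print(f"{name}: accuracy={scores['test_accuracy'].mean():.3f}, "
              f"kappa={scores['test_kappa'].mean():.3f}")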
