6 datasets found

A comparative analysis of earlier studies.
plos.figshare.com
xls
Updated Jan 18, 2024
Cite
Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh (2024). A comparative analysis of earlier studies. [Dataset]. http://doi.org/10.1371/journal.pone.0292100.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0292100.t001
Dataset updated
Jan 18, 2024
Dataset provided by
PLOS ONE
Authors
Praveen Talari; Bharathiraja N; Gaganpreet Kaur; Hani Alshahrani; Mana Saleh Al Reshan; Adel Sulaiman; Asadullah Shaikh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Diabetes prediction is an ongoing research topic in which medical specialists are attempting to forecast the condition with greater precision. Diabetes typically stays dormant, and if patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart condition, it can cause metabolic problems and various complications in the body. Various ensemble learning techniques, including voting, boosting, and bagging, have been applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with the K-fold cross-validation approach, was used to achieve class balancing and validate the findings. The Pima Indian Diabetes (PID) dataset, collected from the UCI Machine Learning (UCI ML) repository, was selected for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model has been developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model’s first phase, while SMO classification is used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model was created using a combined SMOTE and SMO strategy, which achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier’s performance in detecting the illness early.
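The oversampling and cross-validation steps named in this description can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `smote` and `k_fold_indices`, the toy points, `k=3`, and the random seed are all assumptions chosen for illustration. The sketch shows the core SMOTE idea (a synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours) and a plain, unshuffled K-fold split.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for plain, unshuffled K-fold CV."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical toy data: four minority-class points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
folds = list(k_fold_indices(10, 5))
```

When the two steps are combined, oversampling is usually applied to the training folds only, so that synthetic points never leak into the held-out fold.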
Comparison of bagging ML methods for oversampled dataset.
plos.figshare.com
xls
Updated Jun 26, 2025
Cite
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Comparison of bagging ML methods for oversampled dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0326488.t002
Dataset updated
Jun 26, 2025
Dataset provided by
PLOS ONE
Authors
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of bagging ML methods for oversampled dataset.
Comparison of bagging ML methods for undersampled dataset.
plos.figshare.com
xls
Updated Jun 26, 2025
Cite
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Comparison of bagging ML methods for undersampled dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0326488.t003
Dataset updated
Jun 26, 2025
Dataset provided by
PLOS ONE
Authors
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of bagging ML methods for undersampled dataset.
Highest accuracy of ML methods compared to previous works.
plos.figshare.com
figshare.com
xls
Updated Jun 26, 2025
Cite
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Highest accuracy of ML methods compared to previous works. [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0326488.t004
Dataset updated
Jun 26, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Highest accuracy of ML methods compared to previous works.
Dataset variables described (in Raw Form).
plos.figshare.com
xls
Updated Jun 26, 2025
Cite
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli (2025). Dataset variables described (in Raw Form). [Dataset]. http://doi.org/10.1371/journal.pone.0326488.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0326488.t001
Dataset updated
Jun 26, 2025
Dataset provided by
PLOS ONE
Authors
Ekramul Haque Tusher; Mohd Arfian Ismail; Abdullah Akib; Lubna A. Gabralla; Ashraf Osman Ibrahim; Hafizan Mat Som; Muhammad Akmal Remli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. The study utilizes a dataset comprising demographic information of 615 individuals suspected of having HCV infection. Additionally, the research employs oversampling and undersampling techniques to address class imbalances in the dataset and conducts feature reduction using the F-test in one-way analysis of variance. Ensemble ML methods, including Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensemble methods is evaluated using metrics such as accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV), area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model demonstrated superior performance under oversampling conditions, achieving 98.37% accuracy, 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, 97.79% F1 score, 98.06% balanced accuracy, 98.05% G-mean, a 1.63% error rate, 0.98 AUC, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches in improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
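The evaluation metrics listed in this description all follow from a confusion matrix, which a short sketch can make concrete. The sketch assumes a binary setting and standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical, and reading the description's "selectivity" as specificity is also an assumption. The last lines check that the reported recall and selectivity are consistent, to within rounding, with the reported balanced accuracy (98.06%) and G-mean (98.05%) under those definitions.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity, true-positive rate
    specificity = tn / (tn + fp)     # true-negative rate
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }

# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)

# Consistency check on the figures quoted in the description,
# assuming "selectivity" means specificity.
reported_recall, reported_selectivity = 0.9793, 0.9818
balanced = (reported_recall + reported_selectivity) / 2
g_mean = math.sqrt(reported_recall * reported_selectivity)
```

The reported 1.63% error rate is likewise the complement of the reported 98.37% accuracy.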
Data_Sheet_1_Predicting Obesity in Adults Using Machine Learning Techniques:...
figshare.com
pdf
Updated Jun 1, 2023
Cite
Sri Astuti Thamrin; Dian Sidik Arsyad; Hedi Kuswanto; Armin Lawi; Sudirman Nasir (2023). Data_Sheet_1_Predicting Obesity in Adults Using Machine Learning Techniques: An Analysis of Indonesian Basic Health Research 2018.pdf [Dataset]. http://doi.org/10.3389/fnut.2021.669155.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fnut.2021.669155.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Sri Astuti Thamrin; Dian Sidik Arsyad; Hedi Kuswanto; Armin Lawi; Sudirman Nasir
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Obesity is strongly associated with multiple risk factors and contributes significantly to an increased risk of chronic disease morbidity and mortality worldwide. Better understanding the association between risk factors and the occurrence of obesity poses various challenges. The traditional regression approach limits analysis to a small number of predictors and imposes assumptions of independence and linearity. Machine Learning (ML) methods are an alternative that provides a distinct approach to the application stage of data analysis on obesity. This study aims to assess the ability of ML methods, namely Logistic Regression, Classification and Regression Trees (CART), and Naïve Bayes, to identify the presence of obesity using publicly available health data, using a novel approach with sophisticated ML methods to predict obesity in an attempt to go beyond traditional prediction models, and to compare the performance of the three methods. The main objective of this study is to establish a set of risk factors for obesity in adults among the available study variables. Furthermore, we address data imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to predict obesity status based on risk factors available in the dataset. This study indicates that the Logistic Regression method shows the highest performance. Nevertheless, kappa coefficients show only moderate concordance between predicted and measured obesity. Location, marital status, age group, education, sweet drinks, fatty/oily foods, grilled foods, preserved foods, seasoning powders, soft/carbonated drinks, alcoholic drinks, mental-emotional disorders, diagnosed hypertension, physical activity, smoking, and fruit and vegetable consumption are significant in predicting obesity status in adults.
Identifying these risk factors could inform health authorities in designing or modifying existing policies for better controlling chronic diseases, especially in relation to risk factors associated with obesity. Moreover, applying ML methods to publicly available health data, such as the Indonesian Basic Health Research (RISKESDAS), is a promising strategy to fill the gap for a more robust understanding of the associations of multiple risk factors in predicting health outcomes.
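The kappa coefficient mentioned in this description measures agreement between predicted and measured labels after discounting the agreement expected by chance. A minimal sketch of Cohen's kappa follows; the function name `cohen_kappa` and the ten toy labels are hypothetical and are not drawn from the RISKESDAS data.

```python
from collections import Counter

def cohen_kappa(actual, predicted):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    labels = set(actual) | set(predicted)
    # Chance agreement: product of the two marginal proportions, summed over labels.
    expected = sum(actual_counts[label] * predicted_counts[label]
                   for label in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels: 1 = obese, 0 = not obese.
measured = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(measured, predicted)
```

In this toy case the raw agreement is 80%, yet kappa is lower because half of that agreement would be expected by chance alone, which is why kappa can indicate only moderate concordance even when accuracy looks high.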
5 scholarly articles cite the dataset "A comparative analysis of earlier studies."