31 datasets found

Performance comparison of machine learning models across accuracy, AUC, MCC,...
plos.figshare.com
xls
Updated Dec 31, 2024
Cite
Seongil Han; Haemin Jung (2024). Performance comparison of machine learning models across accuracy, AUC, MCC, and F1 score on GMSC dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0316454.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316454.t005
Dataset updated
Dec 31, 2024
Dataset provided by
PLOS ONE
Authors
Seongil Han; Haemin Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance comparison of machine learning models across accuracy, AUC, MCC, and F1 score on GMSC dataset.
GMSC dataset (IR: Imbalance Ratio).
plos.figshare.com
xls
Updated Dec 31, 2024
Cite
Seongil Han; Haemin Jung (2024). GMSC dataset (IR: Imbalance Ratio). [Dataset]. http://doi.org/10.1371/journal.pone.0316454.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316454.t001
Dataset updated
Dec 31, 2024
Dataset provided by
PLOS ONE
Authors
Seongil Han; Haemin Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model’s decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. 
These findings highlight NATE’s capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
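The description above refers to an imbalance ratio (IR) and to oversampling as a class-balancing step. The following is a minimal sketch of both ideas, not the NATE method itself: it assumes IR is the majority-class count divided by the minority-class count (a common definition, not confirmed by this listing) and uses plain random oversampling with replacement. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: majority-class count / minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (sampling with replacement)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 9 non-default (0) and 3 default (1) borrowers.
rows = [[i] for i in range(12)]
labels = [0] * 9 + [1] * 3

ir_before = imbalance_ratio(labels)
new_rows, new_labels = random_oversample(rows, labels)
ir_after = imbalance_ratio(new_labels)
print(ir_before, ir_after, Counter(new_labels))
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keep the original class distribution.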
Under-sampled dataset.
plos.figshare.com
xls
Updated Dec 31, 2024
Cite
Seongil Han; Haemin Jung (2024). Under-sampled dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0316454.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316454.t003
Dataset updated
Dec 31, 2024
Dataset provided by
PLOS ONE
Authors
Seongil Han; Haemin Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Increase in AUC, MCC, and F1 between oversampling and undersampling.
plos.figshare.com
Updated Dec 31, 2024
Cite
Increase in AUC, MCC, and F1 between oversampling and undersampling. [Dataset]. https://plos.figshare.com/articles/dataset/Increase_in_AUC_MCC_and_F1_between_oversampling_and_undersampling_/28118713
Unique identifier
https://doi.org/10.1371/journal.pone.0316454.t009
Dataset updated
Dec 31, 2024
Dataset provided by
PLOS ONE
Authors
Seongil Han; Haemin Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Increase in AUC, MCC, and F1 between oversampling and undersampling.
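For readers unfamiliar with two of the metrics named in this table title, the sketch below computes F1 and the Matthews correlation coefficient (MCC) from confusion-matrix counts, using their standard definitions. The counts are hypothetical and are not taken from the dataset; AUC is left out because it needs ranked scores rather than counts.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced case: 50 positives among 1000 examples,
# and a classifier that finds only 5 of them.
tp, tn, fp, fn = 5, 940, 10, 45

acc = accuracy(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)
m = mcc(tp, tn, fp, fn)
print(f"accuracy={acc:.3f} f1={f1:.3f} mcc={m:.3f}")
```

In this toy case accuracy stays high because the majority class dominates, while F1 and MCC are both low, which is one reason imbalanced-data studies tend to report them alongside accuracy.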
Searching space for hyperparameters in Table 7.
plos.figshare.com
xls
Updated Dec 31, 2024
Cite
Seongil Han; Haemin Jung (2024). Searching space for hyperparameters in Table 7. [Dataset]. http://doi.org/10.1371/journal.pone.0316454.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316454.t006
Dataset updated
Dec 31, 2024
Dataset provided by
PLOS ONE
Authors
Seongil Han; Haemin Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Evaluation of benchmark and optimal model performance with resampling...
plos.figshare.com
xls
Updated Dec 31, 2024
Cite
Seongil Han; Haemin Jung (2024). Evaluation of benchmark and optimal model performance with resampling techniques. [Dataset]. http://doi.org/10.1371/journal.pone.0316454.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316454.t008
Dataset updated
Dec 31, 2024
Dataset provided by
PLOS ONE
Authors
Seongil Han; Haemin Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Evaluation of benchmark and optimal model performance with resampling techniques.
iProtDNA-SMOTE code.
plos.figshare.com
rar
Updated May 15, 2025
Cite
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin (2025). iProtDNA-SMOTE code. [Dataset]. http://doi.org/10.1371/journal.pone.0320817.s003
Available download formats: rar
Unique identifier
https://doi.org/10.1371/journal.pone.0320817.s003
Dataset updated
May 15, 2025
Dataset provided by
PLOS ONE
Authors
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Protein-DNA interactions play a crucial role in cellular biology, essential for maintaining life processes and regulating cellular functions. We propose a method called iProtDNA-SMOTE, which utilizes non-equilibrium graph neural networks along with pre-trained protein language models to predict DNA binding residues. This approach effectively addresses the class imbalance issue in predicting protein-DNA binding sites by leveraging unbalanced graph data, thus enhancing the model’s generalization and specificity. We trained the model on two datasets, TR646 and TR573, and conducted a series of experiments to evaluate its performance. The model achieved AUC values of 0.850, 0.896, and 0.858 on the independent test datasets TE46, TE129, and TE181, respectively. These results indicate that iProtDNA-SMOTE outperforms existing methods in terms of accuracy and generalization for predicting DNA binding sites, offering reliable and effective predictions to minimize errors. The model has been thoroughly validated for its ability to predict protein-DNA binding sites with high reliability and precision. For the convenience of the scientific community, the benchmark datasets and codes are publicly available at https://github.com/primrosehry/iProtDNA-SMOTE.
Sample size (n) of the full dataset generated under each class-imbalance...
figshare.com
xls
Updated Jun 21, 2023
Cite
Khurram Nadeem; Mehdi-Abderrahman Jabri (2023). Sample size (n) of the full dataset generated under each class-imbalance ratio (IR) to achieve a target balanced sample size (nb). [Dataset]. http://doi.org/10.1371/journal.pone.0280258.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0280258.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Khurram Nadeem; Mehdi-Abderrahman Jabri
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sample size (n) of the full dataset generated under each class-imbalance ratio (IR) to achieve a target balanced sample size (nb).
MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass...
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 and NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t010
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t010
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MCL-FWA-BILSTM accuracy comparison with existing approaches for multiclass classification with state of art on UNSW-NB15 and NSL-KDD.
Accuracy comparison with existing approaches for Binary Classification with...
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Accuracy comparison with existing approaches for Binary Classification with state of art on UNSW-NB15 and NSL-KDD. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t008
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Accuracy comparison with existing approaches for Binary Classification with state of art on UNSW-NB15 and NSL-KDD.
MCL-FWA-BILSTM and other existing approaches for multiclass classification...
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). MCL-FWA-BILSTM and other existing approaches for multiclass classification in both datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t014
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t014
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MCL-FWA-BILSTM and other existing approaches for multiclass classification in both datasets.
Performance Metrics of the UNSW-NB15 dataset on the proposed approach.
figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Performance Metrics of the UNSW-NB15 dataset on the proposed approach. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t007
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance Metrics of the UNSW-NB15 dataset on the proposed approach.
Data from: Multitask Modeling with Confidence Using Matrix Factorization and...
acs.figshare.com
xlsx
Updated Jun 3, 2023
Cite
Ulf Norinder; Fredrik Svensson (2023). Multitask Modeling with Confidence Using Matrix Factorization and Conformal Prediction [Dataset]. http://doi.org/10.1021/acs.jcim.9b00027.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jcim.9b00027.s001
Dataset updated
Jun 3, 2023
Dataset provided by
ACS Publications
Authors
Ulf Norinder; Fredrik Svensson
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Multitask prediction of bioactivities is often faced with challenges relating to the sparsity of data and imbalance between different labels. We propose class conditional (Mondrian) conformal predictors using underlying Macau models as a novel approach for large scale bioactivity prediction. This approach handles both high degrees of missing data and label imbalances while still producing high quality predictive models. When applied to ten assay end points from PubChem, the approach produced valid models with an efficiency of 74.0–80.1% at the 80% confidence level, with similar performance for both the minority and majority classes. Also, when progressively larger portions of the available data were deleted (0–80%), the performance of the models remained robust with only minor deterioration (reduction in efficiency between 5 and 10%). Compared to using Macau without conformal prediction, the method presented here significantly improves the performance on imbalanced data sets.
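The class-conditional (Mondrian) conformal prediction mentioned in this description can be illustrated with a small sketch. It is a generic version of the technique, not the authors' implementation and not tied to Macau: the nonconformity scores are hypothetical numbers (in practice they might be one minus the model's predicted probability for a class), and reading "efficiency" as the share of single-label prediction sets is an assumption.

```python
from collections import defaultdict


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """cal_scores[i] is the nonconformity of calibration example i for its
    true label. test_scores maps each candidate class to the nonconformity
    the test object would have under that class. Each p-value is computed
    only from calibration examples of the same class (the Mondrian part)."""
    by_class = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_class[label].append(score)
    p_values = {}
    for cls, s_test in test_scores.items():
        scores = by_class[cls]
        at_least = sum(1 for s in scores if s >= s_test)
        p_values[cls] = (at_least + 1) / (len(scores) + 1)
    return p_values


def prediction_set(p_values, confidence=0.8):
    """Keep every class whose p-value exceeds the significance level."""
    significance = 1.0 - confidence
    return {cls for cls, p in p_values.items() if p > significance}


def efficiency(prediction_sets):
    """Assumed definition: fraction of prediction sets with exactly one label."""
    return sum(1 for s in prediction_sets if len(s) == 1) / len(prediction_sets)


# Hypothetical calibration data: 4 minority ("active") and 9 majority
# ("inactive") examples, each with a nonconformity score.
cal_scores = [0.1, 0.2, 0.3, 0.9,
              0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.6]
cal_labels = ["active"] * 4 + ["inactive"] * 9

# Hypothetical test object scored under each candidate label.
p = mondrian_p_values(cal_scores, cal_labels,
                      {"active": 0.25, "inactive": 0.7})
labels_80 = prediction_set(p, confidence=0.8)
print(p, labels_80)
```

Because each class is calibrated against its own examples, the error rate is controlled separately for the minority and majority classes, which is the property that makes the approach attractive for imbalanced labels.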
Confusion matrix of UNSW-NB-15 dataset using MCL-FWA-BILSTM approach.
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Confusion matrix of UNSW-NB-15 dataset using MCL-FWA-BILSTM approach. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t006
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Confusion matrix of UNSW-NB-15 dataset using MCL-FWA-BILSTM approach.
Description of the NSL-KDD dataset attack categories.
plos.figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Description of the NSL-KDD dataset attack categories. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t002
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of the NSL-KDD dataset attack categories.
Data from: FT-GNN Tool for Bridging HRMS Features and Bioactivity:...
acs.figshare.com
xlsx
Updated Apr 9, 2025
Cite
Fan Fan; Fu Liu; Qingmiao Yu; Ran Yi; Hongqiang Ren; Jinju Geng (2025). FT-GNN Tool for Bridging HRMS Features and Bioactivity: Uncovering Unidentified Estrogen Receptor Agonists in Sewage [Dataset]. http://doi.org/10.1021/acs.est.5c02324.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.est.5c02324.s002
Dataset updated
Apr 9, 2025
Dataset provided by
ACS Publications
Authors
Fan Fan; Fu Liu; Qingmiao Yu; Ran Yi; Hongqiang Ren; Jinju Geng
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Identifying primary estrogen receptor (ER) agonists in municipal sewage is essential for ensuring the health of aquatic environments. Given the complex and variable chemical composition of sewage, the predominant ER agonists remain unclear. High-resolution mass spectrometry (HRMS)-based models have been developed to predict compound bioactivity in complex matrices, but further optimization is needed to effectively bridge HRMS features with ER agonists. To address this challenge, an FT-GNN (fragmentation tree-based graph neural network) model was proposed. Given limited data and class imbalance, data augmentation was performed using model predictions within the applicability domain (AD) and oversampling technique (OTE). Model development results demonstrated that integrating the FT-GNN with data augmentation improved the balanced accuracy (bACC) value by 6%–31%. The developed model, with a high bACC to identify more true ER agonists, efficiently classified tens of thousands of unidentified HRMS features in sewage, reducing postprocessing workload in nontargeted screening. Analysis of ER agonist transformation during sewage treatment revealed the anaerobic stage as key to both their removal and formation. Estrogenic effect balance analysis suggests that α-E2 and 9,11-didehydroestriol may be two previously overlooked key ER agonists. Collectively, the development and application of the FT-GNN model are crucial advancements toward credible tracking and efficient control of estrogenic risks in water.
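The balanced accuracy (bACC) reported in this description is, in its standard binary form, the mean of sensitivity and specificity. The sketch below shows that calculation on hypothetical counts that are not taken from the study; whether the authors used exactly this form is an assumption.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on positives) and specificity
    (recall on negatives); a missing class contributes 0.0."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical screen: 10 true agonists among 1000 features.
bacc_model = balanced_accuracy(tp=8, tn=900, fp=90, fn=2)

# A degenerate classifier that labels everything negative reaches 99%
# plain accuracy on the same data but only 0.5 balanced accuracy.
bacc_all_negative = balanced_accuracy(tp=0, tn=990, fp=0, fn=10)
print(round(bacc_model, 4), bacc_all_negative)
```

Weighting both classes equally is what makes bACC a natural target when the goal, as stated above, is to recover more true positives from a heavily imbalanced pool.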
Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.
figshare.com
xls
Updated May 23, 2024
Cite
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman (2024). Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model. [Dataset]. http://doi.org/10.1371/journal.pone.0302294.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302294.t005
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Arshad Hashmi; Omar M. Barukab; Ahmad Hamza Osman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance metrics of NSL-KDD dataset using MCL-FWA-BILSTM model.
Performance comparisons of iProtDNA-SMOTE and 5 competing predictors on...
plos.figshare.com
xls
Updated May 15, 2025
Cite
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin (2025). Performance comparisons of iProtDNA-SMOTE and 5 competing predictors on TE129 under independent validation. [Dataset]. http://doi.org/10.1371/journal.pone.0320817.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320817.t003
Dataset updated
May 15, 2025
Dataset provided by
PLOS ONE
Authors
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance comparisons of iProtDNA-SMOTE and 5 competing predictors on TE129 under independent validation.
Performance comparisons of iProtDNA-SMOTE and 4 competing predictors on...
plos.figshare.com
xls
Updated May 15, 2025
Cite
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin (2025). Performance comparisons of iProtDNA-SMOTE and 4 competing predictors on TE181 under independent validation. [Dataset]. http://doi.org/10.1371/journal.pone.0320817.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320817.t004
Dataset updated
May 15, 2025
Dataset provided by
PLOS ONE
Authors
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance comparisons of iProtDNA-SMOTE and 4 competing predictors on TE181 under independent validation.
Performance comparisons of iProtDNA-SMOTE and 6 competing predictors on TE46...
plos.figshare.com
xls
Updated May 15, 2025
Cite
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin (2025). Performance comparisons of iProtDNA-SMOTE and 6 competing predictors on TE46 under independent validation. [Dataset]. http://doi.org/10.1371/journal.pone.0320817.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320817.t002
Dataset updated
May 15, 2025
Dataset provided by
PLOS ONE
Authors
Ruiyan Huang; Wangren Qiu; Xuan Xiao; Weizhong Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance comparisons of iProtDNA-SMOTE and 6 competing predictors on TE46 under independent validation.
