7 datasets found

A comparison of the CRN-SMOTE and RN-SMOTE methods on the health risk...
plos.figshare.com
xls
Updated Feb 10, 2025
Javad Hemmatian; Rassoul Hajizadeh; Fakhroddin Nazari (2025). A comparison of the CRN-SMOTE and RN-SMOTE methods on the health risk dataset based on different classification metrics using the Random Forest classifier. [Dataset]. http://doi.org/10.1371/journal.pone.0317396.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0317396.t009
Dataset updated
Feb 10, 2025
Dataset provided by
PLOS ONE
Authors
Javad Hemmatian; Rassoul Hajizadeh; Fakhroddin Nazari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A comparison of the CRN-SMOTE and RN-SMOTE methods on the health risk dataset based on different classification metrics using the Random Forest classifier.
A comparison of the RN-SMOTE, SMOTE-Tomek Link, SMOTE-ENN, and the proposed...
plos.figshare.com
xls
Updated Feb 10, 2025
Javad Hemmatian; Rassoul Hajizadeh; Fakhroddin Nazari (2025). A comparison of the RN-SMOTE, SMOTE-Tomek Link, SMOTE-ENN, and the proposed 1CRN-SMOTE methods on the ILPD and QSAR datasets is presented, based on various classification metrics using the Random Forest classifier. [Dataset]. http://doi.org/10.1371/journal.pone.0317396.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0317396.t007
Dataset updated
Feb 10, 2025
Dataset provided by
PLOS ONE
Authors
Javad Hemmatian; Rassoul Hajizadeh; Fakhroddin Nazari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A comparison of the RN-SMOTE, SMOTE-Tomek Link, SMOTE-ENN, and the proposed 1CRN-SMOTE methods on the ILPD and QSAR datasets is presented, based on various classification metrics using the Random Forest classifier.
S5 Dataset -
plos.figshare.com
xlsx
Updated Dec 13, 2024
JiaMing Gong; MingGang Dong (2024). S5 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0311133.s005
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0311133.s005
Dataset updated
Dec 13, 2024
Dataset provided by
PLOS ONE
Authors
JiaMing Gong; MingGang Dong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Online imbalanced learning is an emerging topic that combines the challenges of class imbalance and concept drift. However, most current works address class imbalance and concept drift separately, and only a few have considered these issues simultaneously. To this end, this paper proposes an entropy-based dynamic ensemble classification algorithm (EDAC) to handle data streams with class imbalance and concept drift simultaneously. First, to address the problem of imbalanced learning in training data chunks arriving at different times, EDAC adopts an entropy-based balancing strategy. It divides the data chunks into multiple balanced sample pairs based on the differences in the information entropy between classes in the sample data chunk. Additionally, we propose a density-based sampling method to improve the accuracy of classifying minority-class samples into high-quality samples and common samples via the density of similar samples. In this manner, high-quality and common samples are randomly selected for training the classifier. Finally, to solve the issue of concept drift, EDAC designs and implements an ensemble classifier that uses a self-feedback strategy to determine the initial weight of the classifier by adjusting the weight of the sub-classifier according to its performance on the arrived data chunks. The experimental results demonstrate that EDAC outperforms five state-of-the-art algorithms on four synthetic data streams and one real-world data stream.
The selected explanatory variables.
plos.figshare.com
xls
Updated Jun 21, 2023
Seyed Iman Mohammadpour; Majid Khedmati; Mohammad Javad Hassan Zada (2023). The selected explanatory variables. [Dataset]. http://doi.org/10.1371/journal.pone.0281901.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0281901.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Seyed Iman Mohammadpour; Majid Khedmati; Mohammad Javad Hassan Zada
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
While the cost of road traffic fatalities in the U.S. surpasses $240 billion a year, the availability of high-resolution datasets allows meticulous investigation of the factors contributing to crash severity. In this paper, the dataset for Trucks Involved in Fatal Accidents in 2010 (TIFA 2010) is utilized to classify truck-involved crash severity, a task that presents several issues, including missing values, imbalanced classes, and high dimensionality. First, a decision tree-based algorithm, the Synthetic Minority Oversampling Technique (SMOTE), and the Random Forest (RF) feature importance approach are employed for missing value imputation, minority class oversampling, and dimensionality reduction, respectively. Afterward, a variety of classification algorithms, including RF, K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), Gradient-Boosted Decision Trees (GBDT), and Support Vector Machine (SVM), are developed to reveal the influence of the introduced data preprocessing framework on the output quality of ML classifiers. The results show that the GBDT model outperforms all the other competing algorithms for the non-preprocessed crash data based on the G-mean performance measure, but the RF makes the most accurate prediction for the treated dataset. This finding indicates that after feature selection is conducted to alleviate the computational cost of the machine learning algorithms, bagging (bootstrap aggregating) of decision trees in RF leads to a better model than boosting them via GBDT. Besides, the adopted feature importance approach decreases the overall accuracy by only up to 5% in most of the estimated models. Moreover, the worst class recall value of the RF algorithm without prior oversampling is only 34.4%, compared to the corresponding value of 90.3% in the up-sampled model, which validates the proposed multi-step preprocessing scheme.
This study also identifies the temporal and spatial (roadway) attributes, as well as crash characteristics, and Emergency Medical Service (EMS) as the most critical factors in truck crash severity.
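The two imbalance-aware measures quoted in this description, G-mean and worst class recall, can be illustrated with a minimal sketch. This is not the authors' code: the labels below are made up, the function names `per_class_recall` and `g_mean` are hypothetical, and G-mean is taken here in its common form as the geometric mean of the per-class recalls.

```python
from collections import Counter
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1 / len(recalls))


# Made-up, imbalanced toy labels: eight majority (0) and two minority (1) cases.
y_true = [0] * 8 + [1] * 2
# Hypothetical predictions that miss one of the two minority cases.
y_pred = [0] * 8 + [0, 1]

recalls = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
worst_class_recall = min(recalls.values())
score = g_mean(y_true, y_pred)
```

In this toy case the overall accuracy is 0.9 while the worst class recall is only 0.5, which is the kind of gap between accuracy and minority-class performance that these measures are meant to expose.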
Studies of the risk factors of truck-involved crash severity.
plos.figshare.com
xls
Updated Jun 21, 2023
Seyed Iman Mohammadpour; Majid Khedmati; Mohammad Javad Hassan Zada (2023). Studies of the risk factors of truck-involved crash severity. [Dataset]. http://doi.org/10.1371/journal.pone.0281901.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0281901.t001
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Seyed Iman Mohammadpour; Majid Khedmati; Mohammad Javad Hassan Zada
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Studies of the risk factors of truck-involved crash severity.
Detailed overview of cohort characteristics for train and test cohort.
plos.figshare.com
figshare.com
xls
Updated Sep 4, 2024
Hexin Li; Negin Ashrafi; Chris Kang; Guanlan Zhao; Yubing Chen; Maryam Pishgar (2024). Detailed overview of cohort characteristics for train and test cohort. [Dataset]. http://doi.org/10.1371/journal.pone.0309383.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0309383.t002
Dataset updated
Sep 4, 2024
Dataset provided by
PLOS ONE
Authors
Hexin Li; Negin Ashrafi; Chris Kang; Guanlan Zhao; Yubing Chen; Maryam Pishgar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Values are presented as means with the standard deviations in parentheses.
DataSheet_3_Machine learning-based radiomics for predicting BRAF-V600E...
frontiersin.figshare.com
bin
Updated Aug 14, 2023
Wen Li; Yang Li; Xiaoling Liu; Li Wang; Wenqian Chen; Xueshen Qian; Xianglong Zheng; Jiang Chen; Yiming Liu; Lisong Lin (2023). DataSheet_3_Machine learning-based radiomics for predicting BRAF-V600E mutations in ameloblastoma.docx [Dataset]. http://doi.org/10.3389/fimmu.2023.1180908.s003
Available download formats: bin
Unique identifier
https://doi.org/10.3389/fimmu.2023.1180908.s003
Dataset updated
Aug 14, 2023
Dataset provided by
Frontiers
Authors
Wen Li; Yang Li; Xiaoling Liu; Li Wang; Wenqian Chen; Xueshen Qian; Xianglong Zheng; Jiang Chen; Yiming Liu; Lisong Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Ameloblastoma is a locally invasive and aggressive epithelial odontogenic neoplasm. The BRAF-V600E gene mutation is a prevalent genetic alteration found in this tumor and is considered to have a crucial role in its pathogenesis. The objective of this study is to develop and validate a radiomics-based machine learning method for the identification of BRAF-V600E gene mutations in ameloblastoma patients.
Methods: In this retrospective study, data from 103 patients diagnosed with ameloblastoma who underwent BRAF-V600E mutation testing were collected. Of these patients, 72 were included in the training cohort, while 31 were included in the validation cohort. To address class imbalance, the synthetic minority over-sampling technique (SMOTE) was applied. Radiomics features were extracted from preprocessed CT images, and the most relevant features, including both radiomics and clinical data, were selected for analysis. Machine learning methods were utilized to construct models. The performance of these models in distinguishing between patients with and without BRAF-V600E gene mutations was evaluated using the receiver operating characteristic (ROC) curve.
Results: When the analysis was based on the radiomics signature, Random Forest performed better than the other methods, with an area under the ROC curve (AUC) of 0.87 (95% CI, 0.68-1.00). The XGBoost model performed slightly worse than Random Forest, with an AUC of 0.83 (95% CI, 0.60-1.00). The nomogram indicated that among younger women, the affected region primarily lies within the mandible, and patients with larger tumor diameters exhibit a heightened risk. Additionally, patients with higher radiomics signature scores are more susceptible to BRAF-V600E gene mutations.
Conclusions: Our study presents a comprehensive radiomics-based machine learning model using five different methods to accurately detect BRAF-V600E gene mutations in patients diagnosed with ameloblastoma. The Random Forest model's high predictive performance, with an AUC of 0.87, demonstrates its potential for facilitating a convenient and cost-effective way of identifying patients with the mutation without the need for invasive tumor sampling for molecular testing. This non-invasive approach has the potential to guide preoperative or postoperative drug treatment for affected individuals, thereby improving outcomes.
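SMOTE, which this description and several other entries in the listing rely on, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step in plain Python. It is not the study's implementation: the function name `smote_like`, the parameters `k` and `seed`, and the toy points are all assumptions made for illustration.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point x.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority points by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment x -> neighbour.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Made-up two-dimensional minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=3)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training cohort only, so that the validation cohort still reflects the original class balance.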