100+ datasets found
  1. Summary table: Oversampling techniques using SMOTE, ADASYN, and weighted...

    • plos.figshare.com
    xls
    Updated Nov 16, 2023
    Cite
    Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). Summary table: Oversampling techniques using SMOTE, ADASYN, and weighted rare classes. [Dataset]. http://doi.org/10.1371/journal.pone.0290581.t007
    Available download formats
    xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alaa Alomari; Hossam Faris; Pedro A. Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary table: Oversampling techniques using SMOTE, ADASYN, and weighted rare classes.

  2. S1 File -

    • plos.figshare.com
    txt
    Updated Nov 16, 2023
    Cite
    Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). S1 File - [Dataset]. http://doi.org/10.1371/journal.pone.0290581.s001
    Available download formats
    txt
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alaa Alomari; Hossam Faris; Pedro A. Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Covid-19 pandemic has led to an increase in the awareness of and demand for telemedicine services, resulting in a need for automating the process and relying on machine learning (ML) to reduce the operational load. This research proposes a specialty detection classifier based on a machine learning model to automate the process of detecting the correct specialty for each question and routing it to the correct doctor. The study focuses on handling multiclass and highly imbalanced datasets for Arabic medical questions, comparing some oversampling techniques, developing a Deep Neural Network (DNN) model for specialty detection, and exploring the hidden business areas that rely on specialty detection such as customizing and personalizing the consultation flow for different specialties. The proposed module is deployed in both synchronous and asynchronous medical consultations to provide more real-time classification, minimize the doctor effort in addressing the correct specialty, and give the system more flexibility in customizing the medical consultation flow. The evaluation and assessment are based on accuracy, precision, recall, and F1-score. The experimental results suggest that combining multiple techniques, such as SMOTE and reweighing with keyword identification, is necessary to achieve improved performance in detecting rare classes in imbalanced multiclass datasets. By using these techniques, specialty detection models can more accurately detect rare classes in real-world scenarios where imbalanced data is common.

  3. Keyword identification with different factors.

    • plos.figshare.com
    xls
    Updated Nov 16, 2023
    Cite
    Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). Keyword identification with different factors. [Dataset]. http://doi.org/10.1371/journal.pone.0290581.t005
    Available download formats
    xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alaa Alomari; Hossam Faris; Pedro A. Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Covid-19 pandemic has led to an increase in the awareness of and demand for telemedicine services, resulting in a need for automating the process and relying on machine learning (ML) to reduce the operational load. This research proposes a specialty detection classifier based on a machine learning model to automate the process of detecting the correct specialty for each question and routing it to the correct doctor. The study focuses on handling multiclass and highly imbalanced datasets for Arabic medical questions, comparing some oversampling techniques, developing a Deep Neural Network (DNN) model for specialty detection, and exploring the hidden business areas that rely on specialty detection such as customizing and personalizing the consultation flow for different specialties. The proposed module is deployed in both synchronous and asynchronous medical consultations to provide more real-time classification, minimize the doctor effort in addressing the correct specialty, and give the system more flexibility in customizing the medical consultation flow. The evaluation and assessment are based on accuracy, precision, recall, and F1-score. The experimental results suggest that combining multiple techniques, such as SMOTE and reweighing with keyword identification, is necessary to achieve improved performance in detecting rare classes in imbalanced multiclass datasets. By using these techniques, specialty detection models can more accurately detect rare classes in real-world scenarios where imbalanced data is common.

  4. BILSTM using SMOTE and ADASYN oversampling techniques.

    • plos.figshare.com
    xls
    Updated Nov 16, 2023
    Cite
    Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). BILSTM using SMOTE and ADASYN oversampling techniques. [Dataset]. http://doi.org/10.1371/journal.pone.0290581.t002
    Available download formats
    xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alaa Alomari; Hossam Faris; Pedro A. Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BILSTM using SMOTE and ADASYN oversampling techniques.

  5. DataSheet2_An Improved Deep Learning Model: S-TextBLCNN for Traditional...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 8, 2023
    Cite
    Ning Cheng; Yue Chen; Wanqing Gao; Jiajun Liu; Qunfu Huang; Cheng Yan; Xindi Huang; Changsong Ding (2023). DataSheet2_An Improved Deep Learning Model: S-TextBLCNN for Traditional Chinese Medicine Formula Classification.xlsx [Dataset]. http://doi.org/10.3389/fgene.2021.807825.s002
    Available download formats
    xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Ning Cheng; Yue Chen; Wanqing Gao; Jiajun Liu; Qunfu Huang; Cheng Yan; Xindi Huang; Changsong Ding
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: This study proposes an S-TextBLCNN model for the efficacy of traditional Chinese medicine (TCM) formula classification. This model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which is helpful in further exploring the internal rules of formula combination. Methods: First, for the TCM herbs extracted from Chinese Pharmacopoeia, natural language processing (NLP) is used to learn and realize the quantitative expression of different TCM herbs. Three features of herb name, herb properties, and herb efficacy are selected to encode herbs and to construct formula-vector and herb-vector. Then, based on 2,664 formulae for stroke collected in TCM literature and 19 formula efficacy categories extracted from Yifang Jijie, an improved deep learning model, TextBLCNN, consisting of a bidirectional long short-term memory (Bi-LSTM) neural network and a convolutional neural network (CNN), is proposed. Based on 19 formula efficacy categories, binary classifiers are established to classify the TCM formulae. Finally, aiming at the imbalance problem of formula data, the over-sampling method SMOTE is used to solve it and the S-TextBLCNN model is proposed. Results: The formula-vector composed of herb efficacy has the best effect on the classification model, so it can be inferred that there is a strong relationship between herb efficacy and formula efficacy. The TextBLCNN model has an accuracy of 0.858 and an F1-score of 0.762, both higher than the logistic regression (acc = 0.561, F1-score = 0.567), SVM (acc = 0.703, F1-score = 0.591), LSTM (acc = 0.723, F1-score = 0.621), and TextCNN (acc = 0.745, F1-score = 0.644) models. In addition, the over-sampling method SMOTE is used in our model to tackle data imbalance, and the F1-score is greatly improved by an average of 47.1% in 19 models. Conclusion: The combination of formula feature representation and the S-TextBLCNN model improves the accuracy in formula efficacy classification. It provides a new research idea for the study of TCM formula compatibility.

  6. Group and number of experiments.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). Group and number of experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t002
    Available download formats
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of the various heart diseases developing. The mortality rates of prognosis HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means to avoid death and positively affect the health of patients. But in fact, machine learning models are difficult to gain good results on missing values, high dimensions, and imbalances HF data. Therefore, a deep learning system is proposed. In this system, we propose an indicator vector to indicate whether the value is true or be padded, which fast solves the missing values and helps expand data dimensions. Then, we use a convolutional neural network with different kernel sizes to obtain the features information. And a multi-head self-attention mechanism is applied to gain whole channel information, which is essential for the system to improve performance. Besides, the focal loss function is introduced to deal with the imbalanced problem better. The experimental data of the system are from the public database MIMIC-III, containing valid data for 10311 patients. The proposed system effectively and fast predicts four death types: death within 30 days, death within 180 days, death within 365 days and death after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and help provide a better medical service.

  7. Demographic information of ADNI fMRI cohort.

    • plos.figshare.com
    xls
    Updated Dec 4, 2024
    + more versions
    Cite
    Xiao Zhou; Sanchita Kedia; Ran Meng; Mark Gerstein (2024). Demographic information of ADNI fMRI cohort. [Dataset]. http://doi.org/10.1371/journal.pone.0312848.t001
    Available download formats
    xls
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Xiao Zhou; Sanchita Kedia; Ran Meng; Mark Gerstein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The early detection of Alzheimer’s Disease (AD) is thought to be important for effective intervention and management. Here, we explore deep learning methods for the early detection of AD. We consider both genetic risk factors and functional magnetic resonance imaging (fMRI) data. However, we found that the genetic factors do not notably enhance the AD prediction by imaging. Thus, we focus on building an effective imaging-only model. In particular, we utilize data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), employing a 3D Convolutional Neural Network (CNN) to analyze fMRI scans. Despite the limitations posed by our dataset (small size and imbalanced nature), our CNN model demonstrates accuracy levels reaching 92.8% and an ROC of 0.95. Our research highlights the complexities inherent in integrating multimodal medical datasets. It also demonstrates the potential of deep learning in medical imaging for AD prediction.

  8. Classification result classifiers using TF-IDF with SMOTE.

    • plos.figshare.com
    xls
    Updated May 28, 2024
    Cite
    Khaled Alnowaiser (2024). Classification result classifiers using TF-IDF with SMOTE. [Dataset]. http://doi.org/10.1371/journal.pone.0302304.t007
    Available download formats
    xls
    Dataset updated
    May 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Khaled Alnowaiser
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification result classifiers using TF-IDF with SMOTE.

  9. Feature correspondence q.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). Feature correspondence q. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t010
    Available download formats
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of the various heart diseases developing. The mortality rates of prognosis HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means to avoid death and positively affect the health of patients. But in fact, machine learning models are difficult to gain good results on missing values, high dimensions, and imbalances HF data. Therefore, a deep learning system is proposed. In this system, we propose an indicator vector to indicate whether the value is true or be padded, which fast solves the missing values and helps expand data dimensions. Then, we use a convolutional neural network with different kernel sizes to obtain the features information. And a multi-head self-attention mechanism is applied to gain whole channel information, which is essential for the system to improve performance. Besides, the focal loss function is introduced to deal with the imbalanced problem better. The experimental data of the system are from the public database MIMIC-III, containing valid data for 10311 patients. The proposed system effectively and fast predicts four death types: death within 30 days, death within 180 days, death within 365 days and death after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and help provide a better medical service.

  10. Effect of using attention and not using.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). Effect of using attention and not using. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t008
    Available download formats
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of the various heart diseases developing. The mortality rates of prognosis HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means to avoid death and positively affect the health of patients. But in fact, machine learning models are difficult to gain good results on missing values, high dimensions, and imbalances HF data. Therefore, a deep learning system is proposed. In this system, we propose an indicator vector to indicate whether the value is true or be padded, which fast solves the missing values and helps expand data dimensions. Then, we use a convolutional neural network with different kernel sizes to obtain the features information. And a multi-head self-attention mechanism is applied to gain whole channel information, which is essential for the system to improve performance. Besides, the focal loss function is introduced to deal with the imbalanced problem better. The experimental data of the system are from the public database MIMIC-III, containing valid data for 10311 patients. The proposed system effectively and fast predicts four death types: death within 30 days, death within 180 days, death within 365 days and death after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and help provide a better medical service.

  11. The composition of features.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    + more versions
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). The composition of features. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t005
    Available download formats
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of the various heart diseases developing. The mortality rates of prognosis HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means to avoid death and positively affect the health of patients. But in fact, machine learning models are difficult to gain good results on missing values, high dimensions, and imbalances HF data. Therefore, a deep learning system is proposed. In this system, we propose an indicator vector to indicate whether the value is true or be padded, which fast solves the missing values and helps expand data dimensions. Then, we use a convolutional neural network with different kernel sizes to obtain the features information. And a multi-head self-attention mechanism is applied to gain whole channel information, which is essential for the system to improve performance. Besides, the focal loss function is introduced to deal with the imbalanced problem better. The experimental data of the system are from the public database MIMIC-III, containing valid data for 10311 patients. The proposed system effectively and fast predicts four death types: death within 30 days, death within 180 days, death within 365 days and death after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and help provide a better medical service.

  12. Imbalanced PAS positive-negative ratio.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Yumin Zheng; Haohan Wang; Yang Zhang; Xin Gao; Eric P. Xing; Min Xu (2023). Imbalanced PAS positive-negative ratio. [Dataset]. http://doi.org/10.1371/journal.pcbi.1008297.t007
    Available download formats
    xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Yumin Zheng; Haohan Wang; Yang Zhang; Xin Gao; Eric P. Xing; Min Xu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Imbalanced PAS positive-negative ratio.

  13. Model performance compared with other methods.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang (2023). Model performance compared with other methods. [Dataset]. http://doi.org/10.1371/journal.pone.0276835.t006
    Available download formats
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dengao Li; Jian Fu; Jumin Zhao; Junnan Qin; Lihui Zhang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Heart failure (HF) is the final stage of the various heart diseases developing. The mortality rates of prognosis HF patients are highly variable, ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means to avoid death and positively affect the health of patients. But in fact, machine learning models are difficult to gain good results on missing values, high dimensions, and imbalances HF data. Therefore, a deep learning system is proposed. In this system, we propose an indicator vector to indicate whether the value is true or be padded, which fast solves the missing values and helps expand data dimensions. Then, we use a convolutional neural network with different kernel sizes to obtain the features information. And a multi-head self-attention mechanism is applied to gain whole channel information, which is essential for the system to improve performance. Besides, the focal loss function is introduced to deal with the imbalanced problem better. The experimental data of the system are from the public database MIMIC-III, containing valid data for 10311 patients. The proposed system effectively and fast predicts four death types: death within 30 days, death within 180 days, death within 365 days and death after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and help provide a better medical service.

  14. Build and experiment BILSTM models with the addition of eeightings on the...

    • plos.figshare.com
    xls
    Updated Nov 16, 2023
    Cite
    Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). Build and experiment BILSTM models with the addition of eeightings on the classes. [Dataset]. http://doi.org/10.1371/journal.pone.0290581.t003
    Available download formats
    xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alaa Alomari; Hossam Faris; Pedro A. Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Build and experiment BILSTM models with the addition of eeightings on the classes.

  15. Classification results of machine learning models using BoW on imbalanced...

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Cite
    Eysha Saad; Saima Sadiq; Ramish Jamil; Furqan Rustam; Arif Mehmood; Gyu Sang Choi; Imran Ashraf (2023). Classification results of machine learning models using BoW on imbalanced dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270327.t004
    Available download formats
    xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eysha Saad; Saima Sadiq; Ramish Jamil; Furqan Rustam; Arif Mehmood; Gyu Sang Choi; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification results of machine learning models using BoW on imbalanced dataset.

  16. Sample of specialty keyword identification for rare specialties.

    • figshare.com
    xls
    Updated Nov 16, 2023
    Cite
    Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). Sample of specialty keyword identification for rare specialties. [Dataset]. http://doi.org/10.1371/journal.pone.0290581.t004
    Available download formats
    xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alaa Alomari; Hossam Faris; Pedro A. Castillo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample of specialty keyword identification for rare specialties.

  17. Performances for machine learning models using leave-one-out cross...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Lucas Teoh; Achintha Avin Ihalage; Srooley Harp; Zahra F. Al-Khateeb; Adina T. Michael-Titus; Jordi L. Tremoleda; Yang Hao (2023). Performances for machine learning models using leave-one-out cross validation. [Dataset]. http://doi.org/10.1371/journal.pone.0268962.t002
    Available download formats
    xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Lucas Teoh; Achintha Avin Ihalage; Srooley Harp; Zahra F. Al-Khateeb; Adina T. Michael-Titus; Jordi L. Tremoleda; Yang Hao
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performances for machine learning models using leave-one-out cross validation.

  18. Accuracy of multiple classifiers with a deep-learning-based feature set

    • plos.figshare.com
    xls
    Updated May 27, 2025
    Cite
    Afira Aslam; Syed Muhammad Usman; Muhammad Zubair; Amanullah Yasin; Muhammad Owais; Irfan Hussain (2025). Accuracy of multiple classifiers with a deep-learning-based feature set [Dataset]. http://doi.org/10.1371/journal.pone.0324293.t007
    Available download formats
    xls
    Dataset updated
    May 27, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Afira Aslam; Syed Muhammad Usman; Muhammad Zubair; Amanullah Yasin; Muhammad Owais; Irfan Hussain
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accuracy of multiple classifiers with a deep-learning-based feature set

  19. Classification results of machine learning models using TF-IDF on imbalanced...

    • figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Eysha Saad; Saima Sadiq; Ramish Jamil; Furqan Rustam; Arif Mehmood; Gyu Sang Choi; Imran Ashraf (2023). Classification results of machine learning models using TF-IDF on imbalanced dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270327.t003
    Available download formats
    xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eysha Saad; Saima Sadiq; Ramish Jamil; Furqan Rustam; Arif Mehmood; Gyu Sang Choi; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification results of machine learning models using TF-IDF on imbalanced dataset.

  20. Table_4_Improved Point-Cloud Segmentation for Plant Phenotyping Through...

    • frontiersin.figshare.com
    docx
    Updated Jun 15, 2023
    Cite
    Frans P. Boogaard; Eldert J. van Henten; Gert Kootstra (2023). Table_4_Improved Point-Cloud Segmentation for Plant Phenotyping Through Class-Dependent Sampling of Training Data to Battle Class Imbalance.docx [Dataset]. http://doi.org/10.3389/fpls.2022.838190.s004
    Available download formats
    docx
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Frans P. Boogaard; Eldert J. van Henten; Gert Kootstra
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Plant scientists and breeders require high-quality phenotypic data. However, obtaining accurate manual measurements for large plant populations is often infeasible, due to the high labour requirement involved. This is especially the case for more complex plant traits, like the traits defining the plant architecture. Computer-vision methods can help in solving this bottleneck. The current work focusses on methods using 3D point cloud data to obtain phenotypic datasets of traits related to the plant architecture. A first step is the segmentation of the point clouds into plant organs. One of the issues in point-cloud segmentation is that not all plant parts are equally represented in the data and that the segmentation performance is typically lower for minority classes than for majority classes. To address this class-imbalance problem, we used a common practice to divide large point clouds into chunks that were independently segmented and recombined later. In our case, the chunks were created by selecting anchor points and combining those with points in their neighbourhood. As a baseline, the anchor points were selected in a class-independent way, representing the class distribution in the original data. Then, we propose a class-dependent sampling strategy to battle class imbalance. The difference in segmentation performance between the class-independent and the class-dependent training set was analysed first. Additionally, the effect of the number of points selected as the neighbourhood was investigated. Smaller neighbourhoods resulted in a higher level of class balance, but also in a loss of context that was contained in the points around the anchor point. The overall segmentation quality, measured as the mean intersection-over-union (IoU), increased from 0.94 to 0.96 when the class-dependent training set was used. The biggest class improvement was found for the “node,” for which the percentage of correctly segmented points increased by 46.0 percentage points. 
The results of the second experiment clearly showed that higher levels of class balance did not necessarily lead to better segmentation performance. Instead, the optimal neighbourhood size differed per class. In conclusion, it was demonstrated that our class-dependent sampling strategy led to an improved point-cloud segmentation method for plant phenotyping.

Cite
Alaa Alomari; Hossam Faris; Pedro A. Castillo (2023). Summary table: Oversampling techniques using SMOTE, ADASYN, and weighted rare classes. [Dataset]. http://doi.org/10.1371/journal.pone.0290581.t007

Summary table: Oversampling techniques using SMOTE, ADASYN, and weighted rare classes.

2 scholarly articles cite this dataset
Available download formats
xls
Dataset updated
Nov 16, 2023
Dataset provided by
PLOS ONE
Authors
Alaa Alomari; Hossam Faris; Pedro A. Castillo
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Summary table: Oversampling techniques using SMOTE, ADASYN, and weighted rare classes.
