100+ datasets found
  1. The definition of a confusion matrix.

    • plos.figshare.com
    xls
    Updated Feb 10, 2025
    Cite
    Javad Hemmatian; Rassoul Hajizadeh; Fakhroddin Nazari (2025). The definition of a confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0317396.t002
    Available download formats: xls
    Dataset provided by
    PLOS ONE
    Authors
    Javad Hemmatian; Rassoul Hajizadeh; Fakhroddin Nazari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, the challenge of imbalanced data has become increasingly prominent in machine learning, affecting the performance of classification algorithms. This study proposes a novel data-level oversampling method called Cluster-Based Reduced Noise SMOTE (CRN-SMOTE) to address this issue. CRN-SMOTE combines SMOTE for oversampling minority classes with a novel cluster-based noise reduction technique. In this cluster-based noise reduction approach, it is crucial that samples from each category form one or two clusters, a feature that conventional noise reduction methods do not achieve. The proposed method is evaluated on four imbalanced datasets (ILPD, QSAR, Blood, and Maternal Health Risk) using five metrics: Cohen’s kappa, Matthews correlation coefficient (MCC), F1-score, precision, and recall. Results demonstrate that CRN-SMOTE consistently outperformed the state-of-the-art Reduced Noise SMOTE (RN-SMOTE), SMOTE-Tomek Link, and SMOTE-ENN methods across all datasets, with particularly notable improvements observed in the QSAR and Maternal Health Risk datasets, indicating its effectiveness in enhancing imbalanced classification performance. Overall, the experimental findings indicate that CRN-SMOTE outperformed RN-SMOTE in 100% of the cases, achieving average improvements of 6.6% in Kappa, 4.01% in MCC, 1.87% in F1-score, 1.7% in precision, and 2.05% in recall, with SMOTE’s number of neighbors set to 5.
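The five metrics named in this description can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook formulas; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the dataset or the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. Undefined ratios fall back to 0.0.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "mcc": mcc, "kappa": kappa}


# Hypothetical counts, for illustration only
print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
```

Kappa and MCC use all four cells, which is one reason they are often reported alongside precision and recall on imbalanced data.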

  2. smote_smote

    • kaggle.com
    Updated Nov 15, 2024
    Cite
    Davide Cagnazzo (2024). smote_smote [Dataset]. https://www.kaggle.com/datasets/davidecag/smote-smote/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Davide Cagnazzo
    Description

    Dataset

    This dataset was created by Davide Cagnazzo

    Contents

  3. Classification result classifiers using TF with SMOTE.

    • plos.figshare.com
    xls
    Updated May 28, 2024
    Cite
    Khaled Alnowaiser (2024). Classification result classifiers using TF with SMOTE. [Dataset]. http://doi.org/10.1371/journal.pone.0302304.t005
    Available download formats: xls
    Dataset provided by
    PLOS ONE
    Authors
    Khaled Alnowaiser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification result classifiers using TF with SMOTE.

  4. Data_Sheet_1_Effect of De-noising by Wavelet Filtering and Data Augmentation...

    • frontiersin.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Min Jin; Chunguang Wang; Dan Børge Jensen (2023). Data_Sheet_1_Effect of De-noising by Wavelet Filtering and Data Augmentation by Borderline SMOTE on the Classification of Imbalanced Datasets of Pig Behavior.pdf [Dataset]. http://doi.org/10.3389/fanim.2021.666855.s001
    Available download formats: pdf
    Dataset provided by
    Frontiers
    Authors
    Min Jin; Chunguang Wang; Dan Børge Jensen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification of imbalanced datasets of animal behavior has been one of the top challenges in the field of animal science. An imbalanced dataset makes many classification algorithms less effective and results in a higher misclassification rate for the minority classes. The aim of this study was to assess a method for addressing the problem of imbalanced datasets of pigs' behavior by using an over-sampling method, namely Borderline-SMOTE. The pigs' activity was measured using a triaxial accelerometer, which was mounted on the back of the pigs. Wavelet filtering and Borderline-SMOTE were both applied as methods to pre-process the dataset. A multilayer feed-forward neural network was trained and validated with 21 input features to classify four pig activities: lying, standing, walking, and exploring. The results showed that wavelet filtering and Borderline-SMOTE both led to improved performance. Furthermore, Borderline-SMOTE yielded greater improvements in classification performance than an alternative method for balancing the training data, namely random under-sampling, which is commonly used in animal science research. However, the overall performance was not adequate to satisfy the research needs in this field and to address the common but urgent problem of imbalanced behavior datasets.
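Random under-sampling, the baseline this description compares against, can be sketched in a few lines: every class is cut down to the size of the smallest one by discarding randomly chosen samples. The function name, the seed, and the toy labels below are hypothetical; this is an illustration of the general technique, not the study's code.

```python
import random
from collections import defaultdict


def random_under_sample(samples, labels, seed=0):
    """Balance classes by randomly dropping samples from larger classes."""
    rng = random.Random(seed)

    # Group samples by class label
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is reduced to the size of the smallest class
    target = min(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 8 "lying" samples and 4 "walking" samples
toy_samples = list(range(12))
toy_labels = ["lying"] * 8 + ["walking"] * 4
balanced_samples, balanced_labels = random_under_sample(toy_samples, toy_labels)
print(len(balanced_samples), sorted(set(balanced_labels)))
```

The trade-off is that under-sampling discards majority-class data, whereas over-sampling methods such as SMOTE add synthetic minority-class data instead.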

  5. Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, W Philip Kegelmeyer...

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, W Philip Kegelmeyer (2024). Dataset: SMOTE: Synthetic Minority Over-Sampling Technique. https://doi.org/10.57702/tq0zp0i3 [Dataset]. https://service.tib.eu/ldmservice/dataset/smote--synthetic-minority-over-sampling-technique
    Description

    SMOTE: synthetic minority over-sampling technique.

  6. Data from: High impact bug report identification with imbalanced learning...

    • researchdata.smu.edu.sg
    zip
    Updated Jun 1, 2023
    Cite
    YANG Xinli; David LO; Xin XIA; Qiao HUANG; Jianling SUN (2023). Data from: High impact bug report identification with imbalanced learning strategies [Dataset]. http://doi.org/10.25440/smu.12062763.v1
    Available download formats: zip
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    YANG Xinli; David LO; Xin XIA; Qiao HUANG; Jianling SUN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains the underlying research data for the publication "High impact bug report identification with imbalanced learning strategies" and the full-text is available from: https://ink.library.smu.edu.sg/sis_research/3702

    In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs. Thus, they often concentrate on bugs that are highly impactful. In the literature, high-impact bugs are used to refer to the bugs which appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or break pre-existing functionalities and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs from thousands of bug reports in a bug tracking system is not an easy feat. Thus, an automated technique that can identify high-impact bug reports can help developers to be aware of them early, rectify them quickly, and minimize the damages they cause. Considering that only a small proportion of bugs are high-impact bugs, the identification of high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy and one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms to conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants have different performances, and the best performing variants SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification outperform the F1-scores of the two state-of-the-art approaches by Thung et al. and Garcia and Shihab.

    Supplementary code and data available from GitHub:

  7. Indian Liver Patient Dataset (ILPD)

    • kaggle.com
    Updated Sep 27, 2021
    Cite
    Saumya Mohandas N (2021). Indian Liver Patient Dataset (ILPD) [Dataset]. https://www.kaggle.com/datasets/saumyamohandas/indian-liver-patient-dataset-ilpd/data
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saumya Mohandas N
    Description

    Dataset

    This dataset was created by Saumya Mohandas N

    Contents

  8. Amazon Kaggle SMOTE with PCA

    • kaggle.com
    Updated May 22, 2024
    Cite
    LennyTheDefiant (2024). Amazon Kaggle SMOTE with PCA [Dataset]. https://www.kaggle.com/datasets/lennythedefiant/amazon-kaggle-smote-with-pca/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    LennyTheDefiant
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by LennyTheDefiant

    Released under MIT

    Contents

  9. Supplementary tables. A hybrid resampling algorithms SMOTE and ENN based...

    • tandf.figshare.com
    docx
    Updated May 16, 2024
    Cite
    Madhulata Kumari; Naidu Subbarao (2024). Supplementary tables. A hybrid resampling algorithms SMOTE and ENN based deep learning models for identification of Marburg virus inhibitors [Dataset]. http://doi.org/10.25402/FMC.19550878.v1
    Available download formats: docx
    Dataset provided by
    Taylor & Francis
    Authors
    Madhulata Kumari; Naidu Subbarao
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Supplementary Table 1: The lead molecules of anti-MARV from ChemDiv antiviral library.
    Supplementary Table 2: The lead molecules of anti-MARV from ChEMBL antiviral library.
    Supplementary Table 3: The lead molecules of anti-MARV from phytochemical database.
    Supplementary Table 4: The lead molecules of anti-MARV from natural product NCI diversity set IV.

  10. Synthetic oversampling for credit card default prediction

    • data.mendeley.com
    Updated Mar 8, 2023
    Cite
    Fransiscus Pratikto (2023). Synthetic oversampling for credit card default prediction [Dataset]. http://doi.org/10.17632/jrss9jdjz9.1
    Authors
    Fransiscus Pratikto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains more than 17,000 records of credit card holders, with 20 predictor variables and 1 binary target variable. The corresponding R code for comparing several proposed (density-based) and existing (SMOTE-based) synthetic oversampling methods is also provided.

  11. Korean Voice Phishing Detection Dataset with Multilingual Back-Translation...

    • ieee-dataport.org
    Updated Nov 11, 2024
    Cite
    MILANDU KEITH MOUSSAVOU BOUSSOUGOU (2024). Korean Voice Phishing Detection Dataset with Multilingual Back-Translation and SMOTE Augmentations [Dataset]. https://ieee-dataport.org/documents/korean-voice-phishing-detection-dataset-multilingual-back-translation-and-smote
    Authors
    MILANDU KEITH MOUSSAVOU BOUSSOUGOU
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Chinese

  12. Mushroom Dataset SMOTE

    • kaggle.com
    Updated Nov 22, 2024
    Cite
    Avir_Sultana (2024). Mushroom Dataset SMOTE [Dataset]. https://www.kaggle.com/avirsultana/mushroom-dataset-smote/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Avir_Sultana
    Description

    Dataset

    This dataset was created by Avir_Sultana

    Contents

  13. Credit Card Fraud Detection using SMOTE

    • kaggle.com
    Updated Sep 28, 2024
    Cite
    Samad Khan (2024). Credit Card Fraud Detection using SMOTE [Dataset]. https://www.kaggle.com/datasets/samadkhan0017/credit-card-fraud-detection-using-smote/suggestions
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samad Khan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Credit Card Fraud Detection Project Objective: To develop a robust model for detecting fraudulent transactions using a dataset from Kaggle.

    Data Preprocessing: The dataset was highly imbalanced, with significantly more legitimate transactions than fraudulent ones. To address this, I employed the SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples of the minority class, improving the model's ability to learn from fraudulent instances.

    Modeling: I utilized the Random Forest algorithm for classification. Its ensemble approach helps improve accuracy and reduce overfitting, making it well-suited for this task. Key steps included:

    1) Model Training: Fitting the Random Forest model on the balanced dataset.
    2) Evaluation: Assessing model performance using metrics such as accuracy, precision, recall, and the F1 score.

    Results: The Random Forest model demonstrated strong predictive capabilities, effectively identifying fraudulent transactions while minimizing false positives. The use of SMOTE significantly enhanced the model’s performance by providing a more balanced view of the classes.

    Conclusion: This project highlights the importance of addressing class imbalance in fraud detection and showcases the effectiveness of combining SMOTE with Random Forest for improved accuracy in financial transaction analysis.
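The evaluation step in this description lists accuracy next to precision, recall, and F1. On a highly imbalanced dataset these can disagree sharply, which the following minimal sketch illustrates with a trivial model that labels every transaction as legitimate. The class counts (990 legitimate, 10 fraudulent) and the `evaluate` helper are hypothetical and are not drawn from the dataset.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 0 = legitimate, 1 = fraudulent
y_true = [0] * 990 + [1] * 10

# A trivial "model" that never predicts fraud
always_legitimate = [0] * len(y_true)

baseline = evaluate(y_true, always_legitimate)
print(baseline)
```

The baseline scores high on accuracy while detecting no fraud at all, so recall and F1 on the minority class are the more informative numbers when judging whether resampling helped.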

  14. Smote_CHUHOA

    • kaggle.com
    Updated Feb 17, 2023
    Cite
    Thanh B1909984 (2023). Smote_CHUHOA [Dataset]. https://www.kaggle.com/datasets/thanhb1909984/smote-chuhoa/suggestions
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Thanh B1909984
    Description

    Dataset

    This dataset was created by Thanh B1909984

    Contents

  15. Data from: Dataset for classification of signaling proteins based on...

    • portalcientifico.sergas.es
    Updated 2015
    Cite
    Fernandez-Lozano, Carlos; Munteanu, Cristian Robert (2015). Dataset for classification of signaling proteins based on molecular star graph descriptors using machine-learning models [Dataset]. https://portalcientifico.sergas.es/documentos/668fc447b9e7c03b01bd8975
    Authors
    Fernandez-Lozano, Carlos; Munteanu, Cristian Robert
    Description

    The positive group of 608 signaling protein sequences was downloaded as FASTA format from Protein Databank (Berman et al., 2000) by using the “Molecular Function Browser” in the “Advanced Search Interface” (“Signaling (GO ID23052)”, protein identity cut-off = 30%). The negative group of 2077 non-signaling proteins was downloaded as the PISCES CulledPDB (http://dunbrack.fccc.edu/PISCES.php) (Wang & R. L. Dunbrack, 2003) (November 19th, 2012) using identity (degree of correspondence between two sequences) less than 20%, resolution of 1.6 Å and R-factor 0.25. The full dataset contains 2685 FASTA sequences of protein chains from the PDB databank: 608 are signaling proteins and 2077 are non-signaling peptides. This kind of unbalanced data is not the most suitable to be used as an input for learning algorithms because the results would present a high sensitivity and low specificity; learning algorithms would tend to classify most of samples as part of the most common group. To avoid this situation, a pre-processing stage is needed in order to get a more balanced dataset, in this case by means of the synthetic minority oversampling technique (SMOTE). In short, SMOTE provides a more balanced dataset using an expansion of the lower class by creating new samples, interpolating other minority-class samples. After this pre-processing, the final dataset is composed of 1824 positive samples (signaling protein chains) and 2432 negative cases (non-signaling protein chains).

    Paper is available at: http://dx.doi.org/10.1016/j.jtbi.2015.07.038

    Please cite: Carlos Fernandez-Lozano, Rubén F. Cuiñas, José A. Seoane, Enrique Fernández-Blanco, Julian Dorado, Cristian R. Munteanu, Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models, Journal of Theoretical Biology, Volume 384, 7 November 2015, Pages 50-58, ISSN 0022-5193, http://dx.doi.org/10.1016/j.jtbi.2015.07.038. (http://www.sciencedirect.com/science/article/pii/S0022519315003999)
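The interpolation idea summarized in this description (new minority samples created between existing minority samples) can be sketched as follows. This is a simplified illustration of the general SMOTE idea, not the authors' pipeline: the function name `smote_sketch`, the defaults `k=5` and `seed=0`, and the toy points are all assumptions made for the example.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating minority samples.

    For each new point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and place the new point at a random
    position on the line segment between the two.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]

        # Indices of the k nearest minority neighbours of the base sample
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]

        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority samples, for illustration only
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(toy_minority, n_new=6, k=2)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class rather than being exact duplicates.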

  16. Dataset of book subjects where books equals 'His Captain's hand on his...

    • workwithdata.com
    Updated Aug 8, 2024
    Cite
    Work With Data (2024). Dataset of book subjects where books equals 'His Captain's hand on his shoulder smote' : The incidence and influence of cricket in schoolboy stories [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=%27His+Captain%27s+hand+on+his+shoulder+smote%27+:+The+incidence+and+influence+of+cricket+in+schoolboy+stories&j=1&j0=books
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 1 row and is filtered to the book 'His Captain's hand on his shoulder smote' : The incidence and influence of cricket in schoolboy stories. It features 10 columns, including book subject, number of authors, number of books, earliest publication date, and latest publication date.

  17. TPS - Mar 2021 - Ordinal+SMOTE

    • kaggle.com
    Updated Mar 15, 2021
    Cite
    Rafael Novello (2021). TPS - Mar 2021 - Ordinal+SMOTE [Dataset]. https://www.kaggle.com/rafanovello/tps-ordinal-smote-pkl/activity
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rafael Novello
    Description

    Dataset

    This dataset was created by Rafael Novello

    Contents

  18. DermaEvolve - Skin Disease Pred. - SMOTE Balanced

    • kaggle.com
    Updated Mar 11, 2025
    Cite
    Lokesh Bhaskar (2025). DermaEvolve - Skin Disease Pred. - SMOTE Balanced [Dataset]. http://doi.org/10.34740/kaggle/dsv/9786370
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lokesh Bhaskar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    DermaEvolve Dataset

    Overview

    The DermaEvolve dataset is a comprehensive collection of skin lesion images, sourced from publicly available datasets and extended with additional rare diseases. This dataset aims to aid in the development and evaluation of machine learning models for dermatological diagnosis.

    Sources

    The dataset is primarily derived from:

    • HAM10000 (Kaggle link) – A collection of dermatoscopic images with various skin lesion types.
    • ISIC Archive (Kaggle link) – A dataset of skin cancer images categorized into multiple classes.
    • Dermnet NZ – Used to source additional rare diseases for dataset extension. https://dermnetnz.org/
    • Google Database – Images

    Categories

    The dataset includes images of the following skin conditions:

    Common Categories:

    • Basal Cell Carcinoma
    • Squamous Cell Carcinoma
    • Melanoma
    • Actinic Keratosis
    • Pigmented Benign Keratosis
    • Seborrheic Keratosis
    • Vascular Lesion
    • Melanocytic Nevus
    • Dermatofibroma

    Rare Diseases (Extended):

    To enhance diversity, the following rare skin conditions were added from Dermnet NZ:

    • Elastosis Perforans Serpiginosa
    • Lentigo Maligna
    • Nevus Sebaceus
    • Blue Naevus


    Dataset Characteristics

    • Class Imbalance Handling: The dataset has a uniform class distribution.
    • Image Size: 64 x 64, chosen to fit Kaggle memory limits.

    The resizing and augmentation were applied to my previously uploaded raw dataset: https://www.kaggle.com/datasets/lokeshbhaskarnr/dermaevolve-original-unprocessed/data

    Acknowledgements

    Special thanks to the authors of the original datasets:

    • HAM10000 – Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions.
    • ISIC Archive – International Skin Imaging Collaboration (ISIC), a repository for dermatology imaging.
    • Dermnet NZ – A valuable resource for dermatological images.

    Usage

    This dataset can be used for:

    • Training deep learning models for skin lesion classification.
    • Research on dermatological image analysis.
    • Development of computer-aided diagnostic tools.

    Please cite the original datasets if you use this resource in your work.

    NOTE :

    Check out the GitHub repository for the Streamlit application that focuses on skin disease prediction: https://github.com/LokeshBhaskarNR/DermaEvolve---An-Advanced-Skin-Disease-Predictor.git

    Streamlit Application Link : https://dermaevolve.streamlit.app/

    Kindly check out my notebooks for the processed models and code -->

    Check out my NoteBooks on multiple models trained on this dataset :

  19. NSL KDD Smote

    • kaggle.com
    Updated Mar 31, 2025
    Cite
    Karthik Ragavender.B (2025). NSL KDD Smote [Dataset]. https://www.kaggle.com/datasets/karthikragavenderb/nsl-kdd-smote/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Karthik Ragavender.B
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Karthik Ragavender.B

    Released under Apache 2.0

    Contents

  20. Results of Bioassay 1608 dataset in experiment 2.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Jinyan Li; Lian-sheng Liu; Simon Fong; Raymond K. Wong; Sabah Mohammed; Jinan Fiaidhi; Yunsick Sung; Kelvin K. L. Wong (2023). Results of Bioassay 1608 dataset in experiment 2. [Dataset]. http://doi.org/10.1371/journal.pone.0180830.t011
    Available download formats: xls
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jinyan Li; Lian-sheng Liu; Simon Fong; Raymond K. Wong; Sabah Mohammed; Jinan Fiaidhi; Yunsick Sung; Kelvin K. L. Wong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of Bioassay 1608 dataset in experiment 2.

32 scholarly articles cite the dataset "The definition of a confusion matrix." (entry 1 above).