Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, the challenge of imbalanced data has become increasingly prominent in machine learning, affecting the performance of classification algorithms. This study proposes a novel data-level oversampling method, Cluster-Based Reduced Noise SMOTE (CRN-SMOTE), to address this issue. CRN-SMOTE combines SMOTE for oversampling minority classes with a novel cluster-based noise reduction technique. In this cluster-based noise reduction approach, samples from each category are required to form one or two clusters, a property that conventional noise reduction methods do not enforce. The proposed method is evaluated on four imbalanced datasets (ILPD, QSAR, Blood, and Maternal Health Risk) using five metrics: Cohen's kappa, Matthews correlation coefficient (MCC), F1-score, precision, and recall. Results demonstrate that CRN-SMOTE consistently outperformed the state-of-the-art Reduced Noise SMOTE (RN-SMOTE), SMOTE-Tomek Link, and SMOTE-ENN methods across all datasets, with particularly notable improvements on the QSAR and Maternal Health Risk datasets, indicating its effectiveness in enhancing imbalanced classification performance. Overall, the experimental findings indicate that CRN-SMOTE outperformed RN-SMOTE in 100% of cases, achieving average improvements of 6.6% in kappa, 4.01% in MCC, 1.87% in F1-score, 1.7% in precision, and 2.05% in recall, with the number of SMOTE neighbors set to 5.
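The abstract above does not include code; as a hedged illustration of the general recipe it describes (SMOTE oversampling followed by cluster-based noise reduction), the Python sketch below pairs imblearn's SMOTE with a KMeans-based filter. It is not the authors' CRN-SMOTE implementation: the clustering step, cluster count, and purity rule are assumptions made only to show the shape of such a pipeline.

```python
# Illustrative sketch only: plain SMOTE followed by a clustering-based noise
# filter. NOT the authors' CRN-SMOTE; KMeans, the cluster count, and the
# "majority-dominated cluster" rule are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE


def smote_with_cluster_noise_filter(X, y, minority_label=1, n_clusters=2,
                                    purity_threshold=0.5, k_neighbors=5,
                                    random_state=0):
    # 1) Oversample the minority class with SMOTE (k_neighbors=5 matches the
    #    neighbour setting reported in the abstract).
    X_res, y_res = SMOTE(k_neighbors=k_neighbors,
                         random_state=random_state).fit_resample(X, y)

    # 2) Cluster the resampled data and compute each cluster's minority share.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X_res)

    # 3) Drop minority samples that fall in majority-dominated clusters,
    #    treating them as likely noise (the 0.5 threshold is arbitrary).
    keep = np.ones(len(y_res), dtype=bool)
    for c in np.unique(labels):
        in_cluster = labels == c
        minority_share = np.mean(y_res[in_cluster] == minority_label)
        if minority_share < purity_threshold:
            keep &= ~(in_cluster & (y_res == minority_label))
    return X_res[keep], y_res[keep]


if __name__ == "__main__":
    # Tiny synthetic example; the datasets named above are not loaded here.
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    X_bal, y_bal = smote_with_cluster_noise_filter(X, y)
    print(np.bincount(y_bal))
```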
This dataset was created by Davide Cagnazzo
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification results of classifiers using TF with SMOTE.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification of imbalanced datasets of animal behavior has been one of the top challenges in the field of animal science. An imbalanced dataset makes many classification algorithms less effective and results in a higher misclassification rate for the minority classes. The aim of this study was to assess a method for addressing the problem of imbalanced datasets of pigs' behavior by using an over-sampling method, namely Borderline-SMOTE. The pigs' activity was measured using a triaxial accelerometer mounted on the back of the pigs. Wavelet filtering and Borderline-SMOTE were both applied to pre-process the dataset. A multilayer feed-forward neural network was trained and validated with 21 input features to classify four pig activities: lying, standing, walking, and exploring. The results showed that wavelet filtering and Borderline-SMOTE both led to improved performance. Furthermore, Borderline-SMOTE yielded greater improvements in classification performance than an alternative method for balancing the training data, namely random under-sampling, which is commonly used in animal science research. However, the overall performance was not adequate to satisfy the research needs in this field and to address the common but urgent problem of imbalanced behavior datasets.
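As a hedged sketch of the balancing-plus-classification step described above, the snippet below pairs imblearn's BorderlineSMOTE with a scikit-learn multilayer perceptron standing in for the study's feed-forward network. The synthetic 21-feature data, class weights, layer sizes, and split are illustrative assumptions, not the study's accelerometer features or network configuration.

```python
# Hedged sketch: Borderline-SMOTE on the training split only, then a small
# feed-forward network. Placeholder data stands in for the 21
# accelerometer-derived features and the four activity classes.
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Placeholder for the real feature matrix: 21 features, 4 imbalanced classes.
X, y = make_classification(n_samples=2000, n_features=21, n_informative=10,
                           n_classes=4, weights=[0.6, 0.25, 0.1, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

# Oversample only the training data so the test set keeps its real imbalance.
X_bal, y_bal = BorderlineSMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te)))
```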
SMOTE: synthetic minority over-sampling technique.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains the underlying research data for the publication "High impact bug report identification with imbalanced learning strategies"; the full text is available from: https://ink.library.smu.edu.sg/sis_research/3702. In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on bugs that are highly impactful. In the literature, high-impact bugs refer to bugs that appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is no easy task. Thus, an automated technique that can identify high-impact bug reports can help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Considering that only a small proportion of bugs are high-impact bugs, the identification of high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy with one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms to conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants have different performances, and the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, outperform the two state-of-the-art approaches by Thung et al. and by Garcia and Shihab in terms of F1-score. Supplementary code and data available from GitHub:
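For illustration, here is a hedged sketch of two of the variants named above (SMOTE + KNN and random under-sampling + naive Bayes) built as imblearn pipelines. The synthetic feature matrix merely stands in for vectorised bug-report text (e.g., TF-IDF); GaussianNB, the 95/5 imbalance, and the split are assumptions rather than the paper's exact setup.

```python
# Hedged sketch of two imbalanced-learning variants: SMOTE + KNN and
# random under-sampling + naive Bayes, compared by F1-score.
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

# Placeholder features standing in for vectorised bug-report text.
X, y = make_classification(n_samples=3000, n_features=50,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

variants = {
    "SMOTE + KNN": Pipeline([("balance", SMOTE(random_state=0)),
                             ("clf", KNeighborsClassifier())]),
    "RUS + NB":    Pipeline([("balance", RandomUnderSampler(random_state=0)),
                             ("clf", GaussianNB())]),
}
for name, pipe in variants.items():
    pipe.fit(X_tr, y_tr)  # the sampler is applied only during fitting
    print(name, "F1 =", round(f1_score(y_te, pipe.predict(X_te)), 3))
```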
This dataset was created by Saumya Mohandas N
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by LennyTheDefiant
Released under MIT
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Supplementary Table 1: The lead molecules of anti-MARV from the ChemDiv antiviral library. Supplementary Table 2: The lead molecules of anti-MARV from the ChEMBL antiviral library. Supplementary Table 3: The lead molecules of anti-MARV from the phytochemical database. Supplementary Table 4: The lead molecules of anti-MARV from the natural product NCI diversity set IV.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains more than 17,000 records of credit card holders, with 20 predictor variables and 1 binary target variable. The corresponding R code for comparing several proposed (density-based) and existing (SMOTE-based) synthetic oversampling methods is also provided.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chinese
This dataset was created by Avir_Sultana
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Credit Card Fraud Detection Project. Objective: To develop a robust model for detecting fraudulent transactions using a dataset from Kaggle.
Data Preprocessing: The dataset was highly imbalanced, with significantly more legitimate transactions than fraudulent ones. To address this, I employed the SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples of the minority class, improving the model's ability to learn from fraudulent instances.
Modeling: I utilized the Random Forest algorithm for classification. Its ensemble approach helps improve accuracy and reduce overfitting, making it well-suited for this task. Key steps included:
1) Model Training: Fitting the Random Forest model on the balanced dataset. 2) Evaluation: Assessing model performance using metrics such as accuracy, precision, recall, and the F1 score.
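A minimal sketch of steps 1 and 2 follows, assuming the Kaggle credit-card data is available locally as creditcard.csv with a binary "Class" target (1 = fraud); the file name, split sizes, and Random Forest settings are assumptions, not the project's exact configuration.

```python
# Hedged sketch: SMOTE on the training split only, then a Random Forest.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("creditcard.csv")              # assumed local file name
X, y = df.drop(columns=["Class"]), df["Class"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=42)

# Balance only the training data; the test set keeps the real class ratio.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
rf.fit(X_bal, y_bal)
print(classification_report(y_te, rf.predict(X_te), digits=3))
```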
Results: The Random Forest model demonstrated strong predictive capabilities, effectively identifying fraudulent transactions while minimizing false positives. The use of SMOTE significantly enhanced the model’s performance by providing a more balanced view of the classes.
Conclusion: This project highlights the importance of addressing class imbalance in fraud detection and showcases the effectiveness of combining SMOTE with Random Forest for improved accuracy in financial transaction analysis.
This dataset was created by Thanh B1909984
The positive group of 608 signaling protein sequences was downloaded in FASTA format from the Protein Data Bank (Berman et al., 2000) using the "Molecular Function Browser" in the "Advanced Search Interface" ("Signaling (GO ID23052)", protein identity cut-off = 30%). The negative group of 2077 non-signaling proteins was downloaded as the PISCES CulledPDB (http://dunbrack.fccc.edu/PISCES.php) (Wang & Dunbrack, 2003) (November 19th, 2012) using an identity (degree of correspondence between two sequences) of less than 20%, a resolution of 1.6 Å, and an R-factor of 0.25. The full dataset contains 2685 FASTA sequences of protein chains from the PDB: 608 signaling proteins and 2077 non-signaling peptides. Such unbalanced data are not well suited as input for learning algorithms, because the results would show high sensitivity and low specificity; learning algorithms would tend to classify most samples as part of the most common group. To avoid this situation, a pre-processing stage is needed to obtain a more balanced dataset, in this case by means of the synthetic minority oversampling technique (SMOTE). In short, SMOTE provides a more balanced dataset by expanding the minority class, creating new samples through interpolation between existing minority-class samples. After this pre-processing, the final dataset is composed of 1824 positive samples (signaling protein chains) and 2432 negative cases (non-signaling protein chains). Paper is available at: http://dx.doi.org/10.1016/j.jtbi.2015.07.038 Please cite: Carlos Fernandez-Lozano, Rubén F. Cuiñas, José A. Seoane, Enrique Fernández-Blanco, Julian Dorado, Cristian R. Munteanu, Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models, Journal of Theoretical Biology, Volume 384, 7 November 2015, Pages 50-58, ISSN 0022-5193, http://dx.doi.org/10.1016/j.jtbi.2015.07.038 (http://www.sciencedirect.com/science/article/pii/S0022519315003999)
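As a hedged illustration of this balancing step, the sketch below runs imblearn's SMOTE on random placeholder descriptors with the class sizes quoted above (608 vs. 2077). The descriptor dimensionality and the sampling ratio are assumptions for illustration only and will not reproduce the paper's exact 1824/2432 split.

```python
# Hedged illustration: SMOTE interpolates between a minority sample and one of
# its k nearest minority neighbours to create each synthetic sample.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(2685, 30))        # placeholder for star-graph descriptors
y = np.array([1] * 608 + [0] * 2077)   # 608 signaling vs. 2077 non-signaling

sm = SMOTE(sampling_strategy=0.75, k_neighbors=5, random_state=0)
X_bal, y_bal = sm.fit_resample(X, y)
print(np.bincount(y_bal))              # class counts after resampling (approx. 2077 vs. 1557)
```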
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered where the book is 'His Captain's hand on his shoulder smote': The incidence and influence of cricket in schoolboy stories. It features 10 columns including book subject, number of authors, number of books, earliest publication date, and latest publication date.
This dataset was created by Rafael Novello
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The DermaEvolve dataset is a comprehensive collection of skin lesion images, sourced from publicly available datasets and extended with additional rare diseases. This dataset aims to aid in the development and evaluation of machine learning models for dermatological diagnosis.
The dataset is primarily derived from: - HAM10000 (Kaggle link) – A collection of dermatoscopic images with various skin lesion types. - ISIC Archive (Kaggle link) – A dataset of skin cancer images categorized into multiple classes. - Dermnet NZ – Used to source additional rare diseases for dataset extension. https://dermnetnz.org/ - Google Database - Images
The dataset includes images of the following skin conditions:
To enhance diversity, the following rare skin conditions were added from Dermnet NZ: - Elastosis Perforans Serpiginosa - Lentigo Maligna - Nevus Sebaceus - Blue Naevus
Image: SMOTE illustration (smote.png).
The resizing and augmentation were applied to my previously uploaded raw dataset: https://www.kaggle.com/datasets/lokeshbhaskarnr/dermaevolve-original-unprocessed/data
Special thanks to the authors of the original datasets: - HAM10000 – Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. - ISIC Archive – International Skin Imaging Collaboration (ISIC), a repository for dermatology imaging. - Dermnet NZ – A valuable resource for dermatological images.
This dataset can be used for: - Training deep learning models for skin lesion classification. - Research on dermatological image analysis. - Development of computer-aided diagnostic tools.
Please cite the original datasets if you use this resource in your work.
Check out the GitHub repository for the Streamlit application that focuses on skin disease prediction --> https://github.com/LokeshBhaskarNR/DermaEvolve---An-Advanced-Skin-Disease-Predictor.git
Streamlit Application Link: https://dermaevolve.streamlit.app/
Kindly check out my notebooks for the processed code and the multiple models trained on this dataset.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Karthik Ragavender.B
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results for the Bioassay 1608 dataset in experiment 2.