59 datasets found
  1. email-spam-classification

    • huggingface.co
    Cite
    Unique Data, email-spam-classification [Dataset]. https://huggingface.co/datasets/UniqueData/email-spam-classification
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Email Spam Classification

    The dataset consists of a collection of emails categorized into two major classes: spam and not spam. It is designed to facilitate the development and evaluation of spam detection or email filtering systems. The spam emails in the dataset are typically unsolicited and unwanted messages that aim to promote products or services, spread malware, or deceive recipients for various malicious purposes. These emails often contain misleading subject lines… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/email-spam-classification.

  2. SMS Spam Detection Dataset

    • kaggle.com
    Updated Mar 22, 2024
    Cite
    Vishakh Patel (2024). SMS Spam Detection Dataset [Dataset]. https://www.kaggle.com/datasets/vishakhdapat/sms-spam-detection-dataset
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vishakh Patel
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Description: In an era where communication is predominantly digital, SMS spam poses a significant challenge, cluttering inboxes and sometimes even posing security risks. Our "SMS Spam Detection Dataset" is tailored to empower machine learning enthusiasts, data scientists, and researchers to tackle this pervasive issue using the power of AI. This dataset is meticulously curated to provide a robust foundation for developing and benchmarking spam detection models.

    Dataset Overview: The dataset comprises two columns: 'Text' and 'Label', containing the SMS content and corresponding labels ('ham' for regular messages and 'spam' for unsolicited messages), respectively. With a diverse collection of messages, this dataset serves as an ideal playground for exploring various text processing and machine learning techniques.

    Potential Uses:
    • Spam Detection Models: Use the dataset to train binary classification models capable of distinguishing between spam and ham messages with high accuracy.
    • Natural Language Processing (NLP) Techniques: Experiment with different NLP methodologies, including tokenization, stemming, lemmatization, and the application of word embeddings or transformers to understand the nuances of SMS language.
    • Feature Engineering: Explore how different features, such as message length, punctuation usage, and keyword frequency, can impact model performance.
    • Model Benchmarking: Compare the effectiveness of various machine learning algorithms, from classical approaches like Naive Bayes and SVM to advanced deep learning models like LSTM and BERT.

    Challenges & Opportunities: While the dataset offers a straightforward binary classification task, the real challenge lies in dealing with the nuances of natural language, including slang, abbreviations, and the evolving nature of spam tactics. Innovators in the field can explore advanced techniques like transfer learning and semi-supervised models to push the boundaries of what's possible in spam detection.
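
    As a rough illustration of the classical end of that benchmarking range, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on rows shaped like the 'Text' and 'Label' columns described above. The four messages are invented for the example and are not taken from the dataset; `tokenize`, `train`, and `predict` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict

# Toy rows in the two-column layout ('Text', 'Label').
# These messages are invented for illustration only.
ROWS = [
    {"Text": "Win a free prize now", "Label": "spam"},
    {"Text": "Free entry claim your prize", "Label": "spam"},
    {"Text": "Are we still meeting for lunch", "Label": "ham"},
    {"Text": "See you at lunch tomorrow", "Label": "ham"},
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(rows):
    """Count words per label and messages per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for row in rows:
        label_counts[row["Label"]] += 1
        word_counts[row["Label"]].update(tokenize(row["Text"]))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior under add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_messages in label_counts.items():
        score = math.log(n_messages / total_messages)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(ROWS)
print(predict("free prize", model))
print(predict("lunch tomorrow", model))
```

    On real data the same structure would apply, but a held-out test split would be needed before any accuracy claim could be made.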

  3. spam-classification

    • huggingface.co
    Updated Sep 12, 2024
    + more versions
    Cite
    Aniket (2024). spam-classification [Dataset]. https://huggingface.co/datasets/Anik3t/spam-classification
    Dataset updated
    Sep 12, 2024
    Authors
    Aniket
    Description

    Anik3t/spam-classification dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Enron Spam Classification Dataset - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). Enron Spam Classification Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/enron-spam-classification-dataset
    Dataset updated
    Nov 25, 2024
    Description

    The Spam dataset is based on the Enron email data, specifically the BG section of spam emails and the Kaminski section of ham emails, combined into a dataset of 5000 emails for spam classification.

  5. Spam email classification

    • kaggle.com
    Updated Sep 21, 2023
    Cite
    Yousef Mohamed (2023). Spam email classification [Dataset]. https://www.kaggle.com/datasets/yousefmohamed20/spam-email-detection/versions/1
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Kaggle
    Authors
    Yousef Mohamed
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This CSV file contains 5157 randomly selected emails and their labels for spam or not-spam classification. It has 5157 rows, one per email, and 2 columns: the first gives the email category (spam or ham) and the second gives the text of the email.
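
    A two-column file of this shape can be inspected with the standard library alone. The sketch below reads the columns by position, since the description does not give the header names; the header `Category,Message` and the three rows are invented stand-ins for the real file, and `label_counts` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Stand-in for the real file: label in the first column, email text in the second.
# Header names and rows are invented for illustration.
SAMPLE = io.StringIO(
    "Category,Message\n"
    "ham,See you at the meeting\n"
    'spam,"Claim your reward, click here"\n'
    "ham,Lunch at noon?\n"
)


def label_counts(handle):
    """Count how many rows carry each label in the first column."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return Counter(row[0].strip().lower() for row in reader if row)


counts = label_counts(SAMPLE)
print(counts)
```

    With the real file, passing `open(path, newline="", encoding="utf-8")` in place of `SAMPLE` should work, assuming the file is UTF-8 encoded.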

  6. generated-e-mail-spam

    • huggingface.co
    Updated Sep 23, 2023
    Cite
    Unique Data (2023). generated-e-mail-spam [Dataset]. https://huggingface.co/datasets/UniqueData/generated-e-mail-spam
    Dataset updated
    Sep 23, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The dataset consists of a CSV file containing 300 generated email spam messages. Each row in the file represents a separate email message, with its title and text. The dataset aims to facilitate the analysis and detection of spam emails. It can be used for various purposes, such as training machine learning algorithms to classify and filter spam emails, studying spam email patterns, or analyzing text-based features of spam messages.

  7. Email spam classification dataset

    • kaggle.com
    Updated Jul 28, 2021
    Cite
    Inzamam Safi (2021). Email spam classification dataset [Dataset]. https://www.kaggle.com/inzamamsafi/email-spam-classification-dataset/tasks
    Dataset updated
    Jul 28, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Inzamam Safi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Inzamam Safi

    Released under CC0: Public Domain

    Contents

  8. Email.cz image spam dataset v1

    • academictorrents.com
    bittorrent
    Updated Dec 30, 2019
    Cite
    Vit Listik (2019). Email.cz image spam dataset v1 [Dataset]. https://academictorrents.com/details/06f2389082e9c034fa4a73aaee00131a27e388b6
    Available download formats: bittorrent (2660566545)
    Dataset updated
    Dec 30, 2019
    Dataset authored and provided by
    Vit Listik
    License

    https://academictorrents.com/nolicensespecified

    Description

    The problem with email image spam classification is known from the year 2005. There are several approaches to this task. Lately, those approaches use convolutional neural networks (CNN). We propose a novel approach to the image spam classification task. Our approach is based on CNN and transfer learning, namely Resnet v1 used for semantic feature extraction and one layer Feedforward Neural Network for classification. We have shown that this approach can achieve state-of-the-art performance on publicly available datasets. 99% F1-score on two datasets [dredze 2007, Princeton] and 96% F1-score on the combination of these datasets. Due to the availability of GPUs, this approach may be used for just-in-time classification in anti-spam systems handling huge amounts of emails. We have observed also that mentioned publicly available datasets are no longer representative. We overcame this limitation by using a much richer dataset from a one-week long real traffic of the freemail provider Email.

  9. Spam Dataset - Dataset - LDM

    • service.tib.eu
    Updated Jan 2, 2025
    + more versions
    Cite
    (2025). Spam Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/spam-dataset
    Dataset updated
    Jan 2, 2025
    Description

    The spam dataset is a dataset used for spam classification.

  10. sms-spam-classification

    • kaggle.com
    Updated Mar 20, 2023
    Cite
    ali BOUZENA (2023). sms-spam-classification [Dataset]. https://www.kaggle.com/datasets/alibouzena/sms-spam-classification
    Dataset updated
    Mar 20, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ali BOUZENA
    Description

    This is a project that uses machine learning algorithms to classify emails/sms as either spam or not spam.

  11. spambase - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). spambase - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/spambase
    Dataset updated
    Dec 16, 2024
    Description

    The dataset is a spam classification dataset containing 4,600 emails labeled as spam or not.

  12. Dataset for spam classification

    • kaggle.com
    Updated Dec 30, 2021
    Cite
    Ayasya Batta (2021). Dataset for spam classification [Dataset]. https://www.kaggle.com/datasets/ayasyabatta/dataset-for-spam-classification
    Dataset updated
    Dec 30, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ayasya Batta
    Description

    Dataset

    This dataset was created by Ayasya Batta

    Contents

  13. ‘Spam Text Message Classification’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Spam Text Message Classification’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-spam-text-message-classification-f627/34e9c337/?iid=000-133&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Spam Text Message Classification’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/team-ai/spam-text-message-classification on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    Coming Soon

    Content

    Coming Soon

    Acknowledgements

    Special thanks to: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/

    Inspiration

    Coming soon

    --- Original source retains full ownership of the source dataset ---

  14. Datasets Used in the experiment.

    • plos.figshare.com
    xls
    Updated Dec 14, 2023
    Cite
    Angom Buboo Singh; Khumanthem Manglem Singh (2023). Datasets Used in the experiment. [Dataset]. http://doi.org/10.1371/journal.pone.0291037.t001
    Available download formats: xls
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Angom Buboo Singh; Khumanthem Manglem Singh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Image spam is a type of spam that contains text information inserted in an image file. Traditional classification systems based on feature engineering require manual extraction of certain quantitative and qualitative image features for classification. However, these systems are often not robust to adversarial attacks. In contrast, classification pipelines that use convolutional neural network (CNN) models automatically extract features from images. This approach has been shown to achieve high accuracies even on challenge datasets that are designed to defeat the purpose of classification. We propose a method for improving the performance of CNN models for image spam classification. Our method uses the concept of error level analysis (ELA) as a pre-processing step. ELA is a technique for detecting image tampering by analyzing the error levels of the image pixels. We show that ELA can be used to improve the accuracy of CNN models for image spam classification, even on challenge datasets. Our results demonstrate that the application of ELA as a pre-processing technique in our proposed model can significantly improve the results of the classification tasks on image spam datasets.

  15. Network hyper-parameters.

    • plos.figshare.com
    xls
    Updated Dec 14, 2023
    Cite
    Angom Buboo Singh; Khumanthem Manglem Singh (2023). Network hyper-parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0291037.t003
    Available download formats: xls
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Angom Buboo Singh; Khumanthem Manglem Singh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Image spam is a type of spam that contains text information inserted in an image file. Traditional classification systems based on feature engineering require manual extraction of certain quantitative and qualitative image features for classification. However, these systems are often not robust to adversarial attacks. In contrast, classification pipelines that use convolutional neural network (CNN) models automatically extract features from images. This approach has been shown to achieve high accuracies even on challenge datasets that are designed to defeat the purpose of classification. We propose a method for improving the performance of CNN models for image spam classification. Our method uses the concept of error level analysis (ELA) as a pre-processing step. ELA is a technique for detecting image tampering by analyzing the error levels of the image pixels. We show that ELA can be used to improve the accuracy of CNN models for image spam classification, even on challenge datasets. Our results demonstrate that the application of ELA as a pre-processing technique in our proposed model can significantly improve the results of the classification tasks on image spam datasets.

  16. TREC05 spam corpus

    • resodate.org
    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Xi Li; David J. Miller; Zhen Xiang; George Kesidis (2024). TREC05 spam corpus [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdHJlYzA1LXNwYW0tY29ycHVz
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Xi Li; David J. Miller; Zhen Xiang; George Kesidis
    Description

    The dataset used in the paper is the TREC05 spam corpus, which contains 39,999 real ham and 52,790 spam emails.
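
    The two counts above imply a moderately imbalanced corpus. As a small worked example, using only those two figures, the sketch below computes the corpus size, the spam share, and the accuracy a classifier would reach by always predicting the majority class.

```python
# Counts quoted in the description above.
ham = 39_999
spam = 52_790

total = ham + spam
spam_share = spam / total
# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(ham, spam) / total

print(f"total emails: {total}")
print(f"spam share: {spam_share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

    Any reported accuracy on this corpus is most informative when compared against that baseline.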

  17. CNN architectures.

    • plos.figshare.com
    xls
    Updated Dec 14, 2023
    Cite
    Angom Buboo Singh; Khumanthem Manglem Singh (2023). CNN architectures. [Dataset]. http://doi.org/10.1371/journal.pone.0291037.t002
    Available download formats: xls
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Angom Buboo Singh; Khumanthem Manglem Singh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Image spam is a type of spam that contains text information inserted in an image file. Traditional classification systems based on feature engineering require manual extraction of certain quantitative and qualitative image features for classification. However, these systems are often not robust to adversarial attacks. In contrast, classification pipelines that use convolutional neural network (CNN) models automatically extract features from images. This approach has been shown to achieve high accuracies even on challenge datasets that are designed to defeat the purpose of classification. We propose a method for improving the performance of CNN models for image spam classification. Our method uses the concept of error level analysis (ELA) as a pre-processing step. ELA is a technique for detecting image tampering by analyzing the error levels of the image pixels. We show that ELA can be used to improve the accuracy of CNN models for image spam classification, even on challenge datasets. Our results demonstrate that the application of ELA as a pre-processing technique in our proposed model can significantly improve the results of the classification tasks on image spam datasets.

  18. Spam Text Message Classification

    • kaggle.com
    • hypi.ai
    Updated Aug 20, 2017
    Cite
    Team AI (2017). Spam Text Message Classification [Dataset]. https://www.kaggle.com/datasets/team-ai/spam-text-message-classification/discussion
    Dataset updated
    Aug 20, 2017
    Dataset provided by
    Kaggle
    Authors
    Team AI
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Coming Soon

    Content

    Coming Soon

    Acknowledgements

    Special thanks to: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/

    Inspiration

    Coming soon

  19. Dataset for Email Spam Classification (NLP)

    • kaggle.com
    Updated Apr 9, 2021
    Cite
    Akalya Subramanian (2021). Dataset for Email Spam Classification (NLP) [Dataset]. https://www.kaggle.com/akalyasubramanian/dataset-for-email-spam-classification-nlp/discussion
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akalya Subramanian
    Description

    Dataset

    This dataset was created by Akalya Subramanian

    Contents

  20. Spam Images for Malicious Annotation Set (SIMAS)

    • zenodo.org
    application/gzip, bin +1
    Updated May 23, 2025
    Cite
    Maria Vukić; Emanuel Lacić; Denis Helic (2025). Spam Images for Malicious Annotation Set (SIMAS) [Dataset]. http://doi.org/10.5281/zenodo.15423637
    Available download formats: png, bin, application/gzip
    Dataset updated
    May 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Vukić; Emanuel Lacić; Denis Helic
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SIMAS Dataset

    This archive includes the SIMAS dataset for fine-tuning models for MMS (Multimedia Messaging Service) image moderation. SIMAS is a balanced collection of publicly available images, manually annotated in accordance with a specialized taxonomy designed for identifying visual spam in MMS messages.

    Taxonomy for MMS Visual Spam

    The following table presents the definitions of categories used for classifying MMS images.

    Table 1: Category definitions

    Category    Description
    Alcohol*    Content related to alcoholic beverages, including advertisements and consumption.
    Drugs*      Content related to the use, sale, or trafficking of narcotics (e.g., cannabis, cocaine,
    Firearms*   Content involving guns, pistols, knives, or military weapons.
    Gambling*   Content related to gambling (casinos, poker, roulette, lotteries).
    Sexual      Content involving nudity, sexual acts, or sexually suggestive material.
    Tobacco*    Content related to tobacco use and advertisements.
    Violence    Content showing violent acts, self-harm, or injury.
    Safe        All other content, including neutral depictions, products, or harmless cultural symbols

    Note: Categories marked with an asterisk are regulated in some jurisdictions and may not be universally restricted.

    Dataset Collection and Annotation

    Data Sources

    The SIMAS dataset combines publicly available images from multiple sources, selected to reflect the categories defined in our content taxonomy. Each image was manually reviewed by three independent annotators, with final labels assigned when at least two annotators agreed.

    The largest portion of the dataset (30.4%) originates from LAION-400M, a large-scale image-text dataset. To identify relevant content, we first selected a list of ImageNet labels that semantically matched our taxonomy. These labels were generated using GPT-4o in a zero-shot setting, using separate prompts per category. This resulted in 194 candidate labels, of which 88.7% were retained after manual review. The structure of the prompts used in this process is shown in the file gpt4o_imagenet_prompting_scheme.png, which illustrates a shared base prompt template applied across all categories. The fields category_definition, file_examples, and exceptions are specified per category. Definitions align with the taxonomy, while the file_examples column includes sample labels retrieved from the ImageNet label list. The exceptions field contains category-specific filtering instructions; a dash indicates no exceptions were specified.

    Another 25.1% of images were sourced from Roboflow, using open datasets such as:

    The NudeNet dataset contributes 11.4% of the dataset. We sampled 1,000 images from the “porn” category to provide visual coverage of explicit sexual content.

    Another 11.0% of images were collected from Kaggle, including:

    An additional 9.9% of images were retrieved from Unsplash, using keyword-based search queries aligned with each category in our taxonomy.

    Images from UnsafeBench make up 8.0% of the dataset. Since its original binary labels did not match our taxonomy, all samples were manually reassigned to the most appropriate category.

    Finally, 4.2% of images were gathered from various publicly accessible websites. These were primarily used to improve category balance and model generalization, especially in safe classes.

    All images collected from the listed sources have been manually reviewed by three independent annotators. Each image is then assigned to a category when at least two annotators reach consensus.
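
    The two-of-three agreement rule described above can be sketched as a small function. This is only an illustration of the stated rule, not the authors' annotation tooling; `consensus_label` and its `min_agree` parameter are hypothetical names.

```python
from collections import Counter


def consensus_label(votes, min_agree=2):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


# Two of three annotators agree, so the image receives that label.
print(consensus_label(["Alcohol", "Alcohol", "Safe"]))
# All three disagree, so no label is assigned.
print(consensus_label(["Alcohol", "Safe", "Drugs"]))
```

    With three annotators and a threshold of two, at most one label can reach the threshold, so no tie-breaking rule is needed.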

    Table 2: Distribution of images per public source and category in SIMAS dataset

    Type    Category  LAION  Roboflow  NudeNet  Kaggle  Unsplash  UnsafeBench  Other  Total
    Unsafe  Alcohol      29         0        3     267         0            1      0    300
    Unsafe  Drugs        17       211        0       0        13            8      1    250
    Unsafe  Firearms      0        59        0     229         0           62      0    350
    Unsafe  Gambling    132        38        0       0        73           39     18    300
    Unsafe  Sexual        2         0      421       0         3           68      6    500
    Unsafe  Tobacco       0       446        0       0        43           11      0    500
    Unsafe  Violence      0       289        0       0         0           11      0    300
    Safe    Alcohol     140        35        0       0        16           13     96    300
    Safe    Drugs        67        49        0      15        72           17     30    250
    Safe    Firearms    173        15        0       3       144            8      7    350
    Safe    Gambling    164         2        0       1       121           12      0    300
    Safe    Sexual      235        22      139       2         0           94      8    500
    Safe    Tobacco     351        67        5      13         8           16     40    500
    Safe    Violence    212        20        3      21         0           42      2    300
    All     All       1,522     1,253      571     551       493          402    208  5,000
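
    The per-source percentages quoted in the Data Sources text can be cross-checked against the "All" totals row of Table 2. The sketch below uses only those totals and recomputes each source's share of the 5,000 images.

```python
# Per-source totals from the "All" row of Table 2.
TOTALS = {
    "LAION": 1522,
    "Roboflow": 1253,
    "NudeNet": 571,
    "Kaggle": 551,
    "Unsplash": 493,
    "UnsafeBench": 402,
    "Other": 208,
}

grand_total = sum(TOTALS.values())
# Share of the whole dataset per source, in percent, rounded to one decimal.
shares = {source: round(100 * n / grand_total, 1) for source, n in TOTALS.items()}

print(grand_total)
for source, pct in shares.items():
    print(f"{source}: {pct}%")
```

    The recomputed shares agree with the figures given in the text (30.4%, 25.1%, 11.4%, 11.0%, 9.9%, 8.0%, and 4.2%).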

    Balancing

    To ensure semantic diversity and dataset balance, undersampling was performed on overrepresented categories using a CLIP-based embedding and k-means clustering strategy. This resulted in a final dataset containing 2,500 spam and 2,500 safe images, evenly distributed across all categories.
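
    The description does not say how the clusters were turned into a reduced sample. One plausible reading, sketched here purely as an assumption, is to cluster the embeddings with k-means and keep the image closest to each centroid, so the retained images cover distinct regions of the embedding space. The toy 2-D vectors stand in for CLIP embeddings, and `kmeans` and `undersample` are hypothetical helper names.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, assign


def undersample(points, target):
    """Keep one representative index per cluster: the point nearest its centroid.

    May return fewer than `target` indices if a cluster ends up empty.
    """
    centroids, assign = kmeans(points, target)
    keep = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            keep.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return sorted(keep)


# Toy stand-ins for image embeddings: two tight groups of three points each.
EMBEDDINGS = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
kept = undersample(EMBEDDINGS, target=2)
print(kept)
```

    Reducing six points to two keeps one index from each group, which is the diversity-preserving behavior the paragraph describes.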

    Table 3: Distribution of images per category in SIMAS
