2 datasets found
  1. sms_spam

    • huggingface.co
    Updated Aug 28, 2023
    Cite
    UC Irvine (2023). sms_spam [Dataset]. https://huggingface.co/datasets/ucirvine/sms_spam
    Dataset updated
    Aug 28, 2023
    Dataset authored and provided by
    UC Irvine
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for [Dataset Name]

      Dataset Summary
    

    The SMS Spam Collection v.1 is a public set of labeled SMS messages collected for mobile phone spam research. It consists of a single collection of 5,574 real, non-encoded English messages, each tagged as either legitimate (ham) or spam.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    English

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    [More Information… See the full description on the dataset page: https://huggingface.co/datasets/ucirvine/sms_spam.

  2. spam-messages

    • huggingface.co
    Updated Aug 24, 2025
    Cite
    Michael Shenoda (2025). spam-messages [Dataset]. https://huggingface.co/datasets/mshenoda/spam-messages
    Dataset updated
    Aug 24, 2025
    Authors
    Michael Shenoda
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    The dataset consists of messages labeled as ham or spam, merged from three data sources:

    • SMS Spam Collection: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
    • Telegram Spam Ham: https://huggingface.co/datasets/thehamkercat/telegram-spam-ham/tree/main
    • Enron Spam: https://huggingface.co/datasets/SetFit/enron_spam/tree/main (only the message column and labels were used)

    The prepare script for enron is available at… See the full description on the dataset page: https://huggingface.co/datasets/mshenoda/spam-messages.
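The merge of several labeled sources described for spam-messages can be sketched roughly as below. This is a hypothetical illustration, not the author's prepare script, which is elided in this listing. The `text` and `label` field names, the label aliases, the exact-duplicate filter, and the helper names `normalize_label` and `merge_sources` are all assumptions made for the example.

```python
# Hypothetical label spellings; the real sources may encode labels differently.
_LABEL_ALIASES = {"ham": "ham", "spam": "spam", "0": "ham", "1": "spam"}


def normalize_label(raw):
    """Map a source-specific label (e.g. 0/1 or ' Spam ') to 'ham' or 'spam'."""
    key = str(raw).strip().lower()
    if key not in _LABEL_ALIASES:
        raise ValueError(f"unrecognized label: {raw!r}")
    return _LABEL_ALIASES[key]


def merge_sources(*sources):
    """Concatenate (text, label) pairs from several sources into one list of rows.

    Exact duplicates (same stripped text and same normalized label) are kept
    only once. This filter is an assumption, not something the listing states.
    """
    seen = set()
    merged = []
    for source in sources:
        for text, raw_label in source:
            row = (text.strip(), normalize_label(raw_label))
            if row not in seen:
                seen.add(row)
                merged.append({"text": row[0], "label": row[1]})
    return merged


# Tiny made-up samples standing in for the three sources.
sms = [("Free entry in a weekly prize draw", "spam"), ("See you at lunch", "ham")]
telegram = [("See you at lunch", "ham"), ("Claim your reward now", 1)]
enron = [("Meeting moved to 3pm", 0)]

merged = merge_sources(sms, telegram, enron)
print(len(merged), "rows after merging")
```

Normalizing labels before concatenation keeps the merged rows on a single schema even when the sources spell their labels differently.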



sms_spam

ucirvine/sms_spam

SMS Spam Collection Data Set

54 scholarly articles cite this dataset (per Google Scholar).
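As a rough sketch of how the ham/spam tagging of sms_spam might be inspected, the snippet below counts labels over a few made-up rows. The `sms` and `label` field names, the 0 = ham / 1 = spam encoding, and the `train` split are assumptions about the hosted dataset, since the listing truncates the structure section. `label_counts` and `load_sms_spam` are hypothetical helper names. The loader relies on `load_dataset` from the Hugging Face `datasets` library and is not called here, so the example runs offline.

```python
from collections import Counter

# Assumption: label ids are encoded as 0 = ham, 1 = spam.
LABEL_NAMES = {0: "ham", 1: "spam"}


def label_counts(rows):
    """Count ham and spam messages in an iterable of {'sms': ..., 'label': ...} rows."""
    return Counter(LABEL_NAMES[row["label"]] for row in rows)


def load_sms_spam():
    """Fetch the dataset from the Hugging Face Hub.

    Needs `pip install datasets` and network access; the split name is an assumption.
    """
    from datasets import load_dataset  # imported lazily so the rest runs without it

    return load_dataset("ucirvine/sms_spam", split="train")


# Made-up rows for an offline check; swap in load_sms_spam() for the real data.
sample = [
    {"sms": "Win a free prize, reply now", "label": 1},
    {"sms": "Running late, be there soon", "label": 0},
    {"sms": "Are we still on for dinner?", "label": 0},
]
counts = label_counts(sample)
print(counts)
```

Checking the class balance this way is a common first step before training a classifier on the real data, because spam collections are often skewed toward ham.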
