2 datasets found

sms_spam
huggingface.co
Updated Aug 28, 2023
Cite
UC Irvine (2023). sms_spam [Dataset]. https://huggingface.co/datasets/ucirvine/sms_spam
Dataset updated
Aug 28, 2023
Dataset authored and provided by
UC Irvine
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for [Dataset Name]

Dataset Summary

The SMS Spam Collection v.1 is a public set of labeled SMS messages collected for mobile phone spam research. It consists of a single collection of 5,574 real, non-encoded English messages, each tagged as either legitimate (ham) or spam.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

English

Dataset Structure

Data Instances

[More Information… See the full description on the dataset page: https://huggingface.co/datasets/ucirvine/sms_spam.
spam-messages
huggingface.co
Updated Aug 24, 2025
Cite
Michael Shenoda (2025). spam-messages [Dataset]. https://huggingface.co/datasets/mshenoda/spam-messages
Dataset updated
Aug 24, 2025
Authors
Michael Shenoda
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

The dataset is composed of messages labeled as ham or spam, merged from three data sources:

SMS Spam Collection: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
Telegram Spam Ham: https://huggingface.co/datasets/thehamkercat/telegram-spam-ham/tree/main
Enron Spam: https://huggingface.co/datasets/SetFit/enron_spam/tree/main (only used message column and labels)

The prepare script for enron is available at… See the full description on the dataset page: https://huggingface.co/datasets/mshenoda/spam-messages.
ucirvine/sms_spam (SMS Spam Collection Data Set)

54 scholarly articles cite this dataset (View in Google Scholar)
