15 datasets found

SMS Spam Collection Dataset
kaggle.com
opendatalab.com
zip
Updated Dec 2, 2016
Cite
UCI Machine Learning (2016). SMS Spam Collection Dataset [Dataset]. https://www.kaggle.com/uciml/sms-spam-collection-dataset
Available download formats: zip (215934 bytes)
Dataset updated: Dec 2, 2016
Dataset authored and provided by: UCI Machine Learning
Description
Context

The SMS Spam Collection is a set of tagged SMS messages that have been collected for SMS spam research. It contains one set of 5,574 English SMS messages, each tagged as either ham (legitimate) or spam.

Content

The files contain one message per line. Each line is composed of two columns: v1 contains the label (ham or spam) and v2 contains the raw text.
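A minimal sketch of reading this two-column layout with Python's standard `csv` module. The helper name `load_sms_spam`, the file name `spam.csv`, and the Latin-1 encoding in the commented-out usage are assumptions for illustration; any columns other than `v1` and `v2` are ignored.

```python
import csv
import io
from typing import Iterable, List, Tuple

VALID_LABELS = {"ham", "spam"}


def load_sms_spam(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse rows with a `v1` label column and a `v2` raw-text column.

    Extra columns are ignored. An unexpected label raises ValueError so
    that parsing problems surface early instead of silently.
    """
    messages = []
    for row in csv.DictReader(lines):
        label = row["v1"].strip()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        messages.append((label, row["v2"]))
    return messages


if __name__ == "__main__":
    # Tiny in-memory sample in the same two-column layout.
    sample = io.StringIO('v1,v2\nham,"Ok lar, see you"\nspam,WIN a prize now\n')
    print(load_sms_spam(sample))

    # Reading the real file (path and encoding are assumptions):
    # with open("spam.csv", encoding="latin-1", newline="") as f:
    #     messages = load_sms_spam(f)
```

Quoted fields let a message contain commas, which is why the sketch relies on `csv` rather than splitting each line on the first comma.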

This corpus was collected from free or free-for-research sources on the Internet:

-> A collection of 425 SMS spam messages was manually extracted from the Grumbletext Web site. This is a UK forum in which cell phone users make public claims about SMS spam messages, most of them without reporting the actual spam message received. Identifying the text of spam messages in the claims is a very hard and time-consuming task, and it involved carefully scanning hundreds of web pages. The Grumbletext Web site is: [Web Link].
-> A subset of 3,375 randomly chosen SMS ham messages from the NUS SMS Corpus (NSC), which is a dataset of about 10,000 legitimate messages collected for research at the Department of Computer Science at the National University of Singapore. The messages largely originate from Singaporeans, mostly from students attending the University. These messages were collected from volunteers who were made aware that their contributions were going to be made publicly available. The NUS SMS Corpus is available at: [Web Link].
-> A list of 450 SMS ham messages collected from Caroline Tag's PhD thesis, available at [Web Link].
-> Finally, we have incorporated the SMS Spam Corpus v.0.1 Big. It has 1,002 SMS ham messages and 322 spam messages and is publicly available at: [Web Link].

This corpus has been used in the following academic research:
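The per-source counts can be checked against the stated total of 5,574 messages. In this small arithmetic sketch, the source names are descriptive labels chosen here, not identifiers used by the dataset. The sources contribute 425 + 322 = 747 spam and 3,375 + 450 + 1,002 = 4,827 ham messages, so spam makes up roughly 13.4% of the corpus.

```python
# Message counts per source, as listed in the description.
SOURCES = [
    ("Grumbletext", "spam", 425),
    ("NUS SMS Corpus subset", "ham", 3375),
    ("Caroline Tag's PhD thesis", "ham", 450),
    ("SMS Spam Corpus v.0.1 Big", "ham", 1002),
    ("SMS Spam Corpus v.0.1 Big", "spam", 322),
]


def totals_by_label(sources):
    """Sum the message counts for each label across all sources."""
    totals = {}
    for _name, label, count in sources:
        totals[label] = totals.get(label, 0) + count
    return totals


totals = totals_by_label(SOURCES)
total = sum(totals.values())
print(totals, total, f"spam share: {totals['spam'] / total:.1%}")
```

The class imbalance (about 87% ham) is worth keeping in mind when choosing evaluation metrics, since plain accuracy rewards always predicting ham.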

Acknowledgements

The original dataset can be found here. If you find the dataset useful, the creators ask that you cite the paper listed below and the web page http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/ in your papers, research, etc.

We offer a comprehensive study of this corpus in the following paper. This work presents a number of statistics, studies and baseline results for several machine learning methods.

Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11), Mountain View, CA, USA, 2011.

Inspiration

Can you use this dataset to build a prediction model that will accurately classify which texts are spam?
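One common starting point for this question is a multinomial naive Bayes classifier over word counts. The sketch below is a minimal standard-library implementation with add-one (Laplace) smoothing. The tokenizer, the class name `NaiveBayesClassifier`, and the toy training messages are illustrative assumptions; this is not the method or the results of the paper cited above.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")  # assumption: simple lowercase word tokens


def tokenize(text: str) -> List[str]:
    return TOKEN_PATTERN.findall(text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every token seen in any class.
        self.vocab_size = len(set().union(*self.word_counts.values()))
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(doc_count / n_docs)
            denom = self.total_words[label] + self.vocab_size
            for token in tokenize(text):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train_texts = [
        "win a free prize now",
        "free cash call now",
        "are we meeting for lunch",
        "see you at home tonight",
    ]
    train_labels = ["spam", "spam", "ham", "ham"]
    model = NaiveBayesClassifier().fit(train_texts, train_labels)
    print(model.predict("free prize call"), model.predict("lunch at home"))
```

Working in log space avoids numeric underflow when many small token probabilities are multiplied together.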
scam-detection-data
huggingface.co
Updated Mar 25, 2025
Cite
Sherwin Larsen Alva (2025). scam-detection-data [Dataset]. https://huggingface.co/datasets/SparkyPilot/scam-detection-data
Dataset updated: Mar 25, 2025
Authors: Sherwin Larsen Alva
Description
Using the Dataset

The dataset used for training and evaluation is available here. You can load it using the datasets library:

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("SparkyPilot/scam-detection-data")

# Explore the dataset
print(dataset["train"][0])  # Print the first example in the training set
```

Links to the datasets taken from different sources are listed below:

spam.csv [https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset]… See the full description on the dataset page: https://huggingface.co/datasets/SparkyPilot/scam-detection-data.
SMS Firewall Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 7, 2025
Cite
Market Research Forecast (2025). SMS Firewall Report [Dataset]. https://www.marketresearchforecast.com/reports/sms-firewall-29332
Available download formats: pdf, doc, ppt
Dataset updated: Mar 7, 2025
Dataset authored and provided by: Market Research Forecast
License: https://www.marketresearchforecast.com/privacy-policy
Time period covered: 2025 - 2033
Area covered: Global
Variables measured: Market Size
Description
The SMS Firewall market, valued at $3602.1 million in 2025, is experiencing robust growth driven by increasing concerns over SMS-based threats like spam, phishing, and malware. The rising adoption of mobile banking and e-commerce fuels demand for robust security solutions, making SMS firewalls a critical component of overall cybersecurity strategies. Key application segments include BFSI (Banking, Financial Services, and Insurance), where secure transactions are paramount, and the burgeoning entertainment and retail sectors, reliant on SMS-based communications for promotions and customer engagement. The market's segmentation also encompasses A2P (Application-to-Person) and P2A (Person-to-Application) messaging, reflecting the diverse ways businesses and individuals utilize SMS. Technological advancements, such as AI-powered threat detection and improved filtering techniques, further enhance the effectiveness of SMS firewalls and contribute to market expansion. Geographic growth is expected to be diverse, with North America and Europe holding significant market share initially due to high technological adoption and stringent regulatory frameworks. However, rapid digitalization in Asia-Pacific and the Middle East & Africa presents substantial growth opportunities in the coming years. Competition in the market is intense, with established players like Tata Communications and Sinch vying with newer entrants for market share. This competitive landscape fosters innovation and drives down prices, making SMS firewall solutions increasingly accessible to a broader range of businesses and organizations. The forecast period (2025-2033) anticipates continued market expansion, fuelled by evolving threats and increased regulatory scrutiny. Factors such as the increasing sophistication of malicious SMS campaigns, the rise of 5G technology (which may increase SMS vulnerabilities), and evolving privacy regulations will continue to shape market dynamics. 
While the precise CAGR is unavailable, a conservative estimate considering industry growth trends and the inherent need for robust security in an increasingly digital world would place the CAGR in the range of 12-15% annually. This growth projection reflects not only the increasing demand for SMS firewalls but also the ongoing development of more sophisticated solutions capable of countering increasingly complex threats. The market is expected to see significant consolidation, with larger players acquiring smaller firms to expand their product portfolios and geographic reach.
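To make the 12-15% range concrete, the compound-growth formula V_t = V_0 (1 + r)^t can be applied to the stated 2025 value of $3,602.1 million. This is an illustrative calculation, not a figure from the report, and it assumes eight annual compounding periods (2025 to 2033). At 12% and 15% the growth multipliers are about 2.48x and 3.06x respectively.

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Compound `value` at `annual_rate` for `years` periods."""
    return value * (1.0 + annual_rate) ** years


BASE_2025_USD_MILLION = 3602.1  # market value stated in the description
YEARS = 2033 - 2025             # assumption: growth compounds once per year

for rate in (0.12, 0.15):
    projected = project(BASE_2025_USD_MILLION, rate, YEARS)
    print(f"CAGR {rate:.0%}: about ${projected:,.0f} million by 2033")
```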
Spam Dataset
universe.roboflow.com
zip
Updated Oct 18, 2024
Cite
spam (2024). Spam Dataset [Dataset]. https://universe.roboflow.com/spam-jwkhh/spam-qttjo/dataset/1
Available download formats: zip
Dataset updated: Oct 18, 2024
Dataset authored and provided by: spam
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Variables measured: Text Bounding Boxes
Description
Spam

## Overview

Spam is a dataset for object detection tasks: it contains text annotations for 300 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
Spammer behavior used in literature.
plos.figshare.com
xls
Updated Feb 6, 2025
Cite
Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed (2025). Spammer behavior used in literature. [Dataset]. http://doi.org/10.1371/journal.pone.0313628.t001
Available download formats: xls
Unique identifier: https://doi.org/10.1371/journal.pone.0313628.t001
Dataset updated: Feb 6, 2025
Dataset provided by: PLOS ONE
Authors: Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The diverse fake-text generation practices used by spammers make spam detection challenging. Existing works use manually designed discrete textual or behavioral features, which cannot capture the complex global semantics of text and reviews. Some studies use a limited set of features while neglecting other significant ones; however, with a large feature set, selecting all features leads to overfitting and expensive computation. This paper addresses the challenges of feature selection and of evolving spammer behavior and linguistic features, with the goal of devising an efficient model for spam detection. The primary objective was to identify the most effective subset of features and patterns for spam detection. Spammer behavior features and linguistic features often exhibit complex relationships that influence the nature of spam reviews, and building a unified representation of these features is another challenge. Various deep learning approaches have been proposed for spam detection and classification, but while they specialize in extracting features, they fail to capture dependencies between features effectively, and there is a lack of comprehensive models that integrate linguistic and behavioral features to improve spam detection accuracy. The proposed spam detection framework, SD-FSL-CLSTM, fuses spammer behavior features and linguistic features to automatically detect and classify spam reviews. Fusion enables the model to learn the interactions between the features during training, allowing it to capture complex relationships and make predictions based on both types of features. The SD-FSL-CLSTM framework appears to show promising results, achieving a minimum accuracy of 97%.
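As a rough illustration of feature-level fusion, the sketch below standardizes a behavior feature matrix and a linguistic feature matrix column by column, then concatenates them per sample so a downstream classifier sees both feature types at once. This is a generic early-fusion baseline written for illustration. It is not the paper's SD-FSL-CLSTM architecture, and the function names and example values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence

Matrix = List[List[float]]


def zscore_columns(rows: Sequence[Sequence[float]]) -> Matrix:
    """Standardize each column to zero mean and unit (population) variance.

    Constant columns are mapped to 0.0 instead of dividing by zero.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


def fuse_features(behavior: Sequence[Sequence[float]],
                  linguistic: Sequence[Sequence[float]]) -> Matrix:
    """Early fusion: normalize each feature group, then concatenate per sample."""
    if len(behavior) != len(linguistic):
        raise ValueError("both feature groups must describe the same samples")
    return [b + l for b, l in zip(zscore_columns(behavior), zscore_columns(linguistic))]


if __name__ == "__main__":
    behavior = [[1.0, 10.0], [3.0, 10.0]]  # hypothetical per-reviewer features
    linguistic = [[0.0], [4.0]]            # hypothetical per-review text features
    print(fuse_features(behavior, linguistic))
```

Normalizing each group first keeps features on very different scales from dominating the fused vector.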
Bangla Multilabel Cyberbully, Sexual Harrasment, Threat and Spam Detection...
data.mendeley.com
Updated Jul 16, 2024
Cite
Saieef Sunny (2024). Bangla Multilabel Cyberbully, Sexual Harrasment, Threat and Spam Detection Dataset [Dataset]. http://doi.org/10.17632/sz5558wrd4.3
Unique identifier: https://doi.org/10.17632/sz5558wrd4.3
Dataset updated: Jul 16, 2024
Authors: Saieef Sunny
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset Overview

The Bangla Multilabel Cyberbully, Sexual Harassment, Threat, and Spam Detection Dataset is designed to facilitate the development of machine learning models to detect and classify various types of abusive content in Bangla social media text. This dataset contains a collection of comments annotated for multiple types of abuse, making it suitable for multilabel classification tasks. It aims to support research and development in natural language processing (NLP) to enhance online safety and moderate harmful content on Bangla language social media platforms.

Purpose

1. Train and evaluate machine learning models for the detection of cyberbullying, sexual harassment, religious hate speech, threats, and spam in Bangla comments.
2. Support research in NLP and machine learning focused on Bangla, a low-resource language.
3. Aid in developing automated moderation systems for social media platforms to ensure safe and respectful communication.

Data Collection

Initially, we collected around 30,000 comments from social media platforms such as Facebook and TikTok. These comments were in Bangla, English, and Banglish (Bangla written using English characters). Since our research focuses on Bangla abusive text detection, we refined the dataset through the following steps:

We filtered out all comments written in English to focus on the Bangla text.

To ensure data quality, we eliminated duplicate entries and rows with missing or null values.

We removed any remaining English characters and both Bangla and English numerical values to ensure the analysis was based solely on Bangla text.
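The three preprocessing steps can be sketched with Python's `re` module. The heuristics are assumptions rather than the authors' exact procedure: a comment counts as "English" when it contains no character from the Bengali Unicode block (U+0980 to U+09FF), which as a side effect also drops romanized Banglish; "English characters" means ASCII letters; and the numerals removed are ASCII digits and Bengali digits (U+09E6 to U+09EF). The helper name `clean_comments` is hypothetical.

```python
import re
from typing import Iterable, List, Optional

BANGLA_SCRIPT = re.compile(r"[\u0980-\u09FF]")  # Bengali Unicode block
# ASCII letters, ASCII digits, and Bengali digits.
LATIN_LETTERS_AND_DIGITS = re.compile(r"[A-Za-z0-9\u09E6-\u09EF]")


def clean_comments(comments: Iterable[Optional[str]]) -> List[str]:
    # Step 1: drop missing/empty values and comments with no Bangla-script
    # characters (assumed proxy for "written in English").
    bangla = [c for c in comments if c and BANGLA_SCRIPT.search(c)]
    # Step 2: drop exact duplicates, keeping the first occurrence.
    unique = list(dict.fromkeys(bangla))
    # Step 3: remove Latin letters and ASCII/Bengali digits, collapse
    # whitespace, and discard comments left empty (e.g. digits only).
    cleaned = []
    for c in unique:
        text = " ".join(LATIN_LETTERS_AND_DIGITS.sub(" ", c).split())
        if text:
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    raw = ["ভালো 123", "hello world", "ভালো 123", None, "", "খারাপ abc ৫"]
    print(clean_comments(raw))
```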

After these steps, we obtained a final dataset of 12,557 comments. Each comment was manually labeled for five classes: bully, sexual, religious, threat, and spam. The labeling is multilabel, meaning a comment can belong to more than one class simultaneously.

Dataset Columns

1. Gender: Indicates the gender of the person who received the bullying.
2. Profession: Indicates the profession of the person who received the bullying.
3. Comment: Contains the text of the comment in Bangla.
4. Bully: Binary label indicating whether the comment contains bullying content (0 for no, 1 for yes).
5. Sexual: Binary label indicating whether the comment contains sexual harassment content (0 for no, 1 for yes).
6. Religious: Binary label indicating whether the comment contains religious hate speech (0 for no, 1 for yes).
7. Threat: Binary label indicating whether the comment contains threats (0 for no, 1 for yes).
8. Spam: Binary label indicating whether the comment is considered spam (0 for no, 1 for yes).
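Because the five label columns are independent binary flags, each row maps directly to a multilabel target vector. The minimal sketch below assumes rows are read as dictionaries keyed by the column names listed above, with "0"/"1" string values (as `csv.DictReader` would produce). `label_vector` and `label_stats` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple

LABEL_COLUMNS = ["Bully", "Sexual", "Religious", "Threat", "Spam"]


def label_vector(row: Dict[str, str]) -> List[int]:
    """Return the 0/1 flags for the five label columns, in a fixed order."""
    return [int(row[col]) for col in LABEL_COLUMNS]


def label_stats(rows: Iterable[Dict[str, str]]) -> Tuple[Dict[str, int], int]:
    """Count positives per label and how many rows carry more than one label."""
    counts = {col: 0 for col in LABEL_COLUMNS}
    multi_label_rows = 0
    for row in rows:
        vec = label_vector(row)
        for col, flag in zip(LABEL_COLUMNS, vec):
            counts[col] += flag
        if sum(vec) > 1:
            multi_label_rows += 1
    return counts, multi_label_rows
```

For example, a row flagged as both Bully and Threat yields `[1, 0, 0, 1, 0]` and is counted once in `multi_label_rows`.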

Applications

1. Training and testing machine learning models for multilabel classification.
2. Research on natural language processing (NLP) and cyberbullying detection in low-resource languages like Bangla.
3. Developing automated systems for monitoring and moderating online content on social media platforms to ensure safe and respectful communication.
Data from: Image dataset to train a deep learning model to decode Leetspeak...
zenodo.org
research.science.eus
zip
Updated Mar 22, 2022
Cite
Iñaki Velez de Mendizabal; Xabier Vidriales; Vitor Basto Fernandes; Enaitz Ezpeleta; José Ramón Méndez; Urko Zurutuza (2022). Image dataset to train a deep learning model to decode Leetspeak obfuscated characters [Dataset]. http://doi.org/10.5281/zenodo.6373423
Available download formats: zip
Unique identifier: https://doi.org/10.5281/zenodo.6373423
Dataset updated: Mar 22, 2022
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Iñaki Velez de Mendizabal; Xabier Vidriales; Vitor Basto Fernandes; Enaitz Ezpeleta; José Ramón Méndez; Urko Zurutuza
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset contains an image database (18,981 images) that could be used to train a deep learning model to accurately detect characters. We have successfully used it to create a model that identifies characters encoded using LeetSpeak. The original dataset can be found in the Mondragon Unibertsitatea Repository -- https://gitlab.danz.eus/datasharing/ski4spam

The training dataset consists of:

- Alphabetic letters (a-z) written using different fonts and styles (regular, cursive, bold, cursive+bold)

- Handwritten letters: English handwriting from the Chars74k dataset [2] which is available at http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/.
Feature selection using PCA.
plos.figshare.com
xls
Updated Feb 6, 2025
Cite
Feature selection using PCA. [Dataset]. https://plos.figshare.com/articles/dataset/Feature_selection_using_PCA_/28362299
Available download formats: xls
Unique identifier: https://doi.org/10.1371/journal.pone.0313628.t006
Dataset updated: Feb 6, 2025
Dataset provided by: PLOS ONE
Authors: Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Identical to the description of "Spammer behavior used in literature." above (the abstract of the SD-FSL-CLSTM spam detection paper).
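The table behind this entry is titled "Feature selection using PCA." One common PCA-based heuristic, assumed here rather than taken from the paper, ranks the original features by the magnitude of their loadings on the first principal component. The dependency-free sketch below finds that component by power iteration on the sample covariance matrix. The function names, feature names, and data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix (n - 1 denominator) of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [mean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov: List[List[float]], iterations: int = 500) -> List[float]:
    """Approximate the top eigenvector of a covariance matrix by power iteration."""
    d = len(cov)
    v = [1.0] * d  # assumption: start vector is not orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance to rank by
        v = [x / norm for x in w]
    return v


def rank_features_by_pc1(rows: Sequence[Sequence[float]], names: List[str]) -> List[str]:
    """Order feature names by the absolute value of their first-component loading."""
    loadings = first_component(covariance_matrix(rows))
    order = sorted(range(len(names)), key=lambda k: abs(loadings[k]), reverse=True)
    return [names[k] for k in order]


if __name__ == "__main__":
    data = [[1, 2, 0], [2, 4, 0], [3, 6, 1], [4, 8, 0]]
    print(rank_features_by_pc1(data, ["a", "b", "c"]))
```

Loadings depend on feature scale, so in practice features are usually standardized before this ranking is applied.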
turkishSMS-ds
huggingface.co
Updated Apr 6, 2023
Cite
Alper Kürşat Uysal (2024). turkishSMS-ds [Dataset]. https://huggingface.co/datasets/akuysal/turkishSMS-ds
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated: Apr 6, 2023
Authors: Alper Kürşat Uysal
Description
Dataset Card for "turkishSMS-ds"

The dataset consists of Turkish SMS spam and legitimate messages and was used in the following study:

Uysal, A. K., Gunal, S., Ergin, S., & Gunal, E. S. (2013). The impact of feature extraction and selection on SMS spam filtering. Elektronika ir Elektrotechnika, 19(5), 67-72.

More information needed.
Desights: Discord Community Dynamics - Analysis by Bryce
market.oceanprotocol.com
Updated Mar 12, 2024
Cite
Desights User (2024). Desights: Discord Community Dynamics - Analysis by Bryce [Dataset]. https://market.oceanprotocol.com/asset/did:op:b0471a0985ff1dba5e4384a87705beb53afa3925310536eec0b86ac8fdde78f8
Dataset updated: Mar 12, 2024
Dataset authored and provided by: Desights User
Description
This is a submission for Challenge #22 by Desights User

Note: This submission is in REVIEW state and is only accessible by Challenge Reviewers, so you might get errors if you try to download this asset directly from Ocean Market.

Submission Description

Replicated from README.

How to Use This Repository

Main Files

The main submission files are in the home directory:

Discord Community Dynamics - Analysis by Bryce.html: This HTML version is the best file to use. My submission uses Highcharts for interactive charts, so this version allows limited drilldown options.

Discord Community Dynamics - Analysis by Bryce.pdf: In case there are problems with the HTML version, I have provided this PDF version. It is not interactive and the formatting will be a bit worse.

Discord Community Dynamics - Analysis by Bryce.qmd: This Quarto document can be viewed to understand the code behind the exhibits. The code has been hidden in the other versions to remove complexity and put the focus squarely on results.

Support Files

Various support files were also used for the analysis. These are saved in the support/ folder. Due to limited time, they unfortunately won't be very user-friendly. I also moved them recently and have not refactored them, so they won't run without fixing file-location and working-directory issues.

Data Files

I have removed the data files to keep the submission file size small.

All the files can be built using support scripts, starting from only the contest dataset "Ocean Discord Data Challenge Dataset.csv". That said, please contact me (superchordate@gmail.com) if you'd like the full repository including the data files.

Data Sources

$OCEAN price and volume information are taken from the www.cryptocurrencychart.com API. External pretrained models used include mrm8488/bert-tiny-finetuned-sms-spam-detection and mshenoda/roberta-spam.

Author

Bryce Chamberlain superchordate@gmail.com https://www.bryce-chamberlain.com
Feature selection using XGB.
plos.figshare.com
xls
Updated Feb 6, 2025
Cite
Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed (2025). Feature selection using XGB. [Dataset]. http://doi.org/10.1371/journal.pone.0313628.t005
Available download formats: xls
Unique identifier: https://doi.org/10.1371/journal.pone.0313628.t005
Dataset updated: Feb 6, 2025
Dataset provided by: PLOS ONE
Authors: Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Identical to the description of "Spammer behavior used in literature." above (the abstract of the SD-FSL-CLSTM spam detection paper).
Global Text Analytics (Mining) Software Market Size, Trends and Projections
marketresearchintellect.com
Updated Mar 11, 2025
Cite
Market Research Intellect (2025). Global Text Analytics (Mining) Software Market Size, Trends and Projections [Dataset]. https://www.marketresearchintellect.com/product/text-analytics-mining-software-market/
Dataset updated: Mar 11, 2025
Dataset authored and provided by: Market Research Intellect
License: https://www.marketresearchintellect.com/privacy-policy
Area covered: Global
Description
The size and share of the market are categorized based on Type (On-Premise, Cloud-Based), Application (Data Analysis and Forecasting, Fraud-Spam Detection, Intelligence and Law Enforcement, Customer Relationship Management (CRM), Others), and geographical region (North America, Europe, Asia-Pacific, South America, and Middle-East and Africa).
Evaluation matrices of the proposed approach.
plos.figshare.com
xls
Updated Feb 6, 2025
Cite
Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed (2025). Evaluation matrices of the proposed approach. [Dataset]. http://doi.org/10.1371/journal.pone.0313628.t009
Available download formats: xls
Unique identifier: https://doi.org/10.1371/journal.pone.0313628.t009
Dataset updated: Feb 6, 2025
Dataset provided by: PLOS ONE
Authors: Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Identical to the description of "Spammer behavior used in literature." above (the abstract of the SD-FSL-CLSTM spam detection paper).
Text Analytics Market Size, Share, Trends, Scope And Forecast
marketresearchintellect.com
Updated Nov 24, 2021
Cite
Market Research Intellect® | Market Analysis and Research Reports (2021). Text Analytics Market Size, Share, Trends, Scope And Forecast [Dataset]. https://www.marketresearchintellect.com/product/global-text-analytics-market-size-forecast/
Dataset updated: Nov 24, 2021
Dataset authored and provided by: Market Research Intellect® | Market Analysis and Research Reports
License: https://www.marketresearchintellect.com/privacy-policy
Area covered: Global
Description
The market size of the Text Analytics Market is categorized based on Type (On-Premise, Cloud-Based) and Application (Data Analysis & Forecasting, Fraud/Spam Detection, Intelligence & Law Enforcement, Customer Relationship Management (CRM), Other) and geographical regions (North America, Europe, Asia-Pacific, South America, and Middle-East and Africa).

This report provides insights into the market size and forecasts the value of the market, expressed in USD million, across these defined segments.

Global Text Content Moderation Solution Market Research Report: By...

wiseguyreports.com

Updated Mar 21, 2025

Cite

wWiseguy Research Consultants Pvt Ltd (2025). Global Text Content Moderation Solution Market Research Report: By Technology (Machine Learning, Natural Language Processing, Artificial Intelligence, Rule-Based Systems), By Deployment Type (Cloud-Based, On-Premise, Hybrid), By End User (Enterprises, Social Media Platforms, E-commerce Platforms, Gaming Platforms), By Application (Content Moderation, Spam Detection, Sentiment Analysis, User-Generated Content Monitoring) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/text-content-moderation-solution-market

Dataset updated: Mar 21, 2025

Dataset authored and provided by: wWiseguy Research Consultants Pvt Ltd

License: https://www.wiseguyreports.com/pages/privacy-policy

Area covered: Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	4.65 (USD Billion)
MARKET SIZE 2024	5.19 (USD Billion)
MARKET SIZE 2032	12.5 (USD Billion)
SEGMENTS COVERED	Technology, Deployment Type, End User, Application, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	rising regulatory compliance demands, increasing user-generated content, enhanced AI moderation technologies, growing concerns over online safety, demand for multilingual support
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Salesforce, Facebook, Verint, Microsoft, Google, Sprinklr, OpenAI, Twitter, IBM, Dynatrace, Clarifai, Cision, Sift, Hootsuite, AWS
MARKET FORECAST PERIOD	2025 - 2032
KEY MARKET OPPORTUNITIES	AI-driven moderation technologies, Increased demand from social media platforms, Expansion in e-commerce content moderation, Rising need for compliance solutions, Growth in multilingual moderation services
COMPOUND ANNUAL GROWTH RATE (CAGR)	11.6% (2025 - 2032)
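The table's figures can be cross-checked with the standard formula CAGR = (V_end / V_start)^(1/n) - 1. Assuming the rate is measured from the 2024 market size (5.19) to the 2032 size (12.5), that is over n = 8 years, the sketch below reproduces the stated 11.6%. The choice of endpoints is an assumption, since the table labels the forecast period 2025 - 2032.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` periods apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes in USD billion, taken from the table.
rate = cagr(5.19, 12.5, 2032 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

The same function applied to the 2023 and 2024 sizes (4.65 to 5.19 over one year) also gives roughly 11.6%, so the table is internally consistent under this reading.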

SMS Spam Collection Dataset: a collection of SMS messages tagged as spam or legitimate. 57 scholarly articles cite this dataset (according to Google Scholar).
