100+ datasets found
  1. Spam share of global email traffic 2011-2023

    • statista.com
    • ai-chatbox.pro
    Updated Sep 17, 2024
    Cite
    Statista (2024). Spam share of global email traffic 2011-2023 [Dataset]. https://www.statista.com/statistics/420400/spam-email-traffic-share-annual/
    Dataset updated
    Sep 17, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2023, about 45.6 percent of all e-mails worldwide were identified as spam, down from almost 49 percent in 2022. Although spam remains a large part of e-mail traffic, its share has decreased significantly since 2011. In 2023, the highest volume of spam e-mails was registered in May, at approximately 50 percent of e-mail traffic worldwide.

  2. Highest number of spam e-mails sent daily 2024, by country

    • statista.com
    Updated Dec 9, 2024
    Cite
    Statista (2024). Highest number of spam e-mails sent daily 2024, by country [Dataset]. https://www.statista.com/statistics/1270488/spam-emails-sent-daily-by-country/
    Dataset updated
    Dec 9, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 8, 2024
    Area covered
    Worldwide
    Description

    As of December 8, 2024, China and the United States were the countries with the highest number of spam emails sent within one day worldwide, with around 7.8 billion. Ranking third and fourth were India and Japan, with around 7.6 billion.

    Internet and e-mail users around the world

    Between 2019 and 2024, the number of email users globally increased from 3.9 billion to 4.4 billion, and it is expected to reach 4.8 billion in 2027. Although China and India had the highest numbers of internet users in the world in 2023, with over 1.2 billion and 1.1 billion users respectively, e-mail usage is less popular in these countries than in the United States or Germany, for example.

    Most popular online activities in the U.S.

    Not only did the United States have the highest number of daily emails and spam emails sent as of October 2021; email was also the most popular online activity among U.S. internet users in 2019. In fact, 90.9 percent of respondents said they were email users, more than search users, social network users, or digital video viewers.

  3. Spam: share of global e-mail traffic monthly 2014-2023

    • statista.com
    Updated Jun 23, 2025
    Cite
    Statista (2025). Spam: share of global e-mail traffic monthly 2014-2023 [Dataset]. https://www.statista.com/statistics/420391/spam-email-traffic-share/
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2014 - Dec 2023
    Area covered
    Worldwide
    Description

    Spam messages accounted for over **** percent of e-mail traffic in December 2023. Russia generated the largest share of unsolicited spam e-mails in 2022, with **** percent of global spam e-mails originating from the country.

    Spam worldwide

    It is almost impossible to think about e-mail without considering the issue of spam, which usually includes billions of promotional e-mails marketers send daily. As of January 2023, the United States had the highest number of spam e-mails sent daily. While many e-mail users believe such content belongs in their spam folder, marketing e-mails are generally harmless, if annoying to the user.

    Malicious spam

    Phishing e-mails remain one of the primary attack vectors for cybercriminals. On average, around ** percent of businesses worldwide experience four to six successful cyber attacks in one year. Another ** percent said they became victims of more than ** bulk phishing attacks. More than half of the companies said these phishing attacks resulted in consumer or client data breaches.

  4. Data from: Spam Email

    • kaggle.com
    Updated Feb 10, 2022
    Cite
    Rhitaza Jana (2022). Spam Email [Dataset]. https://www.kaggle.com/datasets/rhitazajana/spam-email
    Dataset updated
    Feb 10, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rhitaza Jana
    Description

    Dataset

    This dataset was created by Rhitaza Jana

    Contents

  5. Data from: Persuasion Sentences in Spam Email (PerSentSE)

    • portalcienciaytecnologia.jcyl.es
    Updated 2025
    Cite
    Jáñez-Martino, Francisco; Barrón-Cedeño, Alberto; ALAIZ-RODRÍGUEZ, ROCÍO; González-Castro, Víctor (2025). Persuasion Sentences in Spam Email (PerSentSE) [Dataset]. https://portalcienciaytecnologia.jcyl.es/documentos/67a9c7c719544708f8c7246c
    Dataset updated
    2025
    Authors
    Jáñez-Martino, Francisco; Barrón-Cedeño, Alberto; ALAIZ-RODRÍGUEZ, ROCÍO; González-Castro, Víctor
    Description

    How to Access:

    To access this dataset, please contact Francisco Janez via email at francisco.janez@unileon.es. Access will be granted based on specific requests.

    Purpose: The PerSentSE corpus was developed to study persuasive techniques in spam emails. It includes 130 emails randomly selected from the SpamArchive2122 dataset, which contains over 20,000 spam emails in English.

    Methodology:

    Segmentation: Emails were divided into sentences using the NLTK library.

    Annotation: Eight persuasive techniques, along with a "non-persuasion" class, were identified. Two expert annotators labeled an initial subset of emails to measure inter-annotator agreement, achieving a final acceptable level (γ = 0.63).

    Corpus Statistics:

    Total sentences: 1,075

    Persuasive sentences: 216 (20.1%)

    Persuasion Distribution by Email Sections (Table 7):

    Subject lines: 35.59% persuasive, with an average of 1.62 techniques.

    Greeting section: 54.17% persuasive, averaging 1.46 techniques.

    Email body: 82.46% persuasive, with 5.51 techniques on average.

    Farewell section: 31.43% persuasive, averaging 1.45 techniques.

    Co-occurrence of Techniques (Figure 2): Some persuasive techniques frequently appeared together:

    Appeal to Fear/Prejudice with Loaded Language: 25 instances.

    Exaggeration/Minimization with Loaded Language: 24 instances.

    Appeal to Fear/Prejudice with Exaggeration/Minimization: 20 instances.
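
    The co-occurrence counts above can be thought of as pair counts over multi-label annotations. The sketch below is only an illustration of that idea: it assumes co-occurrence is counted per sentence, and the rows in `sentence_labels` are hypothetical toy data, not taken from PerSentSE.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations: one set of technique labels per sentence.
# These toy rows are illustrative and are NOT taken from PerSentSE.
sentence_labels = [
    {"Appeal to Fear/Prejudice", "Loaded Language"},
    {"Exaggeration/Minimization", "Loaded Language"},
    {"Appeal to Fear/Prejudice", "Exaggeration/Minimization", "Loaded Language"},
    {"Loaded Language"},
    set(),  # a "non-persuasion" sentence carries no technique
]


def count_cooccurrences(rows):
    """Count how often each unordered pair of techniques shares a sentence."""
    pairs = Counter()
    for labels in rows:
        # sorted() gives every pair a canonical order, so (A, B) == (B, A).
        for pair in combinations(sorted(labels), 2):
            pairs[pair] += 1
    return pairs


cooccurrences = count_cooccurrences(sentence_labels)
for (first, second), count in cooccurrences.most_common():
    print(f"{first} with {second}: {count}")
```

    Sentences with zero or one technique contribute no pairs, so only multi-technique sentences affect the counts.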

    Findings: The body section of emails concentrates the highest number of persuasive elements, contrary to earlier studies focusing on subject lines alone. This suggests that spam emails rely heavily on persuasive content in their main text.

  6. E Mail Spam Filter Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). E Mail Spam Filter Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-e-mail-spam-filter-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    E-Mail Spam Filter Market Outlook



    The global market size for e-mail spam filters is poised to grow from approximately USD 2.84 billion in 2023 to an estimated USD 5.97 billion by 2032, with a robust compound annual growth rate (CAGR) of 8.5%. This growth is driven by increasing cyber threats and the rising importance of securing communication channels.
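
    As a quick sanity check on the figures above, the compound annual growth rate can be recomputed from the two quoted market sizes. This is a minimal sketch, assuming the growth window runs from 2023 to 2032 (nine years); the function name `cagr` is illustrative. Under that assumption the result lands close to the stated 8.5%, and a small gap could come from rounding or a different base year.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.085 means 8.5%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report: USD 2.84 billion (2023), USD 5.97 billion (2032).
# Assumption: the growth window is 2032 - 2023 = 9 years.
rate = cagr(2.84, 5.97, 2032 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```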



    One of the primary growth factors for the e-mail spam filter market is the escalating number of cyberattacks and phishing scams. These attacks often infiltrate via spam emails, making it critical for organizations to implement robust spam filters to protect sensitive information. The sophistication of spam email tactics has evolved, necessitating advanced filtering solutions that can detect and block such threats effectively. Consequently, the demand for dynamic and intelligent spam filtering systems is on the rise.



    Furthermore, the growing regulatory demands for data protection and privacy act as significant drivers for this market. Regulations such as GDPR in Europe and CCPA in California mandate stringent measures to protect users' data, including the prevention of spam and phishing emails. Compliance with these regulations often requires the deployment of advanced spam filtering technologies, thereby propelling market growth. Companies are increasingly investing in these solutions to avoid hefty penalties and maintain customer trust.



    In addition to spam filters, Email Protection Software plays a crucial role in safeguarding communication channels from a myriad of cyber threats. These software solutions provide comprehensive protection by integrating features such as encryption, data loss prevention, and threat intelligence. With the increasing sophistication of cyberattacks, organizations are turning to email protection software to ensure the confidentiality and integrity of their communications. This software not only helps in blocking spam but also offers advanced threat detection capabilities, making it an indispensable tool for modern businesses aiming to secure their email infrastructure.



    Another crucial factor contributing to the market's expansion is the increasing adoption of cloud-based services. Cloud computing offers scalable solutions that can be easily integrated with existing email systems, providing efficient spam filtering capabilities without the need for significant upfront investments in hardware. This flexibility and cost-effectiveness make cloud-based spam filters particularly attractive to small and medium enterprises (SMEs), further driving market growth.



    Regionally, North America holds a significant share of the e-mail spam filter market, owing to the high adoption of advanced technologies and the presence of major industry players. The Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period, driven by the rapid digital transformation and increasing cyber threats in emerging economies like India and China. The stringent regulatory environment in Europe also ensures steady demand for spam filter solutions in this region.



    Component Analysis



    The e-mail spam filter market can be broadly segmented into two main components: software and services. The software segment encompasses the actual spam filtering applications that can be installed and integrated into email systems. These software solutions range from basic spam filters to advanced machine learning-based systems that can adapt to new threats. The demand for software solutions is driven by their ability to provide real-time protection against spam and phishing attacks, ensuring the security of organizational communication channels.



    On the other hand, the services segment includes managed services, consulting, and support services provided by vendors. Managed services are particularly popular among organizations that lack the in-house expertise to manage and update spam filters. These services often include regular updates, monitoring, and management of the spam filtering systems, ensuring optimal performance and protection. Consulting services help organizations choose the right spam filtering solutions and implement them effectively, while support services provide ongoing assistance to address any issues that may arise.



    The software segment is anticipated to hold a larger market share due to the increasing preference for advanced spam filtering solutions that can be customize

  7. Spam e-mail: leading countries of origin of spam 2023

    • statista.com
    Updated Jun 23, 2025
    Cite
    Statista (2025). Spam e-mail: leading countries of origin of spam 2023 [Dataset]. https://www.statista.com/statistics/263086/countries-of-origin-of-spam/
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, Russia ranked first by its share of unsolicited spam e-mails. Overall, **** percent of global spam e-mails originated from IPs in Russia. The United States ranked second, with **** percent. Mainland China followed, accounting for over ** percent of global unsolicited spam e-mails during the measured period.

  8. Spam Mail Prediction Dataset

    • opendatabay.com
    Updated Jun 6, 2025
    Cite
    Datasimple (2025). Spam Mail Prediction Dataset [Dataset]. https://www.opendatabay.com/data/dataset/080d396c-0650-452b-9bef-d6bb3fa9366e
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Fraud Detection & Risk Management
    Description

    The dataset consists of a collection of emails categorized into two major classes: spam and not spam. It is designed to facilitate the development and evaluation of spam detection or email filtering systems.

    The spam emails in the dataset are typically unsolicited and unwanted messages that aim to promote products or services, spread malware, or deceive recipients for various malicious purposes. These emails often contain misleading subject lines, excessive use of advertisements, unauthorized links, or attempts to collect personal information.

    The non-spam emails in the dataset are genuine and legitimate messages sent by individuals or organizations. They may include personal or professional communication, newsletters, transaction receipts, or any other non-malicious content.

    The dataset encompasses emails of varying lengths, languages, and writing styles, reflecting the inherent heterogeneity of email communication. This diversity aids in training algorithms that can generalize well to different types of emails, making them robust against different spammer tactics and variations in non-spam email content.

    Original Data Source: Spam Mail Prediction Dataset

  9. Spam Email Classification

    • kaggle.com
    Updated Jul 9, 2020
    Cite
    Somesh Sharma (2020). Spam Email Classification [Dataset]. https://www.kaggle.com/somesh24/spambase/activity
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Kaggle
    Authors
    Somesh Sharma
    Description

    SPAM E-mail Database

    The “spam” concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography… Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word ‘george’ and the area code ‘650’ are indicators of non-spam. These are useful when constructing a personalized spam filter. One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general purpose spam filter.

    Attribute Information:

    The last column denotes whether the e-mail was considered spam (1) or not (0), i.e. unsolicited commercial e-mail. Most of the attributes indicate whether a particular word or character was frequently occurring in the e-mail. The run-length attributes (55-57) measure the length of sequences of consecutive capital letters.

    For the statistical measures of each attribute, see the end of this file. Here are the definitions of the attributes:

    48 continuous real [0,100] attributes of type word_freq_WORD = percentage of words in the e-mail that match WORD, i.e. 100 * (number of times the WORD appears in the e-mail) / total number of words in e-mail. A “word” in this case is any string of alphanumeric characters bounded by non-alphanumeric characters or end-of-string.

    6 continuous real [0,100] attributes of type char_freq_CHAR = percentage of characters in the e-mail that match CHAR, i.e. 100 * (number of CHAR occurrences) / total characters in e-mail

    1 continuous real [1,…] attribute of type capital_run_length_average = average length of uninterrupted sequences of capital letters

    1 continuous integer [1,…] attribute of type capital_run_length_longest = length of longest uninterrupted sequence of capital letters

    1 continuous integer [1,…] attribute of type capital_run_length_total = sum of length of uninterrupted sequences of capital letters = total number of capital letters in the e-mail

    1 nominal {0,1} class attribute of type spam = denotes whether the e-mail was considered spam (1) or not (0), i.e. unsolicited commercial e-mail.
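
    The attribute definitions above can be sketched in code. The functions below follow the stated formulas, but they are an illustration rather than the original feature extractor: case-insensitive word matching and the zero fallback for e-mails without capital letters are assumptions, and the sample text is made up.

```python
import re


def word_freq(text, word):
    """100 * (occurrences of `word`) / (total words).

    A "word" is any run of alphanumeric characters, as in the definition.
    Assumption: matching is case-insensitive.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return 0.0
    matches = sum(1 for w in words if w.lower() == word.lower())
    return 100.0 * matches / len(words)


def char_freq(text, char):
    """100 * (occurrences of `char`) / (total characters)."""
    if not text:
        return 0.0
    return 100.0 * text.count(char) / len(text)


def capital_run_features(text):
    """Average, longest, and total length of uninterrupted capital-letter runs."""
    runs = [len(run) for run in re.findall(r"[A-Z]+", text)]
    if not runs:
        # Assumption: the definition lists a range of [1, ...]; this sketch
        # returns zeros when an e-mail contains no capital letters at all.
        return {"average": 0.0, "longest": 0, "total": 0}
    return {
        "average": sum(runs) / len(runs),
        "longest": max(runs),
        "total": sum(runs),
    }


# Made-up sample text for illustration only.
sample = "FREE money!!! Click NOW to get your FREE gift"
print(word_freq(sample, "free"))
print(char_freq(sample, "!"))
print(capital_run_features(sample))
```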

  10. Data from: Spam mail

    • kaggle.com
    Updated Jul 27, 2024
    Cite
    SAYAN DE (2024). Spam mail [Dataset]. https://www.kaggle.com/datasets/sayande01/spam-mail
    Dataset updated
    Jul 27, 2024
    Dataset provided by
    Kaggle
    Authors
    SAYAN DE
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by SAYAN DE

    Released under CC0: Public Domain

    Contents

  11. all-scam-spam

    • huggingface.co
    Updated Sep 2, 2002
    Cite
    Fred Zhang (2002). all-scam-spam [Dataset]. https://huggingface.co/datasets/FredZhang7/all-scam-spam
    Dataset updated
    Sep 2, 2002
    Authors
    Fred Zhang
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This is a large corpus of 42,619 preprocessed text messages and emails sent by humans in 43 languages. is_spam=1 means spam and is_spam=0 means ham. 1040 rows of balanced data, consisting of casual conversations and scam emails in ≈10 languages, were manually collected and annotated by me, with some help from ChatGPT.

    Some preprocessing algorithms

    spam_assassin.js, followed by spam_assassin.py
    enron_spam.py

    Data composition

    Description


    To make the text… See the full description on the dataset page: https://huggingface.co/datasets/FredZhang7/all-scam-spam.

  12. Daily spam volume worldwide 2020-2021

    • statista.com
    Updated Jun 26, 2023
    Cite
    Statista (2023). Daily spam volume worldwide 2020-2021 [Dataset]. https://www.statista.com/statistics/1270424/daily-spam-volume-global/
    Dataset updated
    Jun 26, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2020 - Sep 2021
    Area covered
    Worldwide
    Description

    Between October 2020 and September 2021, global daily spam volume peaked in July 2021, with almost 283 billion spam emails out of a total of 336.41 billion emails sent. In August 2021, this number dropped to 65.50 billion. Towards September, the average spam volume rose again by 36 percent, reaching 88.88 billion out of a total of 105.67 billion emails sent worldwide. The country where the most emails were sent was the U.S.
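
    The figures in this description can be cross-checked with simple arithmetic. The sketch below derives the spam share for July and September and the August-to-September change from the quoted volumes; the helper names are illustrative, and the July value of 283 billion is treated as approximate because the text says "almost".

```python
def share(part: float, total: float) -> float:
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return 100.0 * (new - old) / old


# Volumes in billions of emails per day, as quoted in the description.
july_share = share(283.0, 336.41)         # spam share at the July 2021 peak
sept_share = share(88.88, 105.67)         # spam share in September 2021
aug_to_sept = pct_change(65.50, 88.88)    # change in spam volume, Aug to Sep

print(f"July 2021 spam share: {july_share:.1f}%")
print(f"September 2021 spam share: {sept_share:.1f}%")
print(f"August to September change: {aug_to_sept:.1f}%")
```

    The computed change from August to September rounds to the 36 percent increase stated in the text.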

  13. Spam: share of global email traffic 2014-2021

    • digi.czlib.net
    Updated Apr 30, 2019
    Cite
    Statista (2019). Spam: share of global email traffic 2014-2021 [Dataset]. http://digi.czlib.net/interlibSSO/goto/2/++9rs-shrs-9bnl/statistics/420391/spam-email-traffic-share/
    Dataset updated
    Apr 30, 2019
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2014 - Mar 2021
    Area covered
    Worldwide
    Description

    Spam messages accounted for 45.1 percent of e-mail traffic in March 2021. During the most recently measured period, Russia generated the largest share of unsolicited spam e-mails with 23.52 percent of global spam volume. Despite its ubiquity, the global e-mail spam rate has actually been decreasing: the global annual spam e-mail rate in 2018 was 55 percent, down from 69 percent in 2012.

    Spam e-mail

    It is almost impossible to think about e-mail without considering the issue of spam. In 2019, 293.6 billion e-mails were sent and received on a daily basis. This includes billions of promotional e-mails sent by marketers every day. Whilst many e-mail users believe such content belongs in their spam folder, marketing e-mails are generally harmless, if annoying to the user. In 2018, the spam placement rate of commercial e-mails declined to nine percent, down from 14 percent in 2017.

    Malicious spam

    Not all spam consists of benign promotional e-mails, though. A significant portion of spam messages is of a more malicious nature, aiming to damage or hijack user systems. The most common variants of malicious spam worldwide include trojans, spyware, and ransomware.

  14. Email Anti-spam Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Email Anti-spam Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-email-anti-spam-software-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Email Anti-spam Software Market Outlook



    The global email anti-spam software market size was valued at approximately USD 1.8 billion in 2023 and is projected to reach nearly USD 4.2 billion by 2032, growing at a compound annual growth rate (CAGR) of 9.7% during the forecast period. The significant growth factor driving this market is the increasing volume of spam emails, which has heightened the demand for robust email security solutions.



    One of the primary growth factors for the email anti-spam software market is the proliferation of spam and phishing attacks. As email remains a critical communication tool for both individuals and businesses, the rise in cyber threats has led to a greater need for advanced spam filtering solutions. Organizations are seeking sophisticated software capable of detecting and blocking malicious emails, thereby safeguarding sensitive information and protecting against data breaches. This demand is further fueled by regulatory requirements mandating stringent data protection measures.



    Another key growth factor is the increasing adoption of cloud-based solutions. Cloud deployment offers numerous advantages, including scalability, ease of integration, and cost-effectiveness. As more businesses migrate their operations to the cloud, the demand for cloud-based email anti-spam solutions is surging. These solutions are particularly appealing to small and medium enterprises (SMEs), which may lack the resources to invest in extensive on-premises infrastructure. Cloud solutions provide these organizations with robust security features, ensuring their email systems remain secure and compliant.



    Technological advancements in artificial intelligence (AI) and machine learning (ML) are also propelling market growth. Modern email anti-spam software leverages AI and ML algorithms to enhance the accuracy and efficiency of spam detection. These technologies enable the software to learn from patterns and behaviors, improving its ability to identify new and sophisticated spam tactics. The continuous evolution of AI and ML technologies promises to further strengthen the capabilities of email anti-spam solutions, driving their adoption across various sectors.



    The rise of Cloud-based Email Security solutions is revolutionizing the way organizations approach email protection. By leveraging cloud infrastructure, these solutions offer enhanced flexibility and scalability, allowing businesses to adapt quickly to changing security landscapes. Cloud-based systems are particularly advantageous for organizations with distributed teams, as they provide seamless access to security features from any location. Furthermore, they reduce the burden of maintaining on-premises hardware, enabling IT teams to focus on strategic initiatives rather than routine maintenance. As cyber threats evolve, cloud-based email security solutions continuously update to provide the latest protection, ensuring that organizations remain one step ahead of potential attacks. This adaptability and ease of use are driving more companies to transition to cloud-based models, aligning with broader digital transformation trends.



    Regionally, North America holds a substantial share of the email anti-spam software market. The presence of leading market players, coupled with high adoption rates of advanced cybersecurity solutions, drives this dominance. Additionally, stringent regulatory frameworks in the United States and Canada emphasize the need for robust email security, further boosting market growth in the region. Europe follows closely, with the General Data Protection Regulation (GDPR) playing a pivotal role in ensuring data security and privacy, thereby driving the demand for email anti-spam software.



    Component Analysis



    The email anti-spam software market is segmented by components into software and services. The software segment dominates the market, driven by the continuous need for effective spam detection and email security solutions. The software is designed to identify and block spam emails before they reach the user's inbox, leveraging a combination of filters, algorithms, and databases. This segment is witnessing continuous innovation, with vendors incorporating advanced AI and ML features to enhance detection accuracy and efficiency.



    Software solutions are further categorized into standalone and integrated solutions. Standalone software is specifically designed to target spam emails, while integrated solutions are

  15. Enron Fraud Email Dataset

    • kaggle.com
    Updated Dec 28, 2023
    Cite
    Advaith S Rao (2023). Enron Fraud Email Dataset [Dataset]. https://www.kaggle.com/datasets/advaithsrao/enron-fraud-email-dataset/versions/1
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Advaith S Rao
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. The data has been made public and presents a diverse set of email information ranging from internal, marketing emails to spam and fraud attempts.

    In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and noted that, though the dataset contained scam emails, it also had several integrity problems. The dataset was updated later, but ensuring privacy remains key when the data is used to train a deep neural network model.

    Though the Enron Email Dataset contains over 500K emails, one of its problems is the scarcity of labeled fraud examples. Label annotation is done so that a broad umbrella of fraud emails can be detected accurately. Since fraud emails fall into several types, such as Phishing, Financial, Romance, Subscription, and Nigerian Prince scams, multiple heuristics have to be used to label all types of fraudulent emails effectively.

    To tackle this problem, heuristics have been used to label the Enron data corpus using email signals, and automated labeling has been performed using simple ML models on other smaller email datasets available online. These fraud annotation techniques are discussed in detail below.

    To perform fraud annotation on the Enron dataset and to provide more fraud examples for modeling, two more fraud data sources have been used:

    Phishing Email Dataset: https://www.kaggle.com/dsv/6090437

    Social Engineering Dataset: http://aclweb.org/aclwiki

    Label Annotation

    To label the Enron email dataset, two signals are used to filter suspicious emails and label them into fraud and non-fraud classes:

    Automated ML labeling

    Email Signals

    Automated ML Labeling

    The following heuristics are used to annotate labels for Enron email data using the other two data sources:

    Phishing Model Annotation: A high-precision SVM model trained on the Phishing mails dataset, which is used to annotate the Phishing Label on the Enron Dataset.

    Social Engineering Model Annotation: A high-precision SVM model trained on the Social Engineering mails dataset, which is used to annotate the Social Engineering Label on the Enron Dataset.

    The two ML Annotator models use Term Frequency Inverse Document Frequency (TF-IDF) to embed the input text and make use of SVM models with Gaussian Kernel.
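
    To make the two ingredients named above concrete, the sketch below computes TF-IDF vectors and a Gaussian (RBF) kernel value between documents using only the standard library. It does not train an SVM and is not the authors' pipeline: the weighting variant `tf * log(N / df)`, the whitespace tokenizer, the `gamma` value, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return vectors


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||u - v||^2) on sparse dict vectors."""
    terms = set(u) | set(v)
    squared_distance = sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms)
    return math.exp(-gamma * squared_distance)


# Hypothetical toy documents, not taken from any of the datasets.
docs = [
    "verify your account password now",
    "verify your bank account now",
    "meeting agenda for monday",
]
vectors = tfidf_vectors(docs)
similar = rbf_kernel(vectors[0], vectors[1])
different = rbf_kernel(vectors[0], vectors[2])
print(f"phishing-like pair: {similar:.3f}, unrelated pair: {different:.3f}")
```

    An SVM with a Gaussian kernel bases its decision on kernel values like these between an input and its support vectors, which is why documents with overlapping vocabulary tend to receive the same label.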

    If either of the models predicted that an email was a fraud, the mail metadata was checked for several email signals. If these heuristics meet the requirements of a high-probability fraud email, we label it as a fraud email.

    Email Signals

    Email signal-based heuristics are used to filter and target suspicious emails specifically for fraud labeling. The signals used were:

    Person Of Interest: There is a publicly available list of email addresses of employees who were liable for the massive data leak at Enron. These user mailboxes have a higher chance of containing quality fraud emails.

    Suspicious Folders: The Enron data is dumped into several folders for every employee, such as inbox, deleted_items, junk, and calendar. Folders with a higher chance of containing fraud emails, such as Deleted Items and Junk, are treated as suspicious.

    Sender Type: The sender type was categorized as ‘Internal’ and ‘External’ based on their email address.

    Low Communication: A threshold of 4 emails, based on the table below, was used to define Low Communication. A user qualifies as a Low-Comm sender if their email count is below this threshold. Mails sent from Low-Comm senders were assigned a high probability of being fraud.

    Contains Replies and Forwards: If an email contains forwards or replies, it was assigned a low probability of being a fraud email.
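
    The signals listed above could be computed from mail metadata roughly as follows. This is a hedged sketch, not the authors' implementation: the `Email` fields, the `enron.com` internal-domain check, the subject-prefix test for replies and forwards, and especially the `label_fraud` combination rule are assumptions made for illustration. Only the folder names and the threshold of 4 come from the description.

```python
from dataclasses import dataclass

SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}  # folders named in the description
LOW_COMM_THRESHOLD = 4                           # threshold stated in the description
INTERNAL_DOMAIN = "enron.com"                    # assumption: internal sender domain


@dataclass
class Email:
    sender: str
    mailbox_owner: str
    folder: str
    subject: str


def email_signals(email, sender_counts, persons_of_interest):
    """Compute the five metadata signals for one email."""
    subject = email.subject.strip().lower()
    is_internal = email.sender.lower().endswith("@" + INTERNAL_DOMAIN)
    return {
        "person_of_interest": email.mailbox_owner in persons_of_interest,
        "suspicious_folder": email.folder.lower() in SUSPICIOUS_FOLDERS,
        "sender_type": "Internal" if is_internal else "External",
        "low_comm": sender_counts.get(email.sender, 0) < LOW_COMM_THRESHOLD,
        # Assumption: replies/forwards are detected from the subject prefix.
        "reply_or_forward": subject.startswith(("re:", "fw:", "fwd:")),
    }


def label_fraud(ml_flagged, signals):
    """Hypothetical rule: an ML-flagged email is labeled fraud only if it is
    not a reply/forward and at least one high-probability signal fires."""
    if not ml_flagged or signals["reply_or_forward"]:
        return False
    return (signals["low_comm"]
            or signals["suspicious_folder"]
            or signals["person_of_interest"])


# Hypothetical example data.
sender_counts = {"promo@example.com": 2, "colleague@enron.com": 120}
poi = {"some.employee"}
e1 = Email("promo@example.com", "some.employee", "junk", "Claim your prize")
e2 = Email("colleague@enron.com", "other.employee", "inbox", "RE: budget")
s1 = email_signals(e1, sender_counts, poi)
s2 = email_signals(e2, sender_counts, poi)
print(label_fraud(True, s1), label_fraud(True, s2))
```

    Keeping signal extraction separate from the final decision makes it easy to swap in a different combination rule once the actual thresholds and weights are known.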

    Manual Inspection

    To ensure high-quality labels, the mismatch examples from ML Annotation have been manually inspected for Enron dataset relabeling.

    Dataset Breakdown

    Fraud    Non-Fraud
    2327     445090

    Citations

    Enron Dataset: Title: Enron Email Dataset; URL: https://www.cs.cmu.edu/~enron/; Publisher: MIT, CMU; Author: Leslie Kaelbling, William W. Cohen; Year: 2015

    Phishing Email Detection Dataset: Title: Phishing Email Detection; URL: https://www.kaggle.com/dsv/6090437; DOI: 10.34740/KAGGLE/DSV/6090437; Publisher: Kaggle; Author: Subhadeep Chakraborty; Year: 2023

    CLAIR Fraud Email Collection: Title: CLAIR collection of fraud email; URL: http://aclweb.org/aclwiki; Author: Radev, D.; Year: 2008

  16. E-Mail Spam Filter Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 16, 2025
    Cite
    Data Insights Market (2025). E-Mail Spam Filter Report [Dataset]. https://www.datainsightsmarket.com/reports/e-mail-spam-filter-540969
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    May 16, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global email spam filter market is experiencing robust growth, driven by the escalating volume of spam emails and the increasing sophistication of phishing and malware attacks targeting individuals and organizations alike. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 12% between 2025 and 2033, reaching approximately $45 billion by 2033. This growth is fueled by several key factors. The widespread adoption of cloud-based solutions offers scalability, cost-effectiveness, and enhanced security features, contributing significantly to market expansion. Furthermore, stringent data privacy regulations across various regions are compelling businesses and government entities to invest heavily in robust email security solutions. The increasing prevalence of ransomware attacks and the associated financial losses are further driving demand for sophisticated email spam filters. Segmentation reveals a strong preference for cloud-based solutions over on-premises deployments, reflecting the overall shift towards cloud computing. The enterprise segment holds the largest market share, driven by the need for comprehensive security measures in large organizations. Geographically, North America and Europe currently dominate the market, although the Asia-Pacific region is poised for significant growth due to increasing internet penetration and rising awareness of cybersecurity threats. However, the market faces certain restraints. The high initial investment required for implementing advanced spam filtering solutions can be a barrier to entry for small and medium-sized enterprises (SMEs). Furthermore, the constant evolution of spam techniques necessitates continuous updates and upgrades to email filter technology, leading to ongoing operational costs. The complexity of managing and maintaining email security solutions can also deter some organizations, particularly those with limited IT resources. 
    Despite these challenges, the overall market outlook remains positive, fueled by the persistent threat of spam and the growing need for robust email security. Key players like TitanHQ, Hornetsecurity, and others are leveraging technological advancements, such as AI and machine learning, to enhance the effectiveness of their solutions and cater to the evolving needs of the market. Competition is intense, driving innovation and price optimization within the sector.

  17. Spam Blocking Software Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated May 23, 2025
    + more versions
    Cite
    Archive Market Research (2025). Spam Blocking Software Report [Dataset]. https://www.archivemarketresearch.com/reports/spam-blocking-software-564197
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global spam blocking software market is experiencing robust growth, driven by the escalating volume of spam emails and phishing attempts targeting individuals and organizations. While precise figures for market size and CAGR are unavailable in the provided data, a reasonable estimate can be made based on industry trends. Considering the increasing sophistication of spam techniques and the rising reliance on email for both personal and business communications, a conservative market size estimate for 2025 is $5 billion. Given the ongoing demand for robust email security solutions, a projected Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033 seems plausible. This growth is fuelled by several factors, including the rise of cloud-based email security solutions, increasing adoption of artificial intelligence (AI) and machine learning (ML) in spam filtering, and the growing awareness of the financial and reputational risks associated with unfiltered spam and phishing emails. Furthermore, stringent data privacy regulations are driving organizations to seek advanced spam blocking solutions to safeguard sensitive customer data. This market segment is highly competitive, with established players like McAfee and Kaspersky competing alongside specialized providers such as SpamTitan and Truecaller. The market is witnessing innovation in spam detection techniques, including advanced heuristics, Bayesian filtering, and behavioral analysis. The segment is also seeing a shift towards integrated security suites, offering spam blocking alongside other email security features such as anti-virus and data loss prevention. Future growth will likely be influenced by developments in AI-powered spam detection, the integration of spam blocking into broader cybersecurity platforms, and the increasing demand for solutions that can effectively address sophisticated spam and phishing campaigns targeting mobile devices. 
    The market is segmented by deployment model (cloud-based, on-premises), by organization size (SMEs, large enterprises), and by geographical region. Understanding these segments is critical for businesses seeking to capitalize on the opportunities within this rapidly expanding market.

  18. Email Anti Spam Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Email Anti Spam Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/email-anti-spam-software-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Email Anti Spam Software Market Outlook



    The global email anti spam software market size was valued at USD 2.5 billion in 2023 and is projected to reach USD 6.7 billion by 2032, growing at a CAGR of 11.5% during the forecast period. The growth of the market is primarily driven by the increasing need for enhanced cybersecurity measures, given the rising instances of email threats and cyber-attacks. Companies across various industries are rapidly adopting email anti-spam solutions to safeguard sensitive information and maintain operational integrity.
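
    The projection quoted above follows the standard compound-growth formula, end = start × (1 + CAGR)^years. A minimal sketch, assuming the 2023 value is the base and growth compounds over the 9 years to 2032 (the function names are illustrative):

```python
def project_value(start, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures from the description: USD 2.5 billion (2023), 11.5% CAGR, 2032 target.
projected_2032 = project_value(2.5, 0.115, 2032 - 2023)
rate = implied_cagr(2.5, 6.7, 2032 - 2023)
print(round(projected_2032, 1), round(rate, 3))
```

    Under these assumptions the projected 2032 value rounds to the USD 6.7 billion stated in the report, and the implied rate is close to the quoted 11.5%.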



    The burgeoning number of email users worldwide is a key growth factor for the email anti-spam software market. As businesses and individuals increasingly rely on email for communication, the volume of spam and phishing emails has grown exponentially. This surge in unsolicited and often malicious emails necessitates robust anti-spam solutions. Furthermore, the rising economic losses and reputational damage caused by spam emails have compelled organizations to invest heavily in anti spam software. The need for sophisticated solutions that can filter out spam without hindering legitimate communications is paramount, fostering market growth.



    Technological advancements in artificial intelligence (AI) and machine learning (ML) are significantly enhancing the capabilities of email anti-spam software, driving market expansion. Modern anti-spam solutions leverage AI and ML to detect and filter out spam emails more accurately and efficiently than traditional methods. These technologies enable real-time analysis and continuous learning from new spam patterns, which enhances the software's ability to combat evolving threats. Additionally, advancements in cloud technology are facilitating the deployment of scalable and cost-effective anti-spam solutions, further contributing to market growth.



    Regulatory requirements and compliance standards are also playing a crucial role in the growth of the email anti-spam software market. Governments and regulatory bodies worldwide have implemented stringent regulations to protect personal data and ensure the privacy of email communications. Compliance with these regulations necessitates the adoption of robust email security solutions. For instance, the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States mandate stringent data protection measures, including the use of anti-spam software to prevent unauthorized access and data breaches.



    In addition to the advancements in AI and ML, the integration of Virus Filters into email anti-spam software is becoming increasingly important. Virus Filters play a crucial role in identifying and blocking malicious attachments and links that are often embedded in spam emails. These filters work by scanning incoming emails for known virus signatures and suspicious patterns, providing an additional layer of protection against malware and ransomware attacks. As cyber threats become more sophisticated, the ability to detect and neutralize viruses before they reach the user's inbox is vital for maintaining email security. The incorporation of Virus Filters enhances the overall effectiveness of anti-spam solutions, ensuring that organizations can safeguard their communications from a wide range of threats.



    Regionally, North America dominates the email anti-spam software market, accounting for the largest market share. This dominance can be attributed to the high adoption rate of advanced technologies, the presence of major market players, and stringent regulatory requirements. The Asia Pacific region is expected to witness significant growth during the forecast period, driven by the increasing number of internet users, rapid digitalization of businesses, and growing awareness about cybersecurity. Europe, Latin America, and the Middle East & Africa are also anticipated to contribute to market growth, albeit at varying growth rates, influenced by regional economic conditions and regulatory landscapes.



    Component Analysis



    The email anti-spam software market is segmented into software and services. The software segment is expected to hold the largest market share during the forecast period. This can be attributed to the increasing demand for advanced software solutions that can effectively filter and block spam emails. The software solutions are designed with sophisticated algorithms that can detect and eliminate spam based on various parameters, such as content analysis, sende

  19. Average results by industry

    • getresponse.com
    Updated Apr 5, 2017
    + more versions
    Cite
    GetResponse (2017). Average results by industry [Dataset]. https://www.getresponse.com/resources/reports/email-marketing-benchmarks
    Explore at:
    Dataset updated
    Apr 5, 2017
    Dataset authored and provided by
    GetResponse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here, we’ve gathered email marketing benchmarks by industry. You can see how your average email open, click-through, click-to-open, unsubscribe, and spam complaint rates compare against other companies in your industry.

  20. Average results by country

    • getresponse.com
    Updated Apr 5, 2017
    Cite
    GetResponse (2017). Average results by country [Dataset]. https://www.getresponse.com/resources/reports/email-marketing-benchmarks
    Explore at:
    Dataset updated
    Apr 5, 2017
    Dataset authored and provided by
    GetResponse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What are the average email marketing results in different countries? Here’s what we’ve found.

Cite
Statista (2024). Spam share of global email traffic 2011-2023 [Dataset]. https://www.statista.com/statistics/420400/spam-email-traffic-share-annual/
Spam share of global email traffic 2011-2023

Explore at:
23 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 17, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description

In 2023, nearly 45.6 percent of all e-mails worldwide were identified as spam, down from almost 49 percent in 2022. While remaining a big part of the e-mail traffic, since 2011, the share of spam e-mails has decreased significantly. In 2023, the highest volume of spam e-mails was registered in May, approximately 50 percent of e-mail traffic worldwide.
