100+ datasets found
  1. Global number of e-mail phishing attacks 2022-2023

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Global number of e-mail phishing attacks 2022-2023 [Dataset]. https://www.statista.com/statistics/1493550/phishing-attacks-global-number/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2022 - Dec 2023
    Area covered
    Worldwide
    Description

    In December 2023, around 9.45 million phishing e-mails were detected worldwide, up from 5.59 million in September 2023. The figure has risen continuously since January 2022, a trend partially associated with the launch of ChatGPT in November 2022.

  2. Outcomes of successful phishing attacks in companies worldwide 2021-2023

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Outcomes of successful phishing attacks in companies worldwide 2021-2023 [Dataset]. https://www.statista.com/statistics/1350723/consequences-phishing-attacks/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations suffering severe consequences from a successful cyber attack had declined. In 2023, 29 percent of enterprises experienced a breach of customer or client data, down from 44 percent in 2022. Ransomware infections delivered through e-mail were reported by 32 percent of respondents in 2023. Credential or account compromise occurred in 27 percent of organizations in 2023, a decrease of 25 percent compared with the year prior.

  3. Phishing: distribution of attacks 2023, by region

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Phishing: distribution of attacks 2023, by region [Dataset]. https://www.statista.com/statistics/266362/phishing-attacks-country/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, internet users in Vietnam were the most frequent targets of phishing attacks, with an attack rate of ***** percent. Peru ranked second, with an attack rate of nearly ** percent, followed by Taiwan with ***** percent.

  4. Phishing Email Dataset

    • kaggle.com
    zip
    Updated May 24, 2024
    Cite
    Naser Abdullah Alam (2024). Phishing Email Dataset [Dataset]. https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset
    Available download formats: zip (80,864,554 bytes)
    Dataset updated
    May 24, 2024
    Authors
    Naser Abdullah Alam
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    PHISHING EMAIL DATASET

    This dataset was compiled by researchers to study phishing email tactics. It combines emails from a variety of sources to create a comprehensive resource for analysis.

    Initial Datasets:

    • Enron and Ling Datasets: These datasets focus on the core content of phishing emails, containing subject lines, email body text, and labels indicating whether the email is spam (phishing) or legitimate.

    • CEAS, Nazario, Nigerian Fraud, and SpamAssassin Datasets: These datasets provide broader context for the emails, including sender information, recipient information, date, and labels for spam/legitimate classification.

    Final Dataset:

    The final dataset combines the information from the initial datasets into a single resource for analysis. This dataset contains:

    • Approximately 82,500 emails
    • 42,891 spam emails
    • 39,595 legitimate emails
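    The two class counts add up to 82,486 emails, consistent with the "approximately 82,500" total, and the classes are close to balanced. A minimal Python sketch of that arithmetic, which also computes inverse-frequency class weights of the form n_samples / (n_classes × n_class) (the heuristic scikit-learn calls class_weight="balanced"); whether to use such weights is a modeling choice assumed here, not something the dataset authors prescribe:

```python
# Class counts as stated in the dataset description.
counts = {"spam": 42_891, "legitimate": 39_595}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency ("balanced") weights: n_samples / (n_classes * n_class_samples).
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(total)  # 82486
for label in counts:
    print(f"{label}: share={shares[label]:.3f}, weight={weights[label]:.3f}")
```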

    This dataset allows researchers to study the content of phishing emails and the context in which they are sent to improve detection methods.

    Please cite the following two articles if you are using this dataset:

    • Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. arXiv. https://arxiv.org/abs/2405.11619

  5. Share of global companies encountering cyberattacks 2021-2023, by type

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Share of global companies encountering cyberattacks 2021-2023, by type [Dataset]. https://www.statista.com/statistics/1376249/cyber-attack-global-firms-by-type/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    A 2022 survey of working adults and IT professionals worldwide found that bulk phishing was the most common type of cyber incident, experienced by ** percent of organizations in 2022. Spear phishing ranked second, with ***** in **** respondents reporting such incidents during the same year. Overall, the prevalence of the most common attack types decreased between 2021 and 2022.

  6. PhiUSIIL Phishing URL Dataset

    • kaggle.com
    • data.mendeley.com
    zip
    Updated Mar 8, 2024
    Cite
    Arvind Prasad (2024). PhiUSIIL Phishing URL Dataset [Dataset]. https://www.kaggle.com/datasets/ndarvind/phiusiil-phishing-url-dataset
    Available download formats: zip (15,400,969 bytes)
    Dataset updated
    Mar 8, 2024
    Authors
    Arvind Prasad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs we analyzed while constructing the dataset are recent. Features are extracted from the URL and from the source code of the webpage; features such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb are derived from existing features.

    Class labels: label 1 corresponds to a legitimate URL, and label 0 to a phishing URL.

    Citations: Prasad, A., & Chandra, S. (2023). PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning. Computers & Security, 103545. doi: https://doi.org/10.1016/j.cose.2023.103545
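    This label convention is the reverse of the common "phishing = positive class" setup. A minimal Python sketch, assuming integer labels 0/1 as described above, that flips them before computing precision or recall on the phishing class; the helper name to_phishing_positive is hypothetical:

```python
# PhiUSIIL convention (per the dataset description): 1 = legitimate, 0 = phishing.
LEGITIMATE, PHISHING = 1, 0

def to_phishing_positive(labels):
    """Flip PhiUSIIL labels so that phishing URLs become the positive class (1)."""
    flipped = []
    for y in labels:
        if y not in (LEGITIMATE, PHISHING):
            raise ValueError(f"unexpected label: {y!r}")
        flipped.append(1 - y)
    return flipped

# Class sizes stated in the description.
n_legit, n_phish = 134_850, 100_945
total = n_legit + n_phish
print(total)                      # 235795
print(round(n_phish / total, 3))  # 0.428
```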

  7. Global mobile phishing rate Q4 2022-Q2 2023, by region

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Global mobile phishing rate Q4 2022-Q2 2023, by region [Dataset]. https://www.statista.com/statistics/1306224/smishing-mobile-phishing-rate-worldwide-by-region/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the second quarter of 2023, smartphone users in North America encountered around ***** thousand phishing and malicious attempts, the highest number of any region worldwide. Europe ranked second, with more than *** thousand phishing and malicious attempts.

  8. Fraud Detection and Prevention Statistics By Secure Ways (2026)

    • scoop.market.us
    Updated Jan 22, 2026
    Cite
    Market.us Scoop (2026). Fraud Detection and Prevention Statistics By Secure Ways (2026) [Dataset]. https://scoop.market.us/fraud-detection-and-prevention-statistics/
    Dataset updated
    Jan 22, 2026
    Dataset authored and provided by
    Market.us Scoop
    License

    https://scoop.market.us/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Fraud Detection and Prevention Statistics: Fraud Detection and Prevention (FDP) is a critical focus for organizations, involving strategies and technologies to thwart deceptive activities aimed at financial gain.

    Fraud encompasses various forms, from identity theft to cybercrimes, and effective FDP is crucial for minimizing financial losses, preserving reputation, adhering to legal obligations, and safeguarding data.

    Key components include data collection, statistical analysis, machine learning, fraud prevention measures, regulatory compliance, and integrating emerging technologies like AI and blockchain. Continuous improvement is vital in this dynamic field to stay ahead of evolving fraud tactics.

    [Figure: Fraud Detection and Prevention Statistics]

  9. Phishing Website Dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Jul 3, 2023
    Cite
    I Kadek Agus Ariesta Putra (2023). Phishing Website Dataset [Dataset]. http://doi.org/10.5281/zenodo.8041387
    Available download formats: zip, csv
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    I Kadek Agus Ariesta Putra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of legitimate and phishing websites, along with information on the target brands (brands.csv) being impersonated in the phishing attacks. The dataset includes a total of 10,395 websites, 5,244 of which are legitimate and 5,151 of which are phishing websites. These websites impersonate a total of 86 different target brands.

    Phishing websites can be downloaded as zip files with a "phishing" prefix, and legitimate websites as zip files with a "not-phishing" prefix.
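    A minimal Python sketch of routing downloaded archives to a class by those prefixes. The example file names are hypothetical; only the "phishing" and "not-phishing" prefixes come from the description:

```python
from pathlib import PurePosixPath

def label_from_archive_name(name: str) -> str:
    """Map an archive file name to its class using the documented prefixes."""
    base = PurePosixPath(name).name.lower()
    # "not-phishing..." does not start with "phishing", but test it first for clarity.
    if base.startswith("not-phishing"):
        return "legitimate"
    if base.startswith("phishing"):
        return "phishing"
    raise ValueError(f"no known class prefix in {name!r}")

# Hypothetical archive names, for illustration only.
for name in ["downloads/phishing_part1.zip", "downloads/not-phishing_part1.zip"]:
    print(name, "->", label_from_archive_name(name))
```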

    In addition, the dataset includes features such as screenshots, text, CSS, and HTML structure for each website, as well as domain information (WHOIS data), IP information, and SSL information. Each website is labeled as either legitimate or phishing and includes additional metadata such as the date it was discovered, the target brand being impersonated, and any other relevant information.

    The dataset has been curated for research purposes and can be used to analyze the effectiveness of phishing attacks, develop and evaluate anti-phishing solutions, and identify trends and patterns in phishing attacks. It is hoped that this dataset will contribute to the advancement of research in the field of cybersecurity and help improve our understanding of phishing attacks.

  10. Number of global phishing attacks Q3 2013- Q4 2024

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Number of global phishing attacks Q3 2013- Q4 2024 [Dataset]. https://www.statista.com/statistics/266155/number-of-phishing-attacks-worldwide/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the fourth quarter of 2024, over 989,000 unique phishing attacks were detected worldwide, a slight increase from the preceding quarter. The largest jump in unique phishing sites occurred between the second and third quarters of 2020, from nearly 147,000 to approximately 572,000. These figures are based on the number of unique base URLs of phishing sites.

  11. Enron Fraud Email Dataset

    • kaggle.com
    zip
    Updated Dec 28, 2023
    Cite
    Advaith S Rao (2023). Enron Fraud Email Dataset [Dataset]. https://www.kaggle.com/datasets/advaithsrao/enron-fraud-email-dataset
    Available download formats: zip (223,918,844 bytes)
    Dataset updated
    Dec 28, 2023
    Authors
    Advaith S Rao
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. Its email data was later made public and contains a diverse set of messages, ranging from internal and marketing emails to spam and fraud attempts.

    In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and noted that, besides containing scam emails, it had several integrity problems. The dataset was later updated, but ensuring privacy in the data remains key when it is used to train a deep neural network model.

    Although the Enron Email Dataset contains over 500K emails, one of its problems is the scarcity of labeled fraud examples. Labels must be annotated so that the whole umbrella of fraud emails can be detected accurately. Because fraud emails fall into several types, such as phishing, financial, romance, subscription, and Nigerian Prince scams, multiple heuristics are needed to label all types of fraudulent emails effectively.

    To tackle this problem, the Enron corpus has been labeled with heuristics based on email signals, and automated labeling has been performed with simple ML models trained on other, smaller email datasets available online. These fraud annotation techniques are discussed in detail below.

    To perform fraud annotation on the Enron dataset and to provide more fraud examples for modeling, two additional fraud data sources have been used:

    • Phishing Email Dataset: https://www.kaggle.com/dsv/6090437
    • Social Engineering Dataset: http://aclweb.org/aclwiki

    Label Annotation

    To label the Enron email dataset, two kinds of signals are used to filter suspicious emails and sort them into fraud and non-fraud classes: automated ML labeling and email signals.

    Automated ML Labeling

    The following heuristics are used to annotate labels for the Enron email data using the other two data sources:

    Phishing Model Annotation: A high-precision SVM model trained on the phishing email dataset and used to annotate the Phishing label on the Enron dataset.

    Social Engineering Model Annotation: A high-precision SVM model trained on the social engineering email dataset and used to annotate the Social Engineering label on the Enron dataset.

    Both ML annotator models use Term Frequency-Inverse Document Frequency (TF-IDF) to embed the input text and classify it with SVMs using a Gaussian kernel.
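    The two ingredients named above can be sketched in plain Python: TF-IDF vectors and the Gaussian (RBF) kernel exp(-γ‖x − y‖²) that an SVM evaluates between them. This is an illustration under stated assumptions, not the annotators' actual code: the whitespace tokenizer, the smoothed IDF formula (the scikit-learn default form), γ = 1, and the toy documents are all choices made here. The SVM fit itself would normally come from a library, for example scikit-learn's SVC(kernel="rbf").

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1, the scikit-learn default form."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}

def tfidf(doc, idf):
    """Raw term counts times IDF, L2-normalised; terms unseen at fit time are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) on sparse dict vectors."""
    keys = set(x) | set(y)
    sq_dist = sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys)
    return math.exp(-gamma * sq_dist)

# Toy documents, invented for illustration.
docs = [
    "verify your account password now",
    "meeting agenda for monday",
    "urgent verify your bank account",
]
idf = fit_idf(docs)
vecs = [tfidf(d, idf) for d in docs]
print(rbf_kernel(vecs[0], vecs[0]))  # 1.0
print(rbf_kernel(vecs[0], vecs[2]) > rbf_kernel(vecs[0], vecs[1]))  # True
```

    Documents that share vocabulary (the two "verify your account" messages) end up closer in TF-IDF space and therefore get a larger kernel value, which is what lets the RBF SVM separate phishing-like wording from ordinary mail.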

    If either model predicted that an email was fraudulent, the email's metadata was checked against several email signals. If these heuristics indicated a high-probability fraud email, it was labeled as fraud.

    Email Signals

    Email signal-based heuristics are used specifically to filter and target suspicious emails for fraud labeling. The signals used were:

    Person Of Interest: There is a publicly available list of email addresses of employees who were liable for the massive data leak at Enron. These user mailboxes have a higher chance of containing quality fraud emails.

    Suspicious Folders: The Enron data is organized into several folders for every employee, such as inbox, deleted_items, junk, and calendar. A set of folders with a higher chance of containing fraud emails, such as Deleted Items and Junk, was targeted.

    Sender Type: The sender type was categorized as ‘Internal’ and ‘External’ based on their email address.

    Low Communication: A threshold of 4 emails was used to define low communication: a user qualifies as a Low-Comm sender if the number of emails they sent is below this threshold. Emails from low-comm senders were assigned a high probability of being fraud.

    Contains Replies and Forwards: Emails containing forwards or replies were assigned a low probability of being fraud.
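    The description lists the signals but not how they are combined. A minimal Python sketch of one possible combination: each fraud-leaning signal adds a point, a reply or forward subtracts one, and an email flagged by either ML annotator is labeled fraud once it reaches a score threshold. The equal weights, min_score = 2, treating an external sender as fraud-leaning, and all names are assumptions; only the signals themselves, the Deleted Items/Junk folders, and the threshold of 4 emails come from the text.

```python
from dataclasses import dataclass

LOW_COMM_THRESHOLD = 4                          # from the text: fewer than 4 emails => Low-Comm sender
SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}  # folder examples named in the text

@dataclass
class EmailMeta:
    in_poi_mailbox: bool       # mailbox belongs to a Person Of Interest
    folder: str                # e.g. "inbox", "deleted_items", "junk"
    sender_is_external: bool   # Sender Type signal (External as suspicious is an assumption)
    sender_email_count: int    # emails from this sender, for the Low Communication signal
    has_reply_or_forward: bool

def signal_score(m: EmailMeta) -> int:
    """Hypothetical equal-weight tally of the email signals."""
    score = 0
    score += int(m.in_poi_mailbox)
    score += int(m.folder.lower() in SUSPICIOUS_FOLDERS)
    score += int(m.sender_is_external)
    score += int(m.sender_email_count < LOW_COMM_THRESHOLD)
    score -= int(m.has_reply_or_forward)  # replies/forwards lower the fraud probability
    return score

def label_email(ml_flagged: bool, m: EmailMeta, min_score: int = 2) -> str:
    """Only emails flagged by at least one ML annotator are checked against the signals."""
    if ml_flagged and signal_score(m) >= min_score:
        return "fraud"
    return "non-fraud"

suspicious = EmailMeta(False, "junk", True, 1, False)
print(label_email(True, suspicious))   # fraud
print(label_email(False, suspicious))  # non-fraud
```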

    Manual Inspection

    To ensure high-quality labels, the mismatch examples from ML Annotation have been manually inspected for Enron dataset relabeling.

    Dataset Breakdown

    Fraud      Non-Fraud
    2,327      445,090

    Citations

    • Enron Email Dataset. Leslie Kaelbling, William W. Cohen. MIT, CMU, 2015. https://www.cs.cmu.edu/~enron/

    • Phishing Email Detection. Subhadeep Chakraborty. Kaggle, 2023. DOI: 10.34740/KAGGLE/DSV/6090437. https://www.kaggle.com/dsv/6090437

    • CLAIR collection of fraud email. Radev, D. 2008. http://aclweb.org/aclwiki

  12. Cybersecurity Incidents in India (2020–2024)

    • kaggle.com
    zip
    Updated Apr 22, 2025
    Cite
    Agile Yaswanth Sai Simha Reddy (2025). Cybersecurity Incidents in India (2020–2024) [Dataset]. https://www.kaggle.com/datasets/saisimha203/cybersecurity-cases-india
    Available download formats: zip (11,210 bytes)
    Dataset updated
    Apr 22, 2025
    Authors
    Agile Yaswanth Sai Simha Reddy
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    The dataset "Cybersecurity Cases in India" is a comprehensive collection of real-world cybersecurity incidents reported across various cities in India. The dataset encapsulates the financial loss, incident types, and categories, providing a detailed overview of the cybercrime landscape in one of the world’s largest digital economies. With over 1000 records, it spans incidents from 2020 to 2024, covering various types of cybercrimes such as phishing, online fraud, malware attacks, ransomware, data breaches, DDoS attacks, identity theft, and more. Each record captures important attributes of the incidents, such as the year, date of occurrence, amount lost in INR, the type of incident, the city in which it occurred, and the category of the affected entity (e.g., financial, personal, corporate).

    The dataset is structured to enable analysis of the trends in cybercrime over time, the financial impact of various cyberattacks, and the geographic distribution of incidents across Indian cities. It serves as a critical resource for cybersecurity professionals, policymakers, law enforcement agencies, and academic researchers seeking to understand the challenges posed by cybercrime in India and to identify strategies to combat these challenges.

    1. Dataset Purpose and Scope

    The dataset’s primary purpose is to provide an extensive, granular view of the nature and scope of cybersecurity incidents in India. It enables the analysis of the frequency, severity, and financial impact of cybercrimes across different types of attacks, cities, and time periods. As cybercrimes continue to rise globally, including in India, this dataset serves as an important tool for understanding the evolving threats and risks in cyberspace. Cybersecurity experts and analysts can leverage this dataset to identify patterns and trends, while government and law enforcement agencies can use it to devise more targeted interventions and preventive measures.

    India, with its large and growing digital footprint, is a prime target for cybercriminals. The country's rapidly expanding internet user base, coupled with increasing digital adoption in various sectors like finance, healthcare, education, and e-commerce, makes it an attractive target for cyberattacks. This dataset allows stakeholders to understand how cybercrime evolves in response to these dynamics.

    The dataset is a rich resource for understanding the following:

    • Incident Frequency: How often different types of cybercrimes are reported in various Indian cities.
    • Financial Impact: The monetary losses associated with each type of cybercrime.
    • Geographic Distribution: The prevalence of specific types of cybercrimes in particular cities or states.
    • Trend Analysis: How cybercrime has evolved over the years in terms of volume and impact.

    2. Dataset Structure and Variables

    The dataset includes the following key variables, each contributing valuable information to the analysis:

    • Year: The year in which the cybercrime incident occurred. This variable helps track the growth or decline of cybercrime incidents over time.
    • Date: The specific date of the cybercrime incident. This allows for time-series analysis of the data.
    • Amount_Lost_INR: The financial loss associated with the cybercrime incident, expressed in Indian Rupees (INR). This variable highlights the economic impact of each cyberattack and can be used to assess the severity of different incidents.
    • Incident_Type: The type of cybercrime incident. This can include phishing, online fraud, malware attacks, ransomware, data breaches, DDoS attacks, and identity theft. This variable is crucial for understanding which types of cybercrimes are most prevalent and how they differ in their impact.
    • City: The city where the incident occurred. This allows for the geographic analysis of cybercrime, helping to identify high-risk areas and cities where certain types of cyberattacks are more common.
    • Category: The category of the entity affected by the cybercrime, such as financial institutions, government bodies, corporations, educational institutions, or individuals. This variable provides insights into which sectors are more vulnerable to specific types of cyberattacks.
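    A minimal Python sketch of the per-type loss analysis these variables support, using only the standard library. It assumes a CSV export whose header uses the column names above and whose Amount_Lost_INR values parse as numbers; the helper name is hypothetical and the three rows are made-up toy data, not records from the dataset:

```python
import csv
import io
from collections import defaultdict

def summarize_losses(csv_file):
    """Return {Incident_Type: (incident_count, total_Amount_Lost_INR)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        kind = row["Incident_Type"]
        counts[kind] += 1
        totals[kind] += float(row["Amount_Lost_INR"])
    return {kind: (counts[kind], totals[kind]) for kind in counts}

# Toy rows for illustration only (not from the dataset).
sample = io.StringIO(
    "Year,Date,Amount_Lost_INR,Incident_Type,City,Category\n"
    "2023,2023-01-05,1500.5,Phishing,Mumbai,Personal\n"
    "2023,2023-02-10,2000,Phishing,Delhi,Financial\n"
    "2024,2024-03-01,500,Ransomware,Pune,Corporate\n"
)
summary = summarize_losses(sample)
print(summary)
```

    The same loop extends naturally to grouping by City or Year for the geographic and trend analyses listed earlier.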

    3. Cybersecurity Threat Landscape in India

    India's digital transformation has made it a prime target for cybercriminals. As of 2023, India is one of the largest internet markets in the world, with over 600 million active internet users. The rapid growth of e-commerce, digital banking, social media, and government services has created new opportunities for cybercriminals to exploit vulnerabilities in digital systems. According to a 2022 report by the Indian Computer Emergency Response Team (CERT-In), India witnessed a significant increase in cybersecurity incidents, with millions of cyberattacks targeting individuals, b...

  13. Spear Phishing Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    Cite
    Dataintelo (2024). Spear Phishing Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/spear-phishing-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2025 - 2034
    Area covered
    Global
    Description

    Spear Phishing Protection Market Outlook

    The global spear phishing protection market size was valued at approximately USD 1.4 billion in 2023 and is anticipated to reach around USD 3.8 billion by 2032, growing at a compound annual growth rate (CAGR) of approximately 11.5% during the forecast period. This robust growth trajectory can be largely attributed to the increasing sophistication and frequency of spear phishing attacks, which are driving organizations across various sectors to invest in advanced protection solutions. As businesses become more digital, the attack surfaces expand, making them more vulnerable to such targeted cyber threats. Furthermore, the evolution of cyber threats, alongside stringent regulatory requirements for data protection, is pushing enterprises to adopt comprehensive spear phishing protection strategies.
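    The growth figures can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1/years) − 1. A short Python sketch, assuming that 2023 to 2032 counts as 9 compounding years: the stated endpoints imply roughly 11.7% per year, and compounding USD 1.4 billion at 11.5% gives roughly USD 3.73 billion, so the quoted figures are broadly consistent given that both are approximations.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years

# Figures from the description (USD billions); 9 compounding years is an assumption.
implied = cagr(1.4, 3.8, 9)
projected = project(1.4, 0.115, 9)
print(f"implied CAGR: {implied:.1%}")
print(f"USD 1.4B at 11.5% for 9 years: {projected:.2f}B")
```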

    The growth of the spear phishing protection market is significantly driven by a heightened awareness of cybersecurity threats and the need for robust defense mechanisms. Spear phishing attacks, characterized by their targeted and personalized nature, have been growing in complexity, leading to severe financial losses and data breaches. Organizations are increasingly recognizing that traditional security measures are insufficient against these sophisticated attacks. As a result, there is a substantial uptick in the demand for specialized tools and services that offer real-time threat intelligence, advanced email filtering, and behavior analytics to preemptively identify and thwart potential phishing attempts. Moreover, the integration of artificial intelligence (AI) and machine learning (ML) in cybersecurity solutions is transforming the spear phishing protection landscape, allowing for more adaptive and predictive threat identification.

    Legislative requirements and compliance standards are also pivotal in propelling the market forward. Various governments and regulatory bodies have implemented stringent data protection laws that mandate organizations to have robust cybersecurity measures in place. Non-compliance can result in hefty fines and reputational damage, further emphasizing the need for effective spear phishing protection solutions. This regulatory environment is prompting organizations, especially those in highly regulated industries like banking and finance, healthcare, and government sectors, to prioritize their cybersecurity initiatives. Consequently, there is an increasing allocation of budgets towards enhancing cybersecurity infrastructure, with a significant portion dedicated to spear phishing protection.

    The proliferation of remote work and digital transformation initiatives has further catalyzed the growth of the spear phishing protection market. With the widespread adoption of remote working models, employees are accessing corporate networks from various locations and devices, thereby expanding the attack surface for cybercriminals. This shift necessitates a reevaluation of existing cybersecurity strategies to include comprehensive measures against spear phishing attacks. Organizations are investing in cloud-based security solutions that offer scalability, flexibility, and real-time monitoring capabilities. Moreover, as digital transformation continues to accelerate across industries, there is an increasing reliance on digital communication tools, which are often targeted by spear phishing attacks, thus highlighting the critical need for effective protection measures.

    Regionally, North America holds a substantial share of the spear phishing protection market, driven by the presence of major cybersecurity companies, a high rate of technology adoption, and stringent regulatory standards. The region's advanced IT infrastructure and significant investments in cybersecurity are key factors contributing to its dominance. Europe follows closely, with strong emphasis on data protection due to regulations like the General Data Protection Regulation (GDPR), which mandates robust cybersecurity frameworks. The Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, fueled by the rapid digitalization of economies, increasing cyber threats, and rising awareness about cybersecurity. Countries like China, India, and Japan are investing heavily in cybersecurity measures to protect their growing digital economies.

    Component Analysis

    The spear phishing protection market is segmented by component into solutions and services. Solutions encompass various technologies and software designed to detect, prevent, and mitigate spear phishing attacks. These solutions include advanced email security, endpoint protection, network security, thre

  14. Global financial phishing attack share 2023, by company type

    • statista.com
    Updated Feb 2, 2026
    Cite
    Statista (2026). Global financial phishing attack share 2023, by company type [Dataset]. https://www.statista.com/statistics/420445/financial-organizations-most-affected-by-phishing/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, web services were the most frequent targets of financial phishing attacks, accounting for over ** percent of such attacks worldwide. Delivery companies ranked second with nearly ** percent, followed by global internet portals with ***** percent of the phishing attacks in the examined year.

  15. Defence SA Fraud 2023-24 - Dataset - data.sa.gov.au

    • data.sa.gov.au
    Updated Jul 1, 2023
    Cite
    (2023). Defence SA Fraud 2023-24 - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/defence-sa-fraud-2023-24
    Dataset updated
    Jul 1, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Australia
    Description

    Defence SA Fraud 2023-24

  16. Phishing and Benign Domain Dataset (DNS, IP, WHOIS/RDAP, TLS, GeoIP)

    • zenodo.org
    json
    Updated Apr 28, 2024
    Cite
    Radek Hranický; Adam Horák; Jan Polišenský; Petr Pouč; Ondřej Ondryáš (2024). Phishing and Benign Domain Dataset (DNS, IP, WHOIS/RDAP, TLS, GeoIP) [Dataset]. http://doi.org/10.5281/zenodo.8364668
    Available download formats: json
    Dataset updated
    Apr 28, 2024
    Dataset provided by
    Zenodo
    Authors
    Radek Hranický; Adam Horák; Jan Polišenský; Petr Pouč; Ondřej Ondryáš
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS certificate fields, and GeoIP information for 432,572 verified benign domains from Cisco Umbrella and 36,993 verified phishing domains from the PhishTank and OpenPhish services. The dataset is useful for statistical analysis of domain data or for feature extraction when training machine learning-based classifiers, e.g., for phishing detection. The data was collected between March and July 2023. The final assessment of the data was conducted in July 2023, which is why the names are suffixed with _2307.

    The upload contains: a) data files, b) the description of the data structure, and c) the feature vector we used for ML-based phishing domain detection.

    Data Files

    The data is located in two individual files:

    • benign_2307.json - data about 432,572 benign domains, and
    • phishing_2307.json - data about 36,993 phishing domains.

    Data Structure

    Both files are in the JSON Array format. The structure is as follows:

    [
    {
      "_id" : "A unique ID of the data record",
      "domain_name" : "Name of the domain (e.g., zenodo.com)",
      "dns" : { "//": "Data obtained from DNS records" },
      "evaluated_on" : "// ISO Timestamp of data collection ",
      "ip_data" : [  "// Data for each related IP address",
        {
          "//": "IP-related data, including RTT from ICMP echo attempts (from Brno, Czechia)",
          "//": "WHOIS/RDAP data for the given IP address",
          "//": "GeoIP data for the given IP address",
          "//": "NERD system reputation score (if available)",
          "//": "ASN info",
          "//": "remarks: ISO timestamps of collection of the individual data pieces"
        }
      ],
      "label" : "benign_2307 for benign OR misp_2307 for phishing",
      "rdap" : { "//": "WHOIS/RDAP information for the domain name" },
      "remarks" : {
        "dns_evaluated_on" : "ISO Timestamp of DNS data collection",
        "rdap_evaluated_on" : "ISO Timestamp of WHOIS/RDAP data collection",
        "tls_evaluated_on" : "ISO Timestamp of TLS certificate information collection",
        "dns_had_no_ips" : "true if no IPs were found in DNS records"
      },
      "sourced_on" : "ISO Timestamp of the moment the domain was found",
      "tls" : {
        "cipher" : "Identifier of the TLS cipher suite",
        "count" : "Number of certificates in chain",
        "protocol" : "Version of the TLS protocol",
        "certificates" : [
          { "//": "Information from TLS certificate fields: issuer, extensions, etc." }
        ]
      },
      "category" : "Category of the record (could be ignored)",
      "source" : "Name of the file that we used to save the domain list"
    }
    ]
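Because both files are plain JSON arrays, they can be read with Python's standard json module. The sketch below is illustrative only: SAMPLE is a hypothetical two-record stand-in using the field names listed above, and it assumes dns_had_no_ips holds a JSON boolean. For the full files, json.load(f) reads the entire array into memory, which may call for a streaming parser on smaller machines.

```python
import json
from collections import Counter

# Hypothetical miniature records that follow the structure above; the real
# files (benign_2307.json, phishing_2307.json) are JSON arrays of such objects.
SAMPLE = """
[
  {"_id": "1", "domain_name": "example.com", "label": "benign_2307",
   "dns": {}, "ip_data": [], "rdap": {}, "tls": null,
   "remarks": {"dns_had_no_ips": false}},
  {"_id": "2", "domain_name": "login-verify.example.net", "label": "misp_2307",
   "dns": {}, "ip_data": [], "rdap": {}, "tls": null,
   "remarks": {"dns_had_no_ips": true}}
]
"""


def load_records(text: str) -> list[dict]:
    """Parse a JSON array of domain records (use json.load(f) for a real file)."""
    return json.loads(text)


def summarize(records: list[dict]) -> dict:
    """Count records per label and records whose DNS lookup returned no IPs."""
    labels = Counter(r.get("label") for r in records)
    # Assumption: dns_had_no_ips is stored as a boolean; "remarks" may be null.
    no_ips = sum(1 for r in records if (r.get("remarks") or {}).get("dns_had_no_ips") is True)
    return {"labels": dict(labels), "no_ips": no_ips}


print(summarize(load_records(SAMPLE)))
```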

    Feature Vector

    This section describes the feature vector used in the paper "Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence", accepted at the IEEE NOMS 2024 conference.

    Lexical Features

    The following features were extracted from the domain name alone:

    • lex_name_len - length of the domain name,
    • lex_begins_with_digit - true if the domain name begins with a digit,
    • lex_www_flag - true if the domain name begins with "www.",
    • lex_phishing_keyword_count - occurrence count of 47 phishing-related keywords,
    • lex_consecutive_chars - length of the longest consecutive character sequence,
    • lex_tld_len - length of the top-level domain (TLD),
    • lex_tld_hash - hash of the TLD,
    • lex_sld_len - length of the second-level domain (SLD),
    • lex_sld_norm_entropy - normalized entropy of the SLD,
    • lex_stld_unique_char_count - number of unique characters in the TLD and the SLD,
    • lex_sub_count - number of subdomains,
    • lex_sub_digit_ratio - ratio of digits in subdomains,
    • lex_sub_hex_ratio - ratio of hex symbols in subdomains,
    • lex_sub_non_alpanum_ratio - ratio of non-alphanumeric symbols in subdomains,
    • lex_sub_vowel_ratio - ratio of vowels in subdomains,
    • lex_sub_consonant_ratio - ratio of consonants in subdomains,
    • lex_sub_max_consonant_len - length of the longest consonant sequence in subdomains,
    • lex_sub_norm_entropy - normalized entropy of a string made from all subdomains,
    • lex_phishing_bigram_matches - occurrence count of the top 300 phishing domain bigrams,
    • lex_phishing_trigram_matches - occurrence count of the top 2000 phishing domain trigrams,
    • lex_phishing_tetragram_matches - occurrence count of the top 5000 phishing domain tetragrams,
    • lex_phishing_pentagram_matches - occurrence count of the top 10000 phishing domain pentagrams.
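Several of the lexical features above can be approximated in a few lines. This is a sketch under stated assumptions, not the authors' extractor: the TLD is taken as the last label and the SLD as the second-to-last (so multi-label suffixes such as co.uk are not handled), "normalized entropy" is read as Shannon entropy divided by log2 of the string length, lex_consecutive_chars is read as the longest run of one repeated character, and subdomains are joined without separators. The n-gram list passed to ngram_matches is hypothetical, since the top-N phishing n-gram lists are not published here.

```python
import math
from collections import Counter


def norm_entropy(s: str) -> float:
    """Shannon entropy in bits divided by log2(len(s)); 0.0 for strings shorter than 2."""
    if len(s) < 2:
        return 0.0
    counts = Counter(s)
    h = -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())
    return h / math.log2(len(s))


def longest_run(s: str) -> int:
    """Length of the longest run of a single repeated character."""
    best = run = 0
    prev = None
    for ch in s:
        run = run + 1 if ch == prev else 1
        best = max(best, run)
        prev = ch
    return best


def ngram_matches(s: str, ngrams: set[str], n: int) -> int:
    """Count sliding-window n-grams of s that appear in the given (hypothetical) n-gram set."""
    return sum(1 for i in range(len(s) - n + 1) if s[i:i + n] in ngrams)


def lexical_features(domain: str) -> dict:
    labels = domain.lower().rstrip(".").split(".")
    tld = labels[-1]
    sld = labels[-2] if len(labels) > 1 else ""
    subs = labels[:-2]  # everything left of the SLD, including "www"
    sub_str = "".join(subs)
    n = len(sub_str) or 1  # avoid division by zero when there are no subdomains
    return {
        "lex_name_len": len(domain),
        "lex_begins_with_digit": domain[:1].isdigit(),
        "lex_www_flag": domain.lower().startswith("www."),
        "lex_consecutive_chars": longest_run(domain),
        "lex_tld_len": len(tld),
        "lex_sld_len": len(sld),
        "lex_sld_norm_entropy": norm_entropy(sld),
        "lex_stld_unique_char_count": len(set(tld + sld)),
        "lex_sub_count": len(subs),
        "lex_sub_digit_ratio": sum(c.isdigit() for c in sub_str) / n,
        "lex_sub_hex_ratio": sum(c in "0123456789abcdef" for c in sub_str) / n,
        "lex_sub_norm_entropy": norm_entropy(sub_str),
    }


print(lexical_features("www.paypa1-login.example.com"))
```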

    DNS-based Features

    The following features were extracted from DNS responses when querying about the domain:

    • dns_A_count - number of A records for the domain,
    • dns_AAAA_count - number of AAAA records for the domain,
    • dns_CNAME_count - number of CNAME records for the domain,
    • dns_MX_count - number of MX records for the domain,
    • dns_NS_count - number of nameserver (NS) records for the domain,
    • dns_TXT_count - number of TXT records for the domain,
    • dns_soa_primary_ns_len - number of characters in the primary NS's domain name,
    • dns_soa_primary_ns_level - number of subdomains in the primary NS's domain name,
    • dns_soa_primary_ns_digit_count - number of digits in the primary NS's domain name,
    • dns_soa_primary_ns_entropy - normalized entropy of the primary NS's domain name,
    • dns_soa_email_len - number of characters in the admin's email domain name part,
    • dns_soa_email_level - number of subdomains in the admin's email domain name part,
    • dns_soa_email_digit_count - number of digits in the admin's email domain name part,
    • dns_soa_email_entropy - normalized entropy of the admin's email domain name part,
    • dns_soa_refresh - SOA refresh parameter,
    • dns_soa_retry - SOA retry parameter,
    • dns_soa_expire - SOA expire parameter,
    • dns_mx_avg_len - average number of characters of the domain names in MX records,
    • dns_mx_avg_entropy - average normalized entropy of the domain names in MX records,
    • dns_domain_name_in_mx - true if the domain name is contained in the MX record's domains,
    • dns_txt_spf_exists - true if an SPF record is in the TXT RRs,
    • dns_txt_avg_entropy - average normalized entropy of the TXT records,
    • dns_ttl_low - number of RRsets with TTL in [0,100],
    • dns_ttl_mid - number of RRsets with TTL in [101,500],
    • dns_zone_entropy - normalized entropy of the zone's domain name.
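A few of the DNS features above reduce to counting and simple string tests. The sketch assumes a hypothetical input shape (a mapping from RR type to {"ttl": int, "values": [...]}), since the "dns" object's layout is not documented here; it also assumes unquoted TXT strings, uses a naive substring test for dns_domain_name_in_mx, and reuses the same entropy normalization assumed for the lexical features. The SOA-derived features are omitted.

```python
import math
from collections import Counter


def norm_entropy(s: str) -> float:
    """Shannon entropy in bits divided by log2(len(s)); 0.0 for strings shorter than 2."""
    if len(s) < 2:
        return 0.0
    counts = Counter(s)
    h = -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())
    return h / math.log2(len(s))


def dns_features(domain: str, rrsets: dict) -> dict:
    """rrsets: hypothetical mapping of RR type -> {"ttl": int, "values": [str, ...]}."""

    def values(rtype: str) -> list[str]:
        return rrsets.get(rtype, {}).get("values", [])

    # MX values look like "10 mail.example.com."; keep the host, drop the trailing dot.
    mx_hosts = [v.split()[-1].rstrip(".").lower() for v in values("MX")]
    txt = values("TXT")
    ttls = [rr["ttl"] for rr in rrsets.values()]  # one TTL per RRset
    return {
        **{f"dns_{t}_count": len(values(t)) for t in ("A", "AAAA", "CNAME", "MX", "NS", "TXT")},
        "dns_mx_avg_len": sum(map(len, mx_hosts)) / len(mx_hosts) if mx_hosts else 0.0,
        # Naive substring test; a stricter version would compare registrable domains.
        "dns_domain_name_in_mx": any(domain.lower() in h for h in mx_hosts),
        "dns_txt_spf_exists": any(t.startswith("v=spf1") for t in txt),
        "dns_txt_avg_entropy": sum(map(norm_entropy, txt)) / len(txt) if txt else 0.0,
        "dns_ttl_low": sum(0 <= t <= 100 for t in ttls),
        "dns_ttl_mid": sum(101 <= t <= 500 for t in ttls),
    }
```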

    IP-based Features

    These features were derived from IP addresses and ICMP echo replies:

    • ip_mean_average_rtt - average RTT of all ICMP echo attempts,
    • ip_entropy - total entropy of all /16 (/64 for v6) IP prefixes,
    • ip_count - total number of IP addresses for the domain,
    • ip_v4_count - total number of IPv4 addresses for the domain,
    • ip_v6_count - total number of IPv6 addresses for the domain.
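The IP features can be computed with Python's standard ipaddress module. In this sketch, ip_entropy is read as the Shannon entropy (in bits) of the distribution of addresses over their /16 (IPv4) or /64 (IPv6) prefixes, and ip_mean_average_rtt as the mean of per-address average RTTs; both readings are assumptions, since the exact formulas are not given above.

```python
import ipaddress
import math
from collections import Counter


def prefix_of(ip: str) -> str:
    """Map an address to its /16 (IPv4) or /64 (IPv6) network."""
    addr = ipaddress.ip_address(ip)
    plen = 16 if addr.version == 4 else 64
    return str(ipaddress.ip_network(f"{addr}/{plen}", strict=False))


def ip_features(ips: list[str], avg_rtts: list[float]) -> dict:
    """ips: addresses resolved for a domain; avg_rtts: per-address average ICMP RTTs."""
    prefixes = Counter(prefix_of(ip) for ip in ips)
    total = sum(prefixes.values())
    # Assumption: "total entropy" = Shannon entropy of the prefix distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in prefixes.values()) if total else 0.0
    return {
        "ip_mean_average_rtt": sum(avg_rtts) / len(avg_rtts) if avg_rtts else None,
        "ip_entropy": entropy,
        "ip_count": len(ips),
        "ip_v4_count": sum(ipaddress.ip_address(ip).version == 4 for ip in ips),
        "ip_v6_count": sum(ipaddress.ip_address(ip).version == 6 for ip in ips),
    }
```

With this reading, addresses spread across many unrelated networks push ip_entropy up, while addresses concentrated in a single prefix give 0.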

    TLS-based Features

    The following features were extracted from TLS certificate chains and TLS handshakes:

    • tls_chain_len - length of the TLS certificate chain,
    • tls_broken_chain - true if there is a certificate that has never been valid,
    • tls_expired_chain - true if there is an expired certificate in the chain,
    • tls_total_extension_count - total extensions in all certificates in the chain,
    • tls_critical_extensions - total extensions flagged as "critical" in all certificates,
    • tls_with_policies_crt_count - number of certificates that include the "policies" extension,
    • tls_percentage_crt_with_policies - percentage of certificates that include the "policies" extension,
    • tls_x509_anypolicy_crt_count - number of certificates not enforcing any security policy,
    • tls_iso_policy_crt_count - total discovered policies from the 1.* OID space,
    • tls_joint_isoitu_policy_crt_count - total discovered policies from the 2.* OID space,
    • tls_subject_count - number of subject alternative names (SANs) in the leaf certificate,
    • tls_server_auth_crt_count - number of certificates with the "Web Server Authentication" extended key usage,
    • tls_client_auth_crt_count - number of certificates with the "Web Client Authentication" extended key usage,
    • tls_CA_certs_in_chain_ratio - ratio of CA certificates in the chain,
    • tls_unique_SLD_count - number of unique second-level domains.
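A handful of the chain-level TLS features can be derived once certificates are parsed. The Cert dataclass below is a hypothetical, simplified stand-in for a parsed certificate (it is not the dataset's "certificates" format), and "never been valid" is assumed to mean an inverted validity window or a notBefore date later than the collection time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Cert:
    """Hypothetical, simplified view of one parsed certificate."""
    not_before: datetime
    not_after: datetime
    is_ca: bool
    extensions: list[tuple[str, bool]] = field(default_factory=list)  # (name, critical)


def tls_features(chain: list[Cert], evaluated_on: Optional[datetime] = None) -> dict:
    now = datetime.now(timezone.utc) if evaluated_on is None else evaluated_on
    n = len(chain)
    return {
        "tls_chain_len": n,
        # Assumption: "never been valid" = inverted window or not yet valid at collection time.
        "tls_broken_chain": any(c.not_before > c.not_after or c.not_before > now for c in chain),
        "tls_expired_chain": any(c.not_after < now for c in chain),
        "tls_total_extension_count": sum(len(c.extensions) for c in chain),
        "tls_critical_extensions": sum(crit for c in chain for _, crit in c.extensions),
        "tls_CA_certs_in_chain_ratio": sum(c.is_ca for c in chain) / n if n else 0.0,
    }
```

All datetimes should be timezone-aware (or all naive), since Python refuses to compare the two kinds.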

  17. Monthly number of phishing cases Indonesia 2023

    • statista.com
    Updated Nov 27, 2025
    Statista (2025). Monthly number of phishing cases Indonesia 2023 [Dataset]. https://www.statista.com/statistics/1411333/indonesia-monthly-number-of-phishing-attacks/
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2023 - Dec 2023
    Area covered
    Indonesia
    Description

    As of December 2023, Indonesia recorded around ***** phishing attacks, an increase over the previous month. Over the course of 2023, the highest monthly number of phishing attacks occurred in February, at more than ** thousand cases.

  18. Google Ads Click Fraud Statistics 2023

    • pablodiazt.com
    Updated Jan 1, 2023
    Pablo Diaz (2023). Google Ads Click Fraud Statistics 2023 [Dataset]. https://pablodiazt.com/google-ads-statistics/google-ads-click-fraud-statistics-2023
    Dataset updated
    Jan 1, 2023
    Authors
    Pablo Diaz
    Description

    Ad fraud reached an all-time high as AI-powered bots became increasingly sophisticated.

  19. Fraud Court Statistics — Liberty County, Florida

    • floridacourtfile.com
    Updated Mar 9, 2026
    (2026). Fraud Court Statistics — Liberty County, Florida [Dataset]. https://www.floridacourtfile.com/charges/fraud/liberty
    Dataset updated
    Mar 9, 2026
    Time period covered
    2023 - 2025
    Area covered
    Liberty County, Florida
    Variables measured
    guilty_rate, total_cases, withheld_rate, dismissal_rate, avg_sentence_days
    Description

    Criminal court outcome data for fraud cases in Liberty County, FL (2023-2025).

  20. nibsss-fraud-dataset

    • kaggle.com
    zip
    Updated Feb 11, 2026
    Endurance, the Martian 👽 (2026). nibsss-fraud-dataset [Dataset]. https://www.kaggle.com/datasets/hendurhance/nibsss-fraud-dataset
    Available download formats: zip (100,777,287 bytes)
    Dataset updated
    Feb 11, 2026
    Authors
    Endurance, the Martian 👽
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NIBSS Fraud Dataset: Nigerian Banking Transactions for ML Research

    🚨 The First Comprehensive Nigerian Financial Fraud Detection Dataset

    Welcome to the most extensive publicly available dataset for fraud detection research in the Nigerian banking sector. This meticulously crafted synthetic dataset contains 1,000,000 financial transactions specifically calibrated to reflect real Nigerian banking patterns using official NIBSS (Nigeria Inter-Bank Settlement System) 2023 fraud landscape statistics.

    🎯 Why This Dataset Matters

    Nigeria processes over ₦600 trillion in electronic payments annually (55% growth in 2023), yet sophisticated fraud detection mechanisms remain underdeveloped. This dataset bridges that critical gap by providing researchers, data scientists, and financial institutions with:

    • Realistic fraud patterns based on actual NIBSS reported statistics
    • Comprehensive feature engineering with 38 carefully designed variables
    • Ethical synthetic data ensuring privacy compliance while maintaining statistical validity
    • Production-ready framework for immediate implementation in Nigerian banking contexts

    📊 Dataset Specifications

    Core Statistics

    • Total Transactions: 1,000,000 synthetic records
    • Fraud Rate: 0.30% (3,000 fraud cases) - calibrated to NIBSS 2023 data
    • Features: 38 engineered variables across temporal, behavioral, and risk categories
    • Data Quality: 100% complete records, no missing values
    • Format: CSV with comprehensive metadata

    Real-World Calibration (NIBSS 2023 Ground Truth)

    Metric | Dataset Value | NIBSS 2023 Source
    Overall Fraud Rate | 0.30% | NIBSS Annual Report
    Mobile Banking Fraud | 49.75% of cases | Channel Risk Analysis
    Peak Fraud Month | May (12.25%) | Temporal Distribution
    Average Fraud Loss | ₦384,959 | Economic Impact Study
    Social Engineering | 65.8% of techniques | Fraud Methodology Report

    🔬 Feature Engineering Excellence

    Temporal Features (12 variables)

    • Cyclic Encodings: Hour, day, month, quarter transformations
    • Rolling Windows: 24h, 7d, 30d transaction aggregations
    • Velocity Metrics: Transaction frequency and acceleration patterns
    • Business Hours: Nigerian banking hour indicators
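Cyclic encodings and rolling windows, as listed above, are standard feature-engineering techniques; the exact definitions used for this dataset are not given, so the sketch below is an assumption-laden illustration. cyclic maps a periodic value onto the unit circle so that hour 23 lands next to hour 0, and rolling_count_24h counts, for each transaction in a time-sorted list, how many transactions fall within the preceding 24 hours. In practice the window would be computed per account, which this sketch leaves out.

```python
import math
from bisect import bisect_left
from datetime import datetime, timedelta


def cyclic(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value (e.g. hour in [0, 24)) to (sin, cos) on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def rolling_count_24h(timestamps: list[datetime]) -> list[int]:
    """For each timestamp in a sorted list, count entries from ts - 24 h up to and including ts."""
    out = []
    for i, ts in enumerate(timestamps):
        start = bisect_left(timestamps, ts - timedelta(hours=24))  # first index inside the window
        out.append(i - start + 1)
    return out
```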

    Behavioral Analytics (15 variables)

    • Spending Patterns: Amount deviation ratios and statistical z-scores
    • Transaction Velocity: Acceleration and deceleration indicators
    • Channel Behavior: Multi-channel usage patterns and switching analysis
    • Location Consistency: Geographic stability metrics
    • Historical Context: Customer transaction history patterns
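Amount-deviation features compare a transaction with the customer's own history. The sketch below shows one plausible reading, a ratio to the historical mean and a z-score using the population standard deviation; the feature names amount_vs_mean_ratio and amount_zscore are hypothetical and not taken from the dataset's column list.

```python
import statistics


def amount_features(history: list[float], amount: float) -> dict:
    """Compare an amount with a customer's prior amounts (hypothetical feature definitions)."""
    if not history:
        return {"amount_vs_mean_ratio": None, "amount_zscore": None}
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)  # population standard deviation of past amounts
    return {
        "amount_vs_mean_ratio": amount / mean if mean else None,
        "amount_zscore": (amount - mean) / sd if sd else None,  # undefined when history is constant
    }
```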

    Risk Assessment (11 variables)

    • Composite Risk Scores: Multi-factor risk aggregation
    • Merchant Categories: Industry-specific risk classifications
    • Channel Risk: Platform-specific fraud indicators
    • Anomaly Detection: Statistical outlier identification
    • Cross-Channel Correlation: Inter-platform risk assessment

    🏆 Proven Research Results

    This dataset has been extensively validated through academic research with statistically significant results:

    Model Performance (Bootstrap Validated, n=100)

    Algorithm | AUC-ROC | Precision | Recall | F1-Score | Economic Value
    XGBoost | 0.973 | 1.000 | 0.746 | 0.854 | 43.7% cost reduction
    Random Forest | 0.977 | 1.000 | 0.538 | 0.699 | 69.1% cost reduction
    Logistic Regression | 0.799 | 0.007 | 0.699 | 0.015 | 1.9% cost reduction
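F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the F1 column can be cross-checked against the precision and recall columns. The snippet recomputes it from the reported (rounded) values; the recomputed figures agree with the table to within about 0.001, a gap that can arise from the precision and recall values being rounded to three decimals. The very low logistic-regression F1 follows from its near-zero precision despite reasonable recall.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) rows copied from the table above.
rows = {
    "XGBoost": (1.000, 0.746, 0.854),
    "Random Forest": (1.000, 0.538, 0.699),
    "Logistic Regression": (0.007, 0.699, 0.015),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.3f}, reported = {reported:.3f}")
```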

    Feature Importance Insights

    • Amount vs Mean Ratio: Primary fraud indicator across all models
    • 24-Hour Transaction Aggregations: Critical behavioral patterns
    • Velocity Metrics: Key anomaly detection features
    • Channel-Specific Indicators: Platform risk differentiation

    🎓 Academic Rigor

    This dataset is the cornerstone of a comprehensive BSc Statistics dissertation (University of Lagos, 2024/2025) that includes:

    • Statistical Validation: Bootstrap 95% confidence intervals
    • Cross-Validation: 5-fold stratified validation maintaining fraud distribution
    • Economic Analysis: Cost-benefit optimization using Nigerian banking cost structures
    • Interpretability: SHAP analysis for model transparency
    • Reproducibility: Fixed random seeds and comprehensive documentation
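Stratified k-fold validation keeps the rare fraud class represented in every fold; with a 0.30% fraud rate, an unstratified split could leave a fold with very few positives. scikit-learn's StratifiedKFold is the usual tool for this. The standard-library sketch below only illustrates the idea and is not the dissertation's code: it shuffles each class with a fixed seed and deals its indices round-robin across the folds.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list[int], k: int = 5, seed: int = 42) -> list[list[int]]:
    """Split row indices into k folds that each keep roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)  # fixed seed for reproducibility
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)  # deal indices round-robin
        offset += len(idxs)  # continue the rotation so fold sizes stay balanced
    return folds
```

Each fold then serves once as the validation set while the other k - 1 folds are used for training.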

    💼 Real-World Applications

    Immediate Use Cases

    • Academic Research: Fraud detection algorithm development and comparison
    • Banking Implementation: Production-ready fraud scoring systems
    • Fintech Development: Risk assessment for Nigerian payment platforms
    • Regulatory Compliance: NIBSS-aligned fraud prevention frameworks
    • Educational ...