In December 2023, around 9.45 million phishing e-mails were detected worldwide, up from 5.59 million in September 2023. This figure has seen a continuous increase since January 2022. It is partially associated with the launch of ChatGPT in November 2022.
Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections that occurred through e-mail were common for 32 percent of the respondents in 2023. Cases of a credential or account compromise occurred in 27 percent of the organizations in 2023, a decrease of 25 percent compared to the year prior.
In 2023, users in Vietnam were most frequently targeted by phishing attacks. The phishing attack rate among internet users in the country was ***** percent. In the examined year, Peru ranked second, with an attack rate of nearly ** percent, while Taiwan followed with ***** percent.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was compiled by researchers to study phishing email tactics. It combines emails from a variety of sources to create a comprehensive resource for analysis.
Enron and Ling Datasets: These datasets focus on the core content of phishing emails, containing subject lines, email body text, and labels indicating whether the email is spam (phishing) or legitimate.
CEAS, Nazario, Nigerian Fraud, and SpamAssassin Datasets: These datasets provide broader context for the emails, including sender information, recipient information, date, and labels for spam/legitimate classification.
The final dataset combines the information from the initial datasets into a single resource for analysis. This dataset contains:
This dataset allows researchers to study the content of phishing emails and the context in which they are sent to improve detection methods.
Please cite the following two articles if you are using this dataset:
A 2022 survey of working adults and IT professionals worldwide revealed that bulk phishing attacks were the most common cyber incident, experienced by ** percent of organizations in 2022. Spear phishing ranked second, with ***** in **** respondents stating that they had encountered such incidents during the same year. Overall, the most common types of attacks decreased between 2021 and 2022.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs analyzed while constructing the dataset were recent URLs. Features are extracted from the URL and from the source code of the webpage. Features such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb are derived from existing features.
Class Labels: Label 1 corresponds to a legitimate URL, label 0 to a phishing URL.
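Because label 1 marks the legitimate class here, metrics that treat phishing as the positive class need the labels flipped first. The following is a minimal sketch, assuming labels arrive as plain integers; the function name `to_phishing_positive` is hypothetical and not part of the dataset. The class balance is computed from the two counts stated in the description.

```python
def to_phishing_positive(label: int) -> int:
    """Map the dataset convention (1 = legitimate, 0 = phishing)
    to a phishing-positive convention (1 = phishing, 0 = legitimate)."""
    if label not in (0, 1):
        raise ValueError(f"unexpected label: {label!r}")
    return 1 - label


# Class balance from the counts stated in the dataset description.
N_LEGITIMATE = 134_850
N_PHISHING = 100_945
phishing_share = N_PHISHING / (N_LEGITIMATE + N_PHISHING)

if __name__ == "__main__":
    print(to_phishing_positive(0))               # a phishing URL becomes the positive class
    print(f"phishing share: {phishing_share:.1%}")
```

With these counts, phishing URLs make up a little under 43 percent of the dataset, so the two classes are reasonably balanced.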
Citations: Prasad, A., & Chandra, S. (2023). PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning. Computers & Security, 103545. doi: https://doi.org/10.1016/j.cose.2023.103545
In the second quarter of 2023, smartphone users in North America encountered around ***** thousand phishing and malicious attempts, making it the region with the highest number of such incidents worldwide. Europe ranked second, with more than *** thousand phishing and malicious attempts.
https://scoop.market.us/privacy-policy
Fraud Detection and Prevention Statistics: Fraud Detection and Prevention (FDP) is a critical focus for organizations, involving strategies and technologies to thwart deceptive activities aimed at financial gain.
Fraud encompasses various forms, from identity theft to cybercrimes, and effective FDP is crucial for minimizing financial losses, preserving reputation, adhering to legal obligations, and safeguarding data.
Key components include data collection, statistical analysis, machine learning, fraud prevention measures, regulatory compliance, and integrating emerging technologies like AI and blockchain. Continuous improvement is vital in this dynamic field to stay ahead of evolving fraud tactics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of legitimate and phishing websites, along with information on the target brands (brands.csv) being impersonated in the phishing attacks. The dataset includes a total of 10,395 websites, 5,244 of which are legitimate and 5,151 of which are phishing websites. These websites impersonate a total of 86 different target brands.
For phishing datasets, the files can be downloaded in a zip file with a "phishing" prefix, while for legitimate websites, the files can be downloaded in a zip file with a "not-phishing" prefix.
In addition, the dataset includes features such as screenshots, text, CSS, and HTML structure for each website, as well as domain information (WHOIS data), IP information, and SSL information. Each website is labeled as either legitimate or phishing and includes additional metadata such as the date it was discovered, the target brand being impersonated, and any other relevant information.
The dataset has been curated for research purposes and can be used to analyze the effectiveness of phishing attacks, develop and evaluate anti-phishing solutions, and identify trends and patterns in phishing attacks. It is hoped that this dataset will contribute to the advancement of research in the field of cybersecurity and help improve our understanding of phishing attacks.
In the 4th quarter of 2024, over 989,000 unique phishing attacks were detected worldwide, representing a slight increase from the preceding quarter. By far the most significant jump in the number of unique phishing sites occurred between the second and third quarters of 2020, from nearly 147,000 to approximately 572,000. This figure is based on the number of unique base URLs of the phishing sites.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. The data has been made public and presents a diverse set of email information ranging from internal, marketing emails to spam and fraud attempts.
In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and noted that, although it contained scam emails, it also had several integrity problems. The dataset was later updated, but ensuring privacy in the data remains key when it is used to train a deep neural network model.
Although the Enron Email Dataset contains over 500K emails, one of its problems is the limited availability of labeled fraud examples. Label annotation is needed to detect the full umbrella of fraud emails accurately. Since fraud emails fall into several types, such as Phishing, Financial, Romance, Subscription, and Nigerian Prince scams, multiple heuristics have to be used to label all types of fraudulent emails effectively.
To tackle this problem, heuristics have been used to label the Enron data corpus using email signals, and automated labeling has been performed using simple ML models on other smaller email datasets available online. These fraud annotation techniques are discussed in detail below.
To perform fraud annotation on the Enron dataset, as well as to provide more fraud examples for modeling, two more fraud data sources have been used:
Phishing Email Dataset: https://www.kaggle.com/dsv/6090437
Social Engineering Dataset: http://aclweb.org/aclwiki
To label the Enron email dataset, two signals are used to filter suspicious emails and label them into fraud and non-fraud classes:
Automated ML labeling
Email Signals
The following heuristics are used to annotate labels for the Enron email data using the other two data sources:
Phishing Model Annotation: A high-precision SVM model trained on the Phishing mails dataset, which is used to annotate the Phishing Label on the Enron Dataset.
Social Engineering Model Annotation: A high-precision SVM model trained on the Social Engineering mails dataset, which is used to annotate the Social Engineering Label on the Enron Dataset.
The two ML Annotator models use Term Frequency Inverse Document Frequency (TF-IDF) to embed the input text and make use of SVM models with Gaussian Kernel.
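The source does not publish the annotators' code or hyperparameters, so the following is only an illustrative sketch of the two ingredients named above: TF-IDF weighting of the email text and the Gaussian (RBF) kernel that an SVM would evaluate between pairs of emails. Whitespace tokenization, the smoothed IDF formula, L2 normalization, and `gamma = 1.0` are all assumptions, and the SVM training step itself is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * idf,
    with a smoothed idf = ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})  # L2-normalize
    return vectors


def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2) on sparse dict vectors."""
    terms = set(u) | set(v)
    sq_dist = sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms)
    return math.exp(-gamma * sq_dist)


# Made-up example texts, not taken from any of the datasets.
emails = [
    "verify your account password now",
    "urgent verify your bank account now",
    "meeting agenda for monday attached",
]
vecs = tfidf_vectors(emails)
k_similar = gaussian_kernel(vecs[0], vecs[1])    # two phishing-style texts
k_different = gaussian_kernel(vecs[0], vecs[2])  # no shared vocabulary
```

The two phishing-style texts share several terms, so their kernel value is higher than that of the pair with no shared vocabulary; an SVM with a Gaussian kernel builds its decision boundary from such pairwise similarities.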
If either of the models predicted that an email was a fraud, the mail metadata was checked for several email signals. If these heuristics meet the requirements of a high-probability fraud email, we label it as a fraud email.
Email signal-based heuristics are used to filter and target suspicious emails specifically for fraud labeling. The signals used were:
Person Of Interest: There is a publicly available list of email addresses of employees who were liable for the massive data leak at Enron. These user mailboxes have a higher chance of containing quality fraud emails.
Suspicious Folders: The Enron data is organized into several folders for every employee, such as inbox, deleted_items, junk, and calendar. A set of folders with a higher chance of containing fraud emails, such as Deleted Items and Junk, was flagged as suspicious.
Sender Type: The sender type was categorized as ‘Internal’ or ‘External’ based on the sender's email address.
Low Communication: A threshold of 4 emails, based on the table below, was used to define Low Communication. A user qualifies as a Low-Comm sender if the number of emails they sent is below this threshold. Mails sent from low-comm senders have been assigned a high probability of being fraud.
Contains Replies and Forwards: If an email contains forwards or replies, a low probability was assigned for it to be a fraud email.
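The way the ML annotators and the email signals combine can be sketched as follows. This is an illustration, not the authors' implementation: the equal signal weights, the `min_score` cutoff, and the choice to treat external senders as more suspicious are all assumptions, and the names `Email`, `signal_score`, and `label_fraud` are hypothetical. Only the threshold of 4 emails and the Deleted Items and Junk folders come from the description above.

```python
from dataclasses import dataclass

LOW_COMM_THRESHOLD = 4                          # from the text: below this => low-comm sender
SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}  # folders named in the text


@dataclass
class Email:
    mailbox_is_poi: bool        # mailbox owner is on the person-of-interest list
    folder: str                 # e.g. "inbox", "junk"
    sender_is_internal: bool    # sender type: internal vs external
    sender_email_count: int     # number of emails sent by this sender
    has_reply_or_forward: bool  # email contains replies or forwards


def signal_score(email: Email) -> int:
    """Count signals pointing towards fraud (assumed equal weights)."""
    score = 0
    score += email.mailbox_is_poi
    score += email.folder.lower() in SUSPICIOUS_FOLDERS
    score += not email.sender_is_internal  # assumption: external is more suspicious
    score += email.sender_email_count < LOW_COMM_THRESHOLD
    score -= email.has_reply_or_forward    # replies/forwards lower the probability
    return score


def label_fraud(email: Email, phishing_pred: bool, social_eng_pred: bool,
                min_score: int = 2) -> bool:
    """Label as fraud only if either ML annotator fired and the signals agree."""
    if not (phishing_pred or social_eng_pred):
        return False
    return signal_score(email) >= min_score


suspicious = Email(False, "junk", False, 1, False)
routine = Email(False, "inbox", True, 120, True)
print(label_fraud(suspicious, phishing_pred=True, social_eng_pred=False))
print(label_fraud(routine, phishing_pred=True, social_eng_pred=False))
```

Gating the signal check behind the model predictions keeps the heuristics from labeling emails that neither annotator flagged, which matches the two-stage description above.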
To ensure high-quality labels, the mismatch examples from ML Annotation have been manually inspected for Enron dataset relabeling.
| Fraud | Non-Fraud |
|---|---|
| 2327 | 445090 |
Enron Dataset Title: Enron Email Dataset URL: https://www.cs.cmu.edu/~enron/ Publisher: MIT, CMU Author: Leslie Kaelbling, William W. Cohen Year: 2015
Phishing Email Detection Dataset Title: Phishing Email Detection URL: https://www.kaggle.com/dsv/6090437 DOI: 10.34740/KAGGLE/DSV/6090437 Publisher: Kaggle Author: Subhadeep Chakraborty Year: 2023
CLAIR Fraud Email Collection Title: CLAIR collection of fraud email URL: http://aclweb.org/aclwiki Author: Radev, D. Year: 2008
https://creativecommons.org/publicdomain/zero/1.0/
The dataset "Cybersecurity Cases in India" is a comprehensive collection of real-world cybersecurity incidents reported across various cities in India. The dataset encapsulates the financial loss, incident types, and categories, providing a detailed overview of the cybercrime landscape in one of the world’s largest digital economies. With over 1000 records, it spans incidents from 2020 to 2024, covering various types of cybercrimes such as phishing, online fraud, malware attacks, ransomware, data breaches, DDoS attacks, identity theft, and more. Each record captures important attributes of the incidents, such as the year, date of occurrence, amount lost in INR, the type of incident, the city in which it occurred, and the category of the affected entity (e.g., financial, personal, corporate).
The dataset is structured to enable analysis of the trends in cybercrime over time, the financial impact of various cyberattacks, and the geographic distribution of incidents across Indian cities. It serves as a critical resource for cybersecurity professionals, policymakers, law enforcement agencies, and academic researchers seeking to understand the challenges posed by cybercrime in India and to identify strategies to combat these challenges.
The dataset’s primary purpose is to provide an extensive, granular view of the nature and scope of cybersecurity incidents in India. It enables the analysis of the frequency, severity, and financial impact of cybercrimes across different types of attacks, cities, and time periods. As cybercrimes continue to rise globally, including in India, this dataset serves as an important tool for understanding the evolving threats and risks in cyberspace. Cybersecurity experts and analysts can leverage this dataset to identify patterns and trends, while government and law enforcement agencies can use it to devise more targeted interventions and preventive measures.
India, with its large and growing digital footprint, is a prime target for cybercriminals. The country's rapidly expanding internet user base, coupled with increasing digital adoption in various sectors like finance, healthcare, education, and e-commerce, makes it an attractive target for cyberattacks. This dataset allows stakeholders to understand how cybercrime evolves in response to these dynamics.
The dataset is a rich resource for understanding the following:
The dataset includes the following key variables, each contributing valuable information to the analysis:
India's digital transformation has made it a prime target for cybercriminals. As of 2023, India is one of the largest internet markets in the world, with over 600 million active internet users. The rapid growth of e-commerce, digital banking, social media, and government services has created new opportunities for cybercriminals to exploit vulnerabilities in digital systems. According to a 2022 report by the Indian Computer Emergency Response Team (CERT-In), India witnessed a significant increase in cybersecurity incidents, with millions of cyberattacks targeting individuals, b...
https://dataintelo.com/privacy-and-policy
The global spear phishing protection market size was valued at approximately USD 1.4 billion in 2023 and is anticipated to reach around USD 3.8 billion by 2032, growing at a compound annual growth rate (CAGR) of approximately 11.5% during the forecast period. This robust growth trajectory can be largely attributed to the increasing sophistication and frequency of spear phishing attacks, which are driving organizations across various sectors to invest in advanced protection solutions. As businesses become more digital, the attack surfaces expand, making them more vulnerable to such targeted cyber threats. Furthermore, the evolution of cyber threats, alongside stringent regulatory requirements for data protection, is pushing enterprises to adopt comprehensive spear phishing protection strategies.
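As a quick arithmetic check, the growth rate implied by the two endpoint values can be recomputed with the standard CAGR formula. The sketch below assumes nine compounding years (2023 to 2032); the report may define its forecast period differently, which would shift the result slightly.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint values in USD billions, as quoted in the paragraph above.
implied = cagr(1.4, 3.8, 2032 - 2023)
print(f"implied CAGR: {implied:.1%}")
```

The implied rate lands close to the quoted figure of approximately 11.5 percent, with the small gap explained by the rounding of the endpoint values.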
The growth of the spear phishing protection market is significantly driven by a heightened awareness of cybersecurity threats and the need for robust defense mechanisms. Spear phishing attacks, characterized by their targeted and personalized nature, have been growing in complexity, leading to severe financial losses and data breaches. Organizations are increasingly recognizing that traditional security measures are insufficient against these sophisticated attacks. As a result, there is a substantial uptick in the demand for specialized tools and services that offer real-time threat intelligence, advanced email filtering, and behavior analytics to preemptively identify and thwart potential phishing attempts. Moreover, the integration of artificial intelligence (AI) and machine learning (ML) in cybersecurity solutions is transforming the spear phishing protection landscape, allowing for more adaptive and predictive threat identification.
Legislative requirements and compliance standards are also pivotal in propelling the market forward. Various governments and regulatory bodies have implemented stringent data protection laws that mandate organizations to have robust cybersecurity measures in place. Non-compliance can result in hefty fines and reputational damage, further emphasizing the need for effective spear phishing protection solutions. This regulatory environment is prompting organizations, especially those in highly regulated industries like banking and finance, healthcare, and government sectors, to prioritize their cybersecurity initiatives. Consequently, there is an increasing allocation of budgets towards enhancing cybersecurity infrastructure, with a significant portion dedicated to spear phishing protection.
The proliferation of remote work and digital transformation initiatives has further catalyzed the growth of the spear phishing protection market. With the widespread adoption of remote working models, employees are accessing corporate networks from various locations and devices, thereby expanding the attack surface for cybercriminals. This shift necessitates a reevaluation of existing cybersecurity strategies to include comprehensive measures against spear phishing attacks. Organizations are investing in cloud-based security solutions that offer scalability, flexibility, and real-time monitoring capabilities. Moreover, as digital transformation continues to accelerate across industries, there is an increasing reliance on digital communication tools, which are often targeted by spear phishing attacks, thus highlighting the critical need for effective protection measures.
Regionally, North America holds a substantial share of the spear phishing protection market, driven by the presence of major cybersecurity companies, a high rate of technology adoption, and stringent regulatory standards. The region's advanced IT infrastructure and significant investments in cybersecurity are key factors contributing to its dominance. Europe follows closely, with strong emphasis on data protection due to regulations like the General Data Protection Regulation (GDPR), which mandates robust cybersecurity frameworks. The Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, fueled by the rapid digitalization of economies, increasing cyber threats, and rising awareness about cybersecurity. Countries like China, India, and Japan are investing heavily in cybersecurity measures to protect their growing digital economies.
The spear phishing protection market is segmented by component into solutions and services. Solutions encompass various technologies and software designed to detect, prevent, and mitigate spear phishing attacks. These solutions include advanced email security, endpoint protection, network security, thre
In 2023, web services were targeted most by phishing attacks. They accounted for over ** percent of financial phishing attacks worldwide. Delivery companies ranked second with nearly ** percent, while global internet portals followed with ***** percent of the phishing attacks in the examined year.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Defence SA Fraud 2023-24
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS certificate fields, and GeoIP information for 432,572 verified benign domains from Cisco Umbrella and 36,993 verified phishing domains from PhishTank and OpenPhish services. The dataset is useful for statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing detection. The data was collected between March and July 2023. The final assessment of the data was conducted in July 2023 (this is why the names are suffixed with _2307).
The upload contains: a) data files, b) the description of the data structure, and c) the feature vector we used for ML-based phishing domain detection.
The data is located in two individual files:
Both files are in the JSON Array format. The structure is as follows:
[
{
"_id" : "A unique ID of the data record",
"domain_name" : "Name of the domain (e.g., zenodo.com)",
"dns" : { "//": "Data obtained from DNS records" },
"evaluated_on" : "// ISO Timestamp of data collection ",
"ip_data" : [ "// Data for each related IP address ",
{
"//": "IP-related data, including RTT from ICMP echo attempts (from Brno, Czechia)",
"//": "WHOIS/RDAP data for the given IP address",
"//": "GeoIP data for the given IP address",
"//": "NERD system reputation score (if available)",
"//": "ASN info",
"//": "remarks: ISO timestamps of collection of the individual data pieces"
},
],
"label" : "benign_2307 for benign OR misp_2307 for phishing",
"rdap" : { "//": "WHOIS/RDAP information for the domain name" },
"remarks" : {
"dns_evaluated_on" : "ISO Timestamp of DNS data collection",
"rdap_evaluated_on" : "ISO Timestamp of WHOIS/RDAP data collection",
"tls_evaluated_on" : "ISO Timestamp of TLS certificate information collection",
"dns_had_no_ips" : "true if no IPs were found in DNS records"
},
"sourced_on" : "ISO Timestamp of the moment the domain was found",
"tls" : {
"cipher" : "Identifier of the TLS cipher suite",
"count" : "Number of certificates in chain",
"protocol" : "Version of the TLS protocol",
"certificates" : [
"//": "Information from TLS certificate fields: issuer, extensions, etc."
]
},
"category" : "Category of the record (could be ignored)",
"source" : "Name of the file that we used to save the domain list"
}
]
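Records in this structure can be read with a standard JSON parser. The sketch below is a hedged illustration: it uses two made-up records instead of the real files (whose names are not given here), and it assumes that `tls` may be null, that `tls.count` is a number, and that `dns_had_no_ips` is stored as a boolean. The helper names `is_phishing` and `basic_features` are hypothetical.

```python
import json
from collections import Counter

# Two made-up records that follow the documented structure (not real data).
SAMPLE = """
[
  {"domain_name": "example.com", "label": "benign_2307",
   "ip_data": [{}, {}], "tls": {"protocol": "TLSv1.3", "count": 3},
   "remarks": {"dns_had_no_ips": false}},
  {"domain_name": "login-example.test", "label": "misp_2307",
   "ip_data": [], "tls": null,
   "remarks": {"dns_had_no_ips": true}}
]
"""


def is_phishing(record: dict) -> bool:
    """Labels are 'benign_2307' for benign and 'misp_2307' for phishing."""
    return record["label"].startswith("misp")


def basic_features(record: dict) -> dict:
    """Pull a few simple fields; missing or null sections count as empty."""
    tls = record.get("tls") or {}
    return {
        "domain_name": record["domain_name"],
        "is_phishing": is_phishing(record),
        "n_ips": len(record.get("ip_data") or []),
        "tls_chain_length": tls.get("count", 0),
    }


records = json.loads(SAMPLE)
features = [basic_features(r) for r in records]
label_counts = Counter(r["label"] for r in records)
print(features)
print(label_counts)
```

For the full files, loading the whole array with `json.load` may need a lot of memory, so a streaming parser could be a better fit at that scale.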
This section describes the feature vector used in the "Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence" paper that was accepted to the IEEE NOMS 2024 conference.
The following features were extracted from the sole domain name:
The following features were extracted from DNS responses when querying about the domain:
These features were derived from IP addresses and ICMP echo replies:
The following features were extracted from TLS certificate chains and TLS handshakes:
As of December 2023, Indonesia recorded around ***** phishing attacks, an increase compared to the previous month. Across the four quarters of 2023, the highest number of phishing attacks occurred in February, at more than ** thousand cases.
Ad fraud reached an all-time high as AI-powered bots became increasingly sophisticated.
Criminal court outcome data for Fraud cases in Liberty County, FL (2023-2025)
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Welcome to the most extensive publicly available dataset for fraud detection research in the Nigerian banking sector. This meticulously crafted synthetic dataset contains 1,000,000 financial transactions specifically calibrated to reflect real Nigerian banking patterns using official NIBSS (Nigeria Inter-Bank Settlement System) 2023 fraud landscape statistics.
Nigeria processes over ₦600 trillion in electronic payments annually (55% growth in 2023), yet sophisticated fraud detection mechanisms remain underdeveloped. This dataset bridges that critical gap by providing researchers, data scientists, and financial institutions with:
| Metric | Dataset Value | NIBSS 2023 Source |
|---|---|---|
| Overall Fraud Rate | 0.30% | NIBSS Annual Report |
| Mobile Banking Fraud | 49.75% of cases | Channel Risk Analysis |
| Peak Fraud Month | May (12.25%) | Temporal Distribution |
| Average Fraud Loss | ₦384,959 | Economic Impact Study |
| Social Engineering | 65.8% of techniques | Fraud Methodology Report |
This dataset has been extensively validated through academic research with statistically significant results:
| Algorithm | AUC-ROC | Precision | Recall | F1-Score | Economic Value |
|---|---|---|---|---|---|
| XGBoost | 0.973 | 1.000 | 0.746 | 0.854 | 43.7% cost reduction |
| Random Forest | 0.977 | 1.000 | 0.538 | 0.699 | 69.1% cost reduction |
| Logistic Regression | 0.799 | 0.007 | 0.699 | 0.015 | 1.9% cost reduction |
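The F1 column can be cross-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below recomputes it from the rounded values in the table; because the inputs are already rounded to three decimals, small differences in the last digit are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "XGBoost": (1.000, 0.746, 0.854),
    "Random Forest": (1.000, 0.538, 0.699),
    "Logistic Regression": (0.007, 0.699, 0.015),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed {recomputed:.3f}, reported {f1:.3f}")
```

The recomputed values agree with the reported ones to within rounding. The logistic regression row also shows why AUC-ROC alone can mislead on a dataset with a 0.30% fraud rate: recall is moderate, but precision is close to zero.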
This dataset is the cornerstone of a comprehensive BSc Statistics dissertation (University of Lagos, 2024/2025) that includes: