100+ datasets found

Most targeted industry sectors worldwide targeted by phishing Q4 2024
statista.com
Updated Feb 2, 2026
Statista (2026). Most targeted industry sectors worldwide targeted by phishing Q4 2024 [Dataset]. https://www.statista.com/statistics/266161/websites-most-affected-by-phishing/
Dataset updated
Feb 2, 2026
Dataset authored and provided by
Statista, http://statista.com/
Area covered
Worldwide
Description
During the fourth quarter of 2024, nearly 23 percent of phishing attacks worldwide targeted social media. Web-based software services and webmail were targeted by over 23 percent of registered phishing attacks. Furthermore, financial institutions accounted for 12 percent of attacks.
Phishing Email Statistics 2026: The Growing Threat and How to Protect Your...
sqmagazine.co.uk
Updated Oct 3, 2025
SQ Magazine (2025). Phishing Email Statistics 2026: The Growing Threat and How to Protect Your Organization [Dataset]. https://sqmagazine.co.uk/phishing-email-statistics/
Dataset updated
Oct 3, 2025
Dataset authored and provided by
SQ Magazine
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
Jan 1, 2024 - Dec 31, 2026
Area covered
Earth, Worldwide
Description
Explore key phishing email stats, including attack frequency, success rates, target industries, user vulnerability, and cybersecurity impact!
Number of global phishing attacks Q3 2013- Q4 2024
statista.com
Updated Feb 2, 2026
Statista (2026). Number of global phishing attacks Q3 2013- Q4 2024 [Dataset]. https://www.statista.com/statistics/266155/number-of-phishing-attacks-worldwide/
Dataset updated
Feb 2, 2026
Dataset authored and provided by
Statista, http://statista.com/
Area covered
Worldwide
Description
In the fourth quarter of 2024, over 989,000 unique phishing attacks were detected worldwide, a slight increase from the preceding quarter. The largest jump by far in the number of unique phishing sites occurred between the second and third quarters of 2020, from nearly 147,000 to approximately 572,000. The figure is based on the number of unique base URLs of the phishing sites.
Phishing: distribution of attacks 2023, by region
statista.com
Updated Feb 2, 2026
Statista (2026). Phishing: distribution of attacks 2023, by region [Dataset]. https://www.statista.com/statistics/266362/phishing-attacks-country/
Dataset updated
Feb 2, 2026
Dataset authored and provided by
Statista
Time period covered
2023
Area covered
Worldwide
Description
In 2023, users in Vietnam were most frequently targeted by phishing attacks. The phishing attack rate among internet users in the country was ***** percent. In the examined year, Peru was the second region, with an attack rate of nearly ** percent, while Taiwan followed with ***** percent.
🕵️ Phishing Websites Data
kaggle.com
zip
Updated Feb 24, 2025
Sairaj Adhav (2025). 🕵️ Phishing Websites Data [Dataset]. https://www.kaggle.com/datasets/sai10py/phishing-websites-data
Available download formats: zip (89566 bytes)
Dataset updated
Feb 24, 2025
Authors
Sairaj Adhav
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Phishing Websites Dataset

Overview

This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.

Dataset Information

Total Columns: 31 (30 Features + 1 Target)

Target Variable: Result (Indicates whether a website is phishing or legitimate)

Features Description

URL-Based Features

Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.

double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.

having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.

Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).

URL_Length – Measures the length of the URL; phishing URLs tend to be longer.

having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.
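
A few of the URL-based indicators above can be approximated directly from the URL string. The sketch below is illustrative only: the function name `url_features` is hypothetical, the results are plain booleans and integers rather than the dataset's own encoding, and the hyphen check is applied to the host name, which is an assumption about how Prefix_Suffix is meant.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Approximate a few URL-based indicators (hypothetical helper, not the dataset's extractor)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # having_IP_Address: the host parses as a literal IPv4/IPv6 address.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    # double_slash_redirecting: "//" appears somewhere after the scheme separator.
    after_scheme = url.split("://", 1)[-1]

    return {
        "URL_Length": len(url),
        "having_At_Symbol": "@" in url,
        "Prefix_Suffix": "-" in host,  # assumption: hyphen checked in the host only
        "double_slash_redirecting": "//" in after_scheme,
        "having_IP_Address": host_is_ip,
    }
```

Mapping these raw values onto the encoded values stored in the dataset would need the thresholds used by the dataset's author, which are not given here.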

Domain-Based Features

having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.

SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).

Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.

age_of_domain – The age of the domain in days; older domains are usually more trustworthy.

DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.

Webpage-Based Features

Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).

port – Identifies if the site is using suspicious or non-standard ports.

HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.

Request_URL – Measures the percentage of external resources loaded from different domains.

URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.

Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.

SFH (Server Form Handler) – Determines if form actions are handled suspiciously.

Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.

Abnormal_URL – Identifies if the website’s URL structure is inconsistent with common patterns.

Redirect – Counts the number of redirects; phishing websites may have excessive redirects.

Behavior-Based Features

on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).

RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).

popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.

Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.

Traffic & Search Engine Features

web_traffic – Measures the website’s Alexa ranking; phishing sites tend to have low traffic.

Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.

Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).

Links_pointing_to_page – Counts the number of backlinks pointing to the website.

Statistical_report – Uses external sources to verify if the website has been reported for phishing.

Target Variable

Result – The classification label (1: Legitimate, -1: Phishing)

Usage

This dataset is valuable for:
✅ Machine Learning Models – Developing classifiers for phishing detection.
✅ Cybersecurity Research – Understanding patterns in phishing attacks.
✅ Browser Security Extensions – Enhancing anti-phishing tools.
Phishing Website Dataset
zenodo.org
data.niaid.nih.gov
csv, zip
Updated Jul 3, 2023
I Kadek Agus Ariesta Putra (2023). Phishing Website Dataset [Dataset]. http://doi.org/10.5281/zenodo.8041387
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.8041387
Dataset updated
Jul 3, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
I Kadek Agus Ariesta Putra
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a collection of legitimate and phishing websites, along with information on the target brands (brands.csv) being impersonated in the phishing attacks. The dataset includes a total of 10,395 websites, 5,244 of which are legitimate and 5,151 of which are phishing websites. These websites impersonate a total of 86 different target brands.

For phishing datasets, the files can be downloaded in a zip file with a "phishing" prefix, while for legitimate websites, the files can be downloaded in a zip file with a "not-phishing" prefix.

In addition, the dataset includes features such as screenshots, text, CSS, and HTML structure for each website, as well as domain information (WHOIS data), IP information, and SSL information. Each website is labeled as either legitimate or phishing and includes additional metadata such as the date it was discovered, the target brand being impersonated, and any other relevant information.

The dataset has been curated for research purposes and can be used to analyze the effectiveness of phishing attacks, develop and evaluate anti-phishing solutions, and identify trends and patterns in phishing attacks. It is hoped that this dataset will contribute to the advancement of research in the field of cybersecurity and help improve our understanding of phishing attacks.
StealthPhisher Phishing Attack Dataset
data.mendeley.com
kaggle.com
Updated Nov 7, 2025
+ more versions
Tanmay Jha (2025). StealthPhisher Phishing Attack Dataset [Dataset]. http://doi.org/10.17632/m2479kmybx.2
Unique identifier
https://doi.org/10.17632/m2479kmybx.2
Dataset updated
Nov 7, 2025
Authors
Tanmay Jha
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The StealthPhisher Phishing Attack Dataset, generated at the Cybersecurity Lab, GLA University, Mathura, is a large, diverse, and recent phishing attack dataset developed to address the evolving nature of phishing attacks. It comprises 336,749 records, including 160,943 legitimate URLs and 175,806 phishing URLs, collected from reliable sources such as PhishTank. Reflecting the most recent phishing tactics, this dataset serves as a valuable resource for training and evaluating AI-based phishing detection systems.

Key features include URL-based attributes (e.g., length, TLD type, IP presence), statistical metrics (e.g., Shannon Entropy, Kolmogorov Complexity, Fractal Dimension), and HTML/interaction-based features (e.g., popups, redirects, forms). These multidimensional attributes provide comprehensive insights into phishing behavior, enabling accurate and robust threat detection. Designed to capture real-world scenarios, the dataset equips AI models to recognize both traditional and emerging phishing strategies effectively.
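
As an illustration of one of the statistical metrics named above, the following is a minimal sketch of character-level Shannon entropy for a URL string. The function name `shannon_entropy` is hypothetical, and the dataset's authors may compute the feature differently (for example over tokens or over the domain only).

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character (illustrative sketch)."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct character.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A string made of a single repeated character scores 0, and the score grows as characters become more varied and more evenly distributed, which is why the measure is often used to flag randomly generated host names.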

This dataset was generated as part of the research work presented in the article “StealthPhisher: A Defensive Framework against Phishing Attack using Hybrid Deep Learning and GenAI,” published in Expert Systems with Applications (https://doi.org/10.1016/j.eswa.2025.130205). Researchers using this dataset in their research work are kindly requested to cite this article.
Outcomes of successful phishing attacks in companies worldwide 2021-2023
statista.com
Updated Feb 2, 2026
Statista (2026). Outcomes of successful phishing attacks in companies worldwide 2021-2023 [Dataset]. https://www.statista.com/statistics/1350723/consequences-phishing-attacks/
Dataset updated
Feb 2, 2026
Dataset authored and provided by
Statista, http://statista.com/
Area covered
Worldwide
Description
Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections that occurred through e-mail were common for 32 percent of the respondents in 2023. Cases of a credential or account compromise occurred in 27 percent of the organizations in 2023, a decrease of 25 percent compared to the year prior.
Phishing URL dataset
data.mendeley.com
Updated Apr 2, 2024
JISHNU K S KAITHOLIKKAL (2024). Phishing URL dataset [Dataset]. http://doi.org/10.17632/vfszbj9b36.1
Unique identifier
https://doi.org/10.17632/vfszbj9b36.1
Dataset updated
Apr 2, 2024
Authors
JISHNU K S KAITHOLIKKAL
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Phishing URL dataset contains 54,807 URLs, all identified as phishing, providing a focused resource for studying and combating malicious online activities. Meanwhile, the URL dataset comprises 450,176 URLs sourced from various platforms, including PhishTank, the Majestic Million, and other pertinent sources. Each URL in the dataset is categorized as either "phishing" or "legitimate." Among these URLs, 104,438 have been flagged as phishing URLs, indicating malicious intent, while the remaining 345,738 URLs are classified as legitimate, denoting non-malicious or benign activity. This extensive dataset, drawn from multiple reputable sources, serves as a crucial asset for cybersecurity researchers and practitioners, facilitating the development and validation of advanced techniques for effectively detecting and mitigating phishing attacks.
Phishing Email & SMS Dataset with NLP Categories
kaggle.com
zip
Updated Mar 25, 2025
Ahmad Tijjani (2025). Phishing Email & SMS Dataset with NLP Categories [Dataset]. https://www.kaggle.com/datasets/ahmadtijjani/phishing-urgency-authority-persuasion
Available download formats: zip (5103 bytes)
Dataset updated
Mar 25, 2025
Authors
Ahmad Tijjani
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Phishing Message Dataset (1000 Samples)

This dataset comprises 1,000 phishing messages, categorized based on NLP-based deception techniques commonly used in social engineering attacks.

Phishing Categories:

Urgency – Messages that create a sense of immediate action.
Authority – Messages impersonating trusted figures or organizations.
Persuasion – Messages using manipulative language to convince the recipient.

Dataset Structure:

Each record contains the following fields:
- text – The phishing message (email or SMS).
- category – The type of phishing attack (urgency, authority, persuasion).
- label – A classification label ("phishing") for machine learning tasks.
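
Given the three fields above, a first look at the data could be a per-category count. This is a sketch under assumptions: the file is taken to be a CSV whose header uses exactly the field names text, category, and label, and the rows in `SAMPLE` are made-up placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; not taken from the dataset.
SAMPLE = """text,category,label
"Your account will be locked in 24 hours",urgency,phishing
"This is your bank's security team",authority,phishing
"Act now to keep your mailbox active",urgency,phishing
"""


def count_categories(csv_text: str) -> Counter:
    """Count messages per deception category (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["category"] for row in reader)
```

Calling `count_categories(SAMPLE)` returns a `Counter` keyed by category name. With the real file, the same function could be fed the file contents read as text.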

Potential Applications:

Natural Language Processing (NLP) – Analyze linguistic patterns in phishing messages.
Cybersecurity Research – Identify deceptive techniques used in phishing attacks.
Phishing Detection Models – Train AI models to classify and detect phishing messages.
AI-driven Threat Analysis – Improve automated cybersecurity threat detection.

This dataset serves as a valuable resource for developing AI-powered solutions in cybersecurity and NLP-based phishing detection.
Zieni dataset for Phishing detection
data.mendeley.com
kaggle.com
Updated Sep 4, 2024
Rasha Zieni (2024). Zieni dataset for Phishing detection [Dataset]. http://doi.org/10.17632/8mcz8jsgnb.1
Unique identifier
https://doi.org/10.17632/8mcz8jsgnb.1
Dataset updated
Sep 4, 2024
Authors
Rasha Zieni
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was used for training machine learning models to detect phishing attacks and for studying the explainability of these models. It was published in 2024. The dataset refers to phishing and legitimate websites. Phishing samples have been collected from two sources, namely, PhishTank and Tranco, whereas legitimate samples were collected from Alexa. The dataset is balanced and contains 5,000 phishing and 5,000 legitimate samples, each described by 74 features extracted from the entire URL as well as from the Fully Qualified Domain Name, pathname, filename, and parameters. Of these features, 70 are numerical and four binary. The target variable is also binary.
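
The URL parts mentioned in the description (Fully Qualified Domain Name, pathname, filename, parameters) can be separated with the Python standard library before any features are computed. The sketch below is only an assumption about how such a split might look; `url_components` and its key names are hypothetical and are not the dataset author's extraction code.

```python
import posixpath
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Split a URL into the parts the description names (illustrative sketch)."""
    parts = urlsplit(url)
    return {
        "fqdn": parts.hostname or "",                 # host name without port or credentials
        "pathname": parts.path,                       # full path, including the file name
        "filename": posixpath.basename(parts.path),   # last path segment, may be empty
        "parameters": parts.query,                    # raw query string without the "?"
    }
```

Numerical features such as lengths or character counts could then be computed per component as well as over the entire URL.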
Malicious URL Detection Dataset (Enhanced 2026)
kaggle.com
zip
Updated Feb 17, 2026
moutasm tamimi (2026). Malicious URL Detection Dataset (Enhanced 2026) [Dataset]. https://www.kaggle.com/datasets/moutasmtamimi/malicious-url-detection-dataset-enhanced-2026
Available download formats: zip (94275271 bytes)
Dataset updated
Feb 17, 2026
Authors
moutasm tamimi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Overview

This dataset is a large, feature-rich collection of URLs designed for research and development in malicious URL detection. It contains a total of 651,000 samples, each enriched with detailed lexical, structural, statistical, and phishing-related attributes. The goal of the dataset is to support the development of machine learning models capable of identifying harmful URLs before they lead to security incidents such as phishing attacks, data theft, and malware infections. By combining raw URL properties with engineered features, the dataset offers a comprehensive foundation for both traditional and advanced cybersecurity classification models.

The dataset consists of 65 features, including URL length, character frequencies, entropy measures, domain and subdomain statistics, and multiple phishing-specific indicators. These features capture a wide range of behavioral and structural patterns commonly found in malicious URLs. The final column, label, assigns each entry to one of four categories: benign, defacement, phishing, or malware. This multi-class structure allows the dataset to be used not only for malicious vs. benign classification but also for more detailed threat type identification.

The class distribution contains 428,103 benign URLs, 96,457 defacement URLs, 93,920 phishing URLs, and 32,520 malware URLs. While the dataset is naturally imbalanced, it remains representative of real-world cyber environments where benign traffic far exceeds malicious activity. This realistic distribution makes the dataset valuable for evaluating model robustness and handling class imbalance through techniques such as sampling or weighted training. Overall, the dataset provides a solid and versatile benchmark for cybersecurity machine learning tasks.
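
For the weighted-training option mentioned above, one common choice is inverse-frequency class weights. The sketch below applies that heuristic to the class counts quoted in the description; the helper name `balanced_class_weights` is hypothetical, and this is one possible weighting scheme, not something the dataset prescribes.

```python
# Class counts as stated in the dataset description.
CLASS_COUNTS = {
    "benign": 428_103,
    "defacement": 96_457,
    "phishing": 93_920,
    "malware": 32_520,
}


def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
```

With this scheme the rarest class (malware) receives the largest weight and the most common class (benign) receives a weight below 1, so errors on rare classes cost more during training.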

65 features

The final dataset contains 65 features, including both raw URL characteristics and a wide range of engineered attributes. These features cover lexical patterns, special-character counts, entropy measures, subdomain and path statistics, phishing-specific indicators, and various statistical ratios. Together, they provide a comprehensive representation of each URL, making the dataset suitable for building strong and reliable machine learning models for malicious URL detection.
Phishing attacks – who is most at risk?
gov.uk
s3.amazonaws.com
Updated Sep 26, 2022
Office for National Statistics (2022). Phishing attacks – who is most at risk? [Dataset]. https://www.gov.uk/government/statistics/phishing-attacks-who-is-most-at-risk
Dataset updated
Sep 26, 2022
Dataset provided by
GOV.UK, http://gov.uk/
Authors
Office for National Statistics
Description
Official statistics are produced impartially and free from political influence.
QR code phishing attacks global 2025
statista.com
Updated Feb 2, 2026
Statista (2026). QR code phishing attacks global 2025 [Dataset]. https://www.statista.com/statistics/1622377/global-qr-phishing-attacks/
Dataset updated
Feb 2, 2026
Dataset authored and provided by
Statista, http://statista.com/
Time period covered
Feb 2024 - Feb 2025
Area covered
Worldwide
Description
Between February 2024 and February 2025, nearly 21 percent of employees at global organizations stated they experienced a QR code phishing attack. Additionally, over 21 percent of customers of managed service providers (MSPs) stated encountering such attacks.
Phishing Website Detection Dataset
kaggle.com
zip
Updated Aug 25, 2024
Md. Hasibur Rahman (2024). Phishing Website Detection Dataset [Dataset]. https://www.kaggle.com/datasets/hasibur013/url-data-for-phishing-website-detection
Available download formats: zip (83149 bytes)
Dataset updated
Aug 25, 2024
Authors
Md. Hasibur Rahman
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
The "Phishing Data" dataset is a comprehensive collection of information specifically curated for analyzing and understanding phishing attacks. Phishing attacks involve malicious attempts to deceive individuals or organizations into disclosing sensitive information such as passwords or credit card details. This dataset comprises 18 distinct features that offer valuable insights into the characteristics of phishing attempts. These features include the URL of the website being analyzed, the length of the URL, the use of URL shortening services, the presence of the "@" symbol, the presence of redirection using "//", the presence of prefixes or suffixes in the URL, the number of subdomains, the usage of secure connection protocols (HTTPS), the length of time since domain registration, the presence of a favicon, the presence of HTTP or HTTPS tokens in the domain name, the URL of requested external resources, the presence of anchors in the URL, the number of hyperlinks in HTML tags, the server form handler used, the submission of data to email addresses, abnormal URL patterns, and estimated website traffic or popularity. Together, these features enable the analysis and detection of phishing attempts in the "Phishing Data" dataset, aiding in the development of models and algorithms to combat phishing attacks.
Phishing Websites Dataset
data.mendeley.com
Updated Nov 16, 2021
Subhash Ariyadasa (2021). Phishing Websites Dataset [Dataset]. http://doi.org/10.17632/n96ncsr5g4.1
Unique identifier
https://doi.org/10.17632/n96ncsr5g4.1
Dataset updated
Nov 16, 2021
Authors
Subhash Ariyadasa
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of a collection of legitimate as well as phishing website instances. Each instance contains the URL and the relevant HTML page. The index.sql file is the root file, and it can be used to map the URLs with the relevant HTML pages. The dataset can serve as an input for the machine learning process.

Highlights:
- Total number of instances: 80,000 (83,275 instances in the dataset due to the existence of some removed SQL records in the preprocessing stage)
- Number of legitimate website instances (labelled as 0 in the SQL file): 50,000
- Number of phishing website instances (labelled as 1 in the SQL file): 30,000

Structure: The index.sql file is the root file. It consists of five fields:
1. rec_id - record number
2. url - URL of the webpage
3. website - filename of the webpage (e.g. 1635698138155948.html)
4. result - indicates whether a given URL is phishing or not (0 for legitimate and 1 for phishing)
5. created_date - date the webpage was downloaded
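
The five fields can be pictured as one table that maps each URL to its saved HTML file and label. The sketch below rebuilds that shape in an in-memory SQLite database. The table name `index_tbl`, the column types, and the two sample rows are assumptions made for illustration; the actual index.sql may use a different SQL dialect and table name.

```python
import sqlite3

# Placeholder rows; only the first file name is taken from the description above.
SAMPLE_ROWS = [
    (1, "https://example.org/", "1635698138155948.html", 0, "2021-10-31"),
    (2, "http://phish.example/login", "0000000000000001.html", 1, "2021-10-31"),
]


def build_index(rows):
    """Create an in-memory table with the five described fields (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE index_tbl ("
        "rec_id INTEGER PRIMARY KEY, url TEXT, website TEXT, "
        "result INTEGER, created_date TEXT)"
    )
    conn.executemany("INSERT INTO index_tbl VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def label_counts(conn):
    """Count records per label (0 = legitimate, 1 = phishing)."""
    cur = conn.execute(
        "SELECT result, COUNT(*) FROM index_tbl GROUP BY result ORDER BY result"
    )
    return dict(cur.fetchall())
```

Looking up the `website` value for a given `url` is then a single SELECT, which is the mapping role the description assigns to the root file.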

Sources:
- Legitimate Data [50,000]: These data were collected from two sources.
  1. Google search: A simple keyword search on the Google search engine was used, and the top 5 URLs of each search were collected. Domain restrictions limited collection to a maximum of 10 URLs per domain to keep the final collection diverse.
  2. Ebbu2017 Phishing Dataset [1]: Nearly 25,874 active URLs were collected from this repository.
- Phishing Data [30,000]: Three sources were used.
  1. PhishTank: from 01 December 2020 to 31 October 2021
  2. OpenPhish: from 29 September 2021 to 31 October 2021
  3. PhishRepo [2]: from 29 September 2021 to 31 October 2021

Data Collection Process:
- Legitimate Data:
  - The URLs were collected from the above sources, and the relevant webpages were fetched separately.
  - The URLs are of different lengths to minimize the URL length issue mentioned by Verma et al. [3].
- Phishing Data:
  - The URLs were collected from the above sources, and the relevant web pages were fetched at the same time.
  - An automated script continuously monitored PhishTank and OpenPhish to collect the latest phishing URLs.
  - The collected URLs were fetched simultaneously to minimize the resource-unavailable issue, since phishing pages do not stay on the web for long.
  - PhishRepo provides all the resources relevant to a phishing webpage, so its download function was used to obtain the PhishRepo data.

References:
[1] Ebbu2017 Phishing Dataset. Accessed 31 October 2021. Available: https://github.com/ebubekirbbr/pdd/tree/master/input
[2] PhishRepo. Accessed 31 October 2021. Available: https://moraphishdet.projects.uom.lk/phishrepo/
[3] Verma, Rakesh M., Victor Zeng, and Houtan Faridi. "Data quality for security challenges: Case studies of phishing, malware and intrusion detection datasets." 2019.
Phishing Email Dataset
kaggle.com
zip
Updated May 24, 2024
Naser Abdullah Alam (2024). Phishing Email Dataset [Dataset]. https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset
Available download formats: zip (80864554 bytes)
Dataset updated
May 24, 2024
Authors
Naser Abdullah Alam
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
PHISHING EMAIL DATASET

This dataset was compiled by researchers to study phishing email tactics. It combines emails from a variety of sources to create a comprehensive resource for analysis.

Initial Datasets:

Enron and Ling Datasets: These datasets focus on the core content of phishing emails, containing subject lines, email body text, and labels indicating whether the email is spam (phishing) or legitimate.

CEAS, Nazario, Nigerian Fraud, and SpamAssassin Datasets: These datasets provide broader context for the emails, including sender information, recipient information, date, and labels for spam/legitimate classification.

Final Dataset:

The final dataset combines the information from the initial datasets into a single resource for analysis. This dataset contains:

Approximately 82,500 emails

42,891 spam emails

39,595 legitimate emails

This dataset allows researchers to study the content of phishing emails and the context in which they are sent to improve detection methods.

Please cite the following two articles if you are using this dataset:

Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. ArXiv.org. https://arxiv.org/abs/2405.11619
Web page phishing detection
data.mendeley.com
kaggle.com
Updated Jun 25, 2021
+ more versions
Abdelhakim Hannousse (2021). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.3
Unique identifier
https://doi.org/10.17632/c2gw7fy2j4.3
Dataset updated
Jun 25, 2021
Authors
Abdelhakim Hannousse
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The provided dataset includes 11,430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine-learning-based phishing detection systems. Features are from three different classes: 56 extracted from the structure and syntax of URLs, 24 extracted from the content of their corresponding pages, and 7 extracted by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used for feature extraction, for potential replication or extension. The datasets were constructed in May 2020.

dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL- and content-based features despite the short lifetime of phishing web pages.

dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed by URL, so the index needs to be removed before experimentation.
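
Removing the URL index before training might look like the sketch below. It assumes dataset_B is a CSV file; the column names `url` and `status`, the feature names, and the two sample rows are placeholders chosen for illustration, not the real header or data.

```python
import csv
import io

# Made-up placeholder rows; the column names are assumptions, not the real header.
SAMPLE_CSV = """url,length_url,nb_dots,status
http://example.org/index.html,29,1,legitimate
http://login.example.net/verify,31,2,phishing
"""


def split_index_and_features(csv_text, index_column="url", target_column="status"):
    """Separate the URL index and the target from the numeric feature values."""
    urls, features, targets = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        urls.append(row.pop(index_column))       # keep the index aside, out of the features
        targets.append(row.pop(target_column))   # class label
        features.append({name: float(value) for name, value in row.items()})
    return urls, features, targets
```

Keeping the URLs in a separate list, rather than discarding them, makes it possible to trace a misclassified row back to its page later.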
Most reported cybercrime in the U.S. 2024, by number of individuals affected...
statista.com
Updated Feb 2, 2026
Statista (2026). Most reported cybercrime in the U.S. 2024, by number of individuals affected [Dataset]. https://www.statista.com/statistics/184083/commonly-reported-types-of-cyber-crime-us/
Dataset updated
Feb 2, 2026
Dataset authored and provided by
Statista, http://statista.com/
Time period covered
2024
Area covered
United States
Description
In 2024, the most common type of cybercrime reported to the United States Internet Crime Complaint Center (IC3) was phishing, with its variation, spoofing, affecting approximately 193,000 individuals. In addition, over 86,000 cases of extortion were reported to the IC3 during that year.

Dynamic of phishing attacks

Over the past few years, phishing attacks have increased significantly. In 2024, over 193,000 individuals fell victim to such attacks. The highest number of phishing scam victims since 2018 was recorded in 2021, at approximately 324,000.

Phishing attacks can take many shapes. Bulk phishing, smishing, and business e-mail compromise (BEC) are the most common types. With the recent development of generative AI, it has become easier to craft a believable phishing e-mail. This is currently among the top concerns of organization leaders.

Impact of phishing attacks

Among the industries most targeted by cybercriminals are healthcare, financial, manufacturing, and education institutions. An observation carried out in the fourth quarter of 2024 found that software-as-a-service (SaaS) and webmail were most likely to encounter phishing attacks. According to the reports, almost a quarter of them stated being targeted by a phishing scam in the measured period.
Cybersecurity Statistics in the UK for 2024
privacyengine.io
Updated Dec 30, 2024
PrivacyEngine (2024). Cybersecurity Statistics in the UK for 2024 [Dataset]. https://www.privacyengine.io/cybersecurity-statistics-uk-2024/
Dataset updated
Dec 30, 2024
Dataset authored and provided by
PrivacyEngine
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2024
Area covered
United Kingdom
Description
This study analyzes cybersecurity issues across the UK, using data from an independent sample of 850,000 individuals and organizations during the 12 months leading up to July 2024. With cybersecurity breaches affecting 50% of small businesses, 70% of medium businesses, 74% of large businesses, and up to 66% of charities, the findings highlight the increasing threat of ransomware (45%), phishing attacks (30%), and challenges related to GDPR compliance (15%). These insights provide a comprehensive view of how cybersecurity challenges impact businesses and individuals across the region.

Most targeted industry sectors worldwide targeted by phishing Q4 2024

17 scholarly articles cite this dataset (View in Google Scholar)
