100+ datasets found

Most targeted industry sectors worldwide targeted by phishing Q4 2024
statista.com
thefarmdosupply.com
Updated Sep 1, 2025
Cite
Statista (2025). Most targeted industry sectors worldwide targeted by phishing Q4 2024 [Dataset]. https://www.statista.com/statistics/266161/websites-most-affected-by-phishing/
Dataset updated
Sep 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the fourth quarter of 2024, nearly 23 percent of phishing attacks worldwide targeted social media. Web-based software services and webmail were the target of over 23 percent of registered phishing attacks, and financial institutions accounted for 12 percent.
Phishing: distribution of attacks 2023, by region
statista.com
tokrwards.com
Updated Sep 1, 2025
Cite
Statista (2025). Phishing: distribution of attacks 2023, by region [Dataset]. https://www.statista.com/statistics/266362/phishing-attacks-country/
Dataset updated
Sep 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
Worldwide
Description
In 2023, users in Vietnam were the most frequently targeted by phishing attacks, with an attack rate of ***** percent among internet users in the country. Peru ranked second, with an attack rate of nearly ** percent, while Taiwan followed with ***** percent.
Phishing Website Dataset
zenodo.org
data.niaid.nih.gov
csv, zip
Updated Jul 3, 2023
Cite
I Kadek Agus Ariesta Putra (2023). Phishing Website Dataset [Dataset]. http://doi.org/10.5281/zenodo.8041387
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.8041387
Dataset updated
Jul 3, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
I Kadek Agus Ariesta Putra
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a collection of legitimate and phishing websites, along with information on the target brands (brands.csv) being impersonated in the phishing attacks. The dataset includes a total of 10,395 websites, 5,244 of which are legitimate and 5,151 of which are phishing websites. These websites impersonate a total of 86 different target brands.

Phishing websites can be downloaded as zip files with a "phishing" prefix, and legitimate websites as zip files with a "not-phishing" prefix.

In addition, the dataset includes features such as screenshots, text, CSS, and HTML structure for each website, as well as domain information (WHOIS data), IP information, and SSL information. Each website is labeled as either legitimate or phishing and includes additional metadata such as the date it was discovered, the target brand being impersonated, and any other relevant information.

The dataset has been curated for research purposes and can be used to analyze the effectiveness of phishing attacks, develop and evaluate anti-phishing solutions, and identify trends and patterns in phishing attacks. It is hoped that this dataset will contribute to the advancement of research in the field of cybersecurity and help improve our understanding of phishing attacks.
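When preparing this dataset for training, the archive prefix is what separates the two classes. The sketch below assumes the archives have been downloaded into one local directory and that the "phishing" / "not-phishing" prefix is the only class convention; the helper names and example file names are hypothetical. It matches on the prefix rather than on a substring, because the word "phishing" also occurs inside "not-phishing".

```python
from pathlib import Path
from typing import Dict, List, Optional

# Assumed encoding for this sketch: 1 = phishing, 0 = legitimate.
PHISHING, LEGITIMATE = 1, 0


def label_for_archive(filename: str) -> Optional[int]:
    """Map a downloaded zip name to a class label using its prefix.

    Uses startswith() instead of `"phishing" in name`, since the substring
    "phishing" also appears in "not-phishing". Returns None for files that
    are not class archives (e.g. brands.csv).
    """
    name = Path(filename).name.lower()
    if not name.endswith(".zip"):
        return None
    if name.startswith("not-phishing"):
        return LEGITIMATE
    if name.startswith("phishing"):
        return PHISHING
    return None


def group_archives(directory: str) -> Dict[int, List[Path]]:
    """Group the zip files in a (hypothetical) download directory by label."""
    groups: Dict[int, List[Path]] = {PHISHING: [], LEGITIMATE: []}
    for path in sorted(Path(directory).glob("*.zip")):
        label = label_for_archive(path.name)
        if label is not None:
            groups[label].append(path)
    return groups


if __name__ == "__main__":
    for name in ("phishing-part1.zip", "not-phishing-part1.zip", "brands.csv"):
        print(name, label_for_archive(name))
```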
Phishing Email Statistics 2025: The Growing Threat and How to Protect Your...
sqmagazine.co.uk
Updated Oct 3, 2025
Cite
SQ Magazine (2025). Phishing Email Statistics 2025: The Growing Threat and How to Protect Your Organization [Dataset]. https://sqmagazine.co.uk/phishing-email-statistics/
Dataset updated
Oct 3, 2025
Dataset authored and provided by
SQ Magazine
License
https://sqmagazine.co.uk/privacy-policy/
Time period covered
Jan 1, 2024 - Dec 31, 2025
Area covered
Global
Description
It started like any other Tuesday morning. A mid-level finance manager at a US-based logistics firm opened what looked like an urgent request from their CEO. The subject line? “Quarterly Financial Review Needed Immediately.” The logo looked legit. The tone felt familiar. Within two minutes, confidential files were shared, and...
Outcomes of successful phishing attacks in companies worldwide 2021-2023
statista.com
Updated Sep 1, 2025
Cite
Statista (2025). Outcomes of successful phishing attacks in companies worldwide 2021-2023 [Dataset]. https://www.statista.com/statistics/1350723/consequences-phishing-attacks/
Dataset updated
Sep 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
Surveys of working adults and IT security professionals worldwide, conducted in 2021 and 2023, found that the share of organizations experiencing severe consequences from a successful cyber attack had declined. In 2023, 29 percent of enterprises experienced a breach of customer or client data, down from 44 percent in 2022. Ransomware infections delivered through e-mail were reported by 32 percent of respondents in 2023. Credential or account compromise occurred in 27 percent of organizations in 2023, a decrease of 25 percent compared to the year prior.
Phishing Detection Dataset
kaggle.com
Updated Apr 12, 2021
Cite
The citation is currently not available for this dataset.
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 12, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Verugopal Iyyer
Description
Dataset 1 records each participant's age, qualification level, awareness of phishing, and whether they fell victim to phishing. It also contains the detection rate measured before an awareness briefing on phishing, following a successful spear-phishing attack.

Dataset 2 records the same attributes (age, qualification level, awareness of phishing, and whether they fell victim to phishing). It contains the detection rate measured after an awareness briefing on phishing, following a successful smishing attack.
Table_1_Unveiling suspicious phishing attacks: enhancing detection with an...
frontiersin.figshare.com
docx
Updated Jul 2, 2024
Cite
Maruf A. Tamal; Md K. Islam; Touhid Bhuiyan; Abdus Sattar; Nayem Uddin Prince (2024). Table_1_Unveiling suspicious phishing attacks: enhancing detection with an optimal feature vectorization algorithm and supervised machine learning.DOCX [Dataset]. http://doi.org/10.3389/fcomp.2024.1428013.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fcomp.2024.1428013.s001
Dataset updated
Jul 2, 2024
Dataset provided by
Frontiers
Authors
Maruf A. Tamal; Md K. Islam; Touhid Bhuiyan; Abdus Sattar; Nayem Uddin Prince
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: The dynamic and sophisticated nature of phishing attacks, coupled with relatively weak anti-phishing tools, has made phishing detection a pressing challenge. In light of this, new gaps have emerged in phishing detection, including the challenges and pitfalls of existing detection techniques. To bridge these gaps, this study aims to develop a more robust, effective, sophisticated, and reliable solution for phishing detection through the optimal feature vectorization algorithm (OFVA) and supervised machine learning (SML) classifiers.

Methods: Initially, the OFVA was used to extract 41 optimal intra-URL features from a novel large dataset comprising 274,446 raw URLs (134,500 phishing and 139,946 legitimate URLs). Subsequently, data cleansing, curation, and dimensionality reduction were performed to remove outliers, handle missing values, and exclude less predictive features. To identify the optimal model, the study evaluated and compared 15 SML algorithms from different machine learning (ML) families, including Bayesian, nearest-neighbors, decision trees, neural networks, quadratic discriminant analysis, logistic regression, bagging, boosting, random forests, and ensembles. The evaluation was based on metrics including the confusion matrix, accuracy, precision, recall, F1 score, ROC curve, and precision-recall curve analysis. Furthermore, hyperparameter tuning (using grid search) and k-fold cross-validation were performed to optimize detection accuracy.

Results and discussion: The findings indicate that random forests (RF) outperformed the other classifiers, achieving the highest accuracy (97.52%), with 97.50% precision and an AUC of 97%. Finally, a more robust and lightweight anti-phishing model was introduced, which can serve as an effective tool for security experts, practitioners, and policymakers to combat phishing attacks.
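The threshold-based metrics listed under Methods (accuracy, precision, recall, F1) all derive from the four cells of a binary confusion matrix. As an illustration of those standard definitions, and not the authors' evaluation code, the sketch below computes them in plain Python, assuming phishing is encoded as the positive class 1. ROC and precision-recall curves, grid search, and k-fold cross-validation are outside its scope; libraries such as scikit-learn offer equivalent and more complete functions.

```python
from typing import Dict, Sequence


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from the confusion-matrix cells.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    c = confusion_counts(y_true, y_pred, positive)
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels: 1 = phishing, 0 = legitimate.
    print(classification_metrics([1, 1, 1, 0], [1, 1, 0, 0]))
```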
Web page phishing detection
data.mendeley.com
Updated Jun 25, 2021
Cite
Abdelhakim Hannousse (2021). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.3
Unique identifier
https://doi.org/10.17632/c2gw7fy2j4.3
Dataset updated
Jun 25, 2021
Authors
Abdelhakim Hannousse
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The provided dataset includes 11,430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine-learning-based phishing detection systems. Features come from three different classes: 56 extracted from the structure and syntax of URLs, 24 extracted from the content of their corresponding pages, and 7 extracted by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Alongside the dataset, we provide the Python scripts used to extract the features, for potential replication or extension. The datasets were constructed in May 2020.

dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL- and content-based features, overcoming the short lifetime of phishing web pages.

dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed by URL, so the index needs to be removed before experimentation.
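Because dataset_B is indexed by URL, the URL column has to be dropped before the rows are passed to a classifier. A minimal sketch using the standard csv module, assuming the index column is named `url` and the label column `status` with values "phishing" and "legitimate"; both column names and the file name in the usage note are assumptions to check against the downloaded file's header.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_feature_matrix(
    handle: TextIO, index_col: str = "url", label_col: str = "status"
) -> Tuple[List[str], List[List[float]], List[int]]:
    """Read dataset_B-style CSV rows into (feature_names, X, y).

    The URL index column is discarded because it identifies a row rather
    than describing it; the label column becomes y (1 = phishing, else 0).
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None:
        raise ValueError("empty CSV")
    feature_names = [f for f in reader.fieldnames if f not in (index_col, label_col)]
    X: List[List[float]] = []
    y: List[int] = []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(1 if row[label_col].strip().lower() == "phishing" else 0)
    return feature_names, X, y


if __name__ == "__main__":
    # In-memory stand-in for the real file (hypothetical columns and rows).
    demo = io.StringIO(
        "url,length_url,ip,status\n"
        "http://a.com,12,0,legitimate\n"
        "http://1.2.3.4/x,16,1,phishing\n"
    )
    print(load_feature_matrix(demo))
```

For a downloaded copy, the same function can be called on `open("dataset_B.csv", newline="")`, where the file name is hypothetical.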
Phishing Websites Dataset
data.mendeley.com
Updated Sep 24, 2020
Cite
Grega Vrbančič (2020). Phishing Websites Dataset [Dataset]. http://doi.org/10.17632/72ptz43s9v.1
Unique identifier
https://doi.org/10.17632/72ptz43s9v.1
Dataset updated
Sep 24, 2020
Authors
Grega Vrbančič
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data consist of a collection of legitimate and phishing website instances. Each website is represented by a set of features and a label denoting whether the website is legitimate or not. The data can serve as input for a machine learning process.

This repository presents two variants of the Phishing Dataset.

Full variant (dataset_full.csv):
Total number of instances: 88,647
Number of legitimate website instances (labeled as 0): 58,000
Number of phishing website instances (labeled as 1): 30,647
Total number of features: 111

Small variant (dataset_small.csv):
Total number of instances: 58,645
Number of legitimate website instances (labeled as 0): 27,998
Number of phishing website instances (labeled as 1): 30,647
Total number of features: 111
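The two variants differ mainly in class balance: both contain 30,647 phishing instances, but the small variant keeps 27,998 legitimate instances instead of 58,000. A minimal sketch for checking the label shares, both from the published counts and from a downloaded CSV, assuming the target column is named `phishing` (an assumption; confirm it against the CSV header) and holds the 0/1 labels described above.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def label_shares(handle: TextIO, label_col: str = "phishing") -> Dict[int, float]:
    """Fraction of rows per class label (0 = legitimate, 1 = phishing).

    `label_col` defaults to "phishing", an assumed column name.
    """
    counts = Counter(int(row[label_col]) for row in csv.DictReader(handle))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rows found")
    return {label: n / total for label, n in sorted(counts.items())}


def phishing_share(n_legitimate: int, n_phishing: int) -> float:
    """Share of phishing instances given published class counts."""
    return n_phishing / (n_legitimate + n_phishing)


if __name__ == "__main__":
    # Published counts for the two variants.
    print(f"full:  {phishing_share(58_000, 30_647):.3f}")
    print(f"small: {phishing_share(27_998, 30_647):.3f}")
    # Tiny in-memory stand-in for a downloaded CSV (hypothetical rows).
    print(label_shares(io.StringIO("length_url,phishing\n25,0\n71,1\n")))
```

With the published counts, phishing instances make up about 34.6% of the full variant and about 52.3% of the small variant.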
Global number of e-mail phishing attacks 2022-2023
statista.com
Updated Sep 23, 2024
Cite
Statista (2024). Global number of e-mail phishing attacks 2022-2023 [Dataset]. https://www.statista.com/statistics/1493550/phishing-attacks-global-number/
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2022 - Dec 2023
Area covered
Worldwide
Description
In December 2023, around 9.45 million phishing e-mails were detected worldwide, up from 5.59 million in September 2023. The monthly figure has increased continuously since January 2022, a rise that has been partially associated with the launch of ChatGPT in November 2022.
Phishing Protection Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Phishing Protection Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-phishing-protection-market
Available download formats: pdf, csv, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Phishing Protection Market Outlook

The global phishing protection market size was valued at approximately USD 900 million in 2023 and is projected to reach USD 2.4 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5% from 2024 to 2032. The growth of this market is fueled by the escalating volume and sophistication of phishing attacks, coupled with increasing awareness about cybersecurity among organizations across various industries.
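A CAGR forecast compounds the base value once per year: V_end = V_start × (1 + r)^n. The sketch below applies that formula to the figures above, assuming the 2023 valuation of about USD 0.9 billion is the base and that the 2024 to 2032 forecast period means nine compounding years; that period count is an interpretation, not something the report states explicitly.

```python
from typing import Dict


def project_value(start: float, cagr: float, years: int) -> float:
    """Compound `start` forward by `years` at annual rate `cagr` (0.115 = 11.5%)."""
    return start * (1.0 + cagr) ** years


def yearly_projection(
    start: float, cagr: float, base_year: int, end_year: int
) -> Dict[int, float]:
    """Projected value for every year from base_year to end_year inclusive."""
    return {
        year: project_value(start, cagr, year - base_year)
        for year in range(base_year, end_year + 1)
    }


if __name__ == "__main__":
    # Base: ~USD 0.9 billion in 2023; 11.5% CAGR; nine compounding years to 2032.
    path = yearly_projection(0.9, 0.115, 2023, 2032)
    for year, value in path.items():
        print(year, round(value, 2))
```

Under these assumptions the 2032 value comes out at roughly 2.40, consistent with the stated USD 2.4 billion forecast.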

One of the significant growth factors driving the phishing protection market is the increasing number of cyberattacks targeting both individuals and organizations. Phishing attacks have become more sophisticated, making it crucial for businesses to invest in advanced protection measures. The rise in spear-phishing, where attackers target specific individuals within an organization, has heightened the need for robust phishing protection solutions. Moreover, the financial and reputational damage caused by successful phishing attacks is pushing organizations to adopt comprehensive security solutions, thereby driving market growth.

Another critical factor contributing to the market's expansion is the growing regulatory landscape around data protection and cybersecurity. Governments and regulatory bodies across the globe are implementing stringent regulations to ensure data security and protect consumer information. Compliance with regulations such as GDPR in Europe, CCPA in California, and other data protection laws worldwide necessitates the deployment of advanced phishing protection solutions. Organizations must adhere to these regulations to avoid hefty fines and legal repercussions, further propelling the adoption of phishing protection services and software.

The increasing adoption of digital transformation strategies by enterprises is also a significant driver of market growth. As businesses migrate their operations to cloud platforms and adopt new technologies, they become more vulnerable to cyber threats, including phishing attacks. The shift towards remote work and the integration of Bring Your Own Device (BYOD) policies have expanded the attack surface for cybercriminals. Consequently, organizations are prioritizing investments in phishing protection solutions to safeguard their digital assets and maintain business continuity in a highly digitized environment.

In addition to phishing attacks, organizations are increasingly facing threats from credential stuffing, a type of cyberattack where attackers use automated tools to try multiple username and password combinations to gain unauthorized access to user accounts. This has led to a growing demand for Credential Stuffing Protection solutions, which are designed to detect and block such attempts. These solutions often employ advanced techniques such as behavioral analytics and machine learning to identify suspicious login activities and prevent unauthorized access. As businesses continue to digitize their operations and store sensitive data online, the need for robust Credential Stuffing Protection measures becomes even more critical. By implementing these solutions, organizations can safeguard their user accounts and maintain trust with their customers.

Regionally, North America is anticipated to dominate the phishing protection market during the forecast period, owing to the high incidence of cyberattacks and the presence of leading cybersecurity companies. Europe is also expected to witness significant growth, driven by stringent data protection regulations and increasing cyber threats. The Asia Pacific region is projected to exhibit the highest CAGR, fueled by rapid digitalization, increasing internet penetration, and growing awareness about cybersecurity threats. Latin America, the Middle East, and Africa are also expected to contribute to the market's growth, albeit at a slower pace compared to other regions.

Component Analysis

The phishing protection market is segmented by components into software and services. The software segment is expected to hold a significant share of the market, as organizations increasingly rely on advanced software solutions to detect and prevent phishing attacks. Software solutions typically include email filtering, URL filtering, and anti-phishing tools that help identify and block malicious content. Moreover, the continuous advancements in machine learning and artificial intelligence are enhancing the capabilities of phishing protection software, making them more effective in ide
Phishing Protection Report
datainsightsmarket.com
doc, pdf, ppt
Updated Apr 22, 2025
Cite
Data Insights Market (2025). Phishing Protection Report [Dataset]. https://www.datainsightsmarket.com/reports/phishing-protection-466936
Available download formats: ppt, doc, pdf
Dataset updated
Apr 22, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The phishing protection market is experiencing robust growth, driven by the escalating sophistication and frequency of phishing attacks targeting individuals and organizations across diverse sectors. The increasing reliance on digital platforms for banking, e-commerce, and communication has created a fertile ground for cybercriminals, necessitating robust security measures. The market's expansion is fueled by several key factors, including the rising adoption of cloud-based security solutions, the proliferation of mobile devices, and the increasing awareness of phishing threats among both individuals and businesses. Furthermore, stringent government regulations concerning data privacy and security are compelling organizations to invest heavily in advanced phishing protection technologies. While the BFSI (Banking, Financial Services, and Insurance) sector remains a significant adopter, growth is also observed across other sectors like healthcare, government, and telecommunications, due to their sensitive data assets and heightened regulatory scrutiny. The market is segmented by application (BFSI, Government, Healthcare, Telecommunications and IT, Transportation, Education, Retail) and type of phishing (email-based, non-email-based), each presenting unique opportunities for specialized solutions. Companies like Cyren, BAE Systems, Microsoft, FireEye, Symantec, Proofpoint, GreatHorn, Cisco, Phishlabs, Intel, and Mimecast are key players, constantly innovating to counter evolving phishing tactics. The market's growth trajectory is projected to remain positive over the forecast period (2025-2033), although challenges remain. These include the ever-evolving nature of phishing techniques, the difficulty in detecting sophisticated attacks, and the ongoing skills gap in cybersecurity expertise. 
Despite these obstacles, the market’s future looks promising, spurred by continuous advancements in artificial intelligence (AI) and machine learning (ML) technologies which enhance threat detection and response capabilities. The increasing adoption of multi-layered security solutions, incorporating phishing protection alongside other security measures, further contributes to the overall market growth. The geographical distribution of the market indicates strong growth in North America and Europe, while the Asia-Pacific region is poised for significant expansion in the coming years, driven by increasing internet penetration and digitalization.
Phishing Detection AI Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Oct 1, 2025
Cite
Dataintelo (2025). Phishing Detection AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/phishing-detection-ai-market
Available download formats: pdf, pptx, csv
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Phishing Detection AI Market Outlook

According to our latest research, the global phishing detection AI market size reached USD 1.82 billion in 2024, underscoring the significant investments and technological advancements in cybersecurity. The market is experiencing robust expansion, with a compound annual growth rate (CAGR) of 23.7% projected from 2025 to 2033. By the end of 2033, the phishing detection AI market size is forecasted to reach USD 14.34 billion, driven by the escalating sophistication of phishing attacks and the urgent need for advanced, AI-driven security solutions. This growth is primarily attributed to the increasing adoption of AI-powered tools across industries, rapid digital transformation, and the proliferation of remote and hybrid work models that have exposed new vulnerabilities in organizational IT infrastructures.
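The compound-growth relation can also be inverted to recover the rate implied by two endpoints: r = (V_end / V_start)^(1/n) - 1. A small sketch for sanity-checking forecasts like this one; the period count n is an input the reader must choose, and the choice matters. With the figures above, nine compounding years (2024 to 2033) give an implied rate of about 25.8% and ten years about 22.9%, while 23.7% applied to USD 1.82 billion for nine years gives about USD 12.3 billion. Since the stated rate and endpoints do not line up exactly under either convention, the base year and compounding period behind the published figures are worth confirming in the full report.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate r such that start * (1 + r) ** years == end."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Endpoints from the text, in USD billions; the period count is the reader's choice.
    for years in (9, 10):
        print(years, f"{implied_cagr(1.82, 14.34, years):.1%}")
```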

A critical growth factor propelling the phishing detection AI market is the relentless evolution of phishing techniques employed by cybercriminals. Modern phishing attacks have become highly targeted, utilizing machine learning and social engineering tactics that can bypass traditional security filters. Organizations are increasingly aware that legacy security systems are inadequate to counter these sophisticated threats. As a result, there is a surging demand for AI-driven phishing detection solutions that can analyze vast datasets, recognize behavioral anomalies, and provide real-time threat intelligence. The ability of AI to adapt and learn from new attack vectors ensures organizations can stay ahead of cyber adversaries, making AI-based phishing detection an essential component of contemporary cybersecurity strategies.

Another significant driver is the proliferation of digital communication channels, particularly email, web, and mobile platforms, which have become primary vectors for phishing attacks. The rapid adoption of cloud services and mobile devices has expanded the attack surface for organizations, necessitating comprehensive security frameworks that leverage AI for holistic threat detection. Enterprises, especially in sectors such as BFSI, healthcare, and retail, are investing heavily in AI-powered phishing detection to safeguard sensitive customer data and maintain regulatory compliance. Additionally, the increasing frequency of high-profile data breaches and the rising costs associated with cyberattacks are compelling organizations to prioritize proactive threat detection and response capabilities, further fueling market growth.

The integration of phishing detection AI into existing cybersecurity ecosystems is also being accelerated by regulatory mandates and industry standards. Governments and regulatory bodies worldwide are enforcing stricter data protection and cybersecurity regulations, compelling organizations to adopt advanced security measures. In parallel, the growing awareness among small and medium enterprises (SMEs) about the risks posed by phishing, combined with the availability of scalable, cloud-based AI solutions, is democratizing access to cutting-edge security technologies. This convergence of regulatory pressure, technological innovation, and heightened risk awareness is creating a fertile environment for the sustained expansion of the phishing detection AI market.

From a regional perspective, North America currently leads the phishing detection AI market, accounting for the largest revenue share in 2024, due to its mature cybersecurity infrastructure and high incidence of targeted phishing campaigns. Europe and Asia Pacific are also witnessing substantial growth, driven by increasing digitalization, stringent regulatory frameworks, and rising cyber threats. In particular, Asia Pacific is expected to register the fastest CAGR during the forecast period, propelled by rapid technological adoption in countries like China, India, and Japan. Meanwhile, Latin America and the Middle East & Africa are emerging as promising markets, as organizations in these regions ramp up investments in AI-driven cybersecurity to address evolving threat landscapes and bridge security gaps.

Component Analysis

The phishing detection AI market by component is segmented into software, hardware, and services, each playing a distinct yet interrelated role in the deployment of comprehensive security solutions. The software segment remains the dominant revenue contributor, primarily due to the rapid advancements in AI algorithms and the growing demand for integrated, automated threat detection platform
Data set of "Falling and failing (to learn)"
datarepository.eur.nl
pdf
Updated Jul 16, 2025
Cite
Aurelien Baillon; Francesco Capozza; David Gonzalez-Jimenez (2025). Data set of "Falling and failing (to learn)" [Dataset]. https://datarepository.eur.nl/articles/dataset/Data_set_of_Falling_and_failing_to_learn_/28123376
Available download formats: pdf
Dataset updated
Jul 16, 2025
Dataset provided by
Erasmus University Rotterdam (EUR)
Authors
Aurelien Baillon; Francesco Capozza; David Gonzalez-Jimenez
License
http://rightsstatements.org/vocab/InC/1.0/
Description
Data set of "Falling and failing (to learn): Evidence from a Nation-Wide Cybersecurity Field Experiment with SMEs", accepted for publication in the Journal of Economic Behavior and Organization.

Abstract: Prior experiences are crucial in shaping risk prevention behavior. Previous studies have shown that experiencing a simulated phishing attack (a "phishing drill") reduces the likelihood of clicking on unsafe links and disclosing one's password. In a large field experiment involving 670 small and medium-sized enterprises (SMEs) and their 33,000 employees, we examined the impact of experience on individuals' ability to detect cybersecurity threats, and whether this effect persisted over several months. We collected data at both the company and individual levels, including risk preference, time preference, and trust. Our findings indicate only a non-systematic, short-term effect of previous phishing emails on clicking behavior. A cluster of individuals with greater patience, trust, and risk seeking was more likely to click on phishing links in the first place, but also more likely to benefit from phishing drills.
LegitPhish Dataset
data.mendeley.com
Updated Apr 7, 2025
Cite
Rachana Potpelwar (2025). LegitPhish Dataset [Dataset]. http://doi.org/10.17632/hx4m73v2sf.1
Unique identifier
https://doi.org/10.17632/hx4m73v2sf.1
Dataset updated
Apr 7, 2025
Authors
Rachana Potpelwar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains 101,219 URLs and 18 features (including the label), with the following class distribution:

Phishing (0): 63,678 URLs

Legitimate (1): 37,540 URLs

The phishing URLs have been sourced from the URLHaus database, scraped from many sites, and drawn from other well-known repositories of malicious websites actively used in phishing attacks. Each entry in this subset has been manually verified and labeled as a phishing URL, making this dataset highly reliable for identifying harmful web content.

The legitimate URLs have been collected from reputable sources such as Wikipedia and Stack Overflow. These websites are known for hosting user-generated content and community discussions, ensuring that the URLs represent safe, legitimate web addresses. The URLs were randomly scraped to ensure diversity in the types of legitimate sites included.

Dataset Features:

URL: The full web address of each entry, providing the primary feature for analysis.
Label: A binary label indicating whether the URL is legitimate (1) or phishing (0).

Applications:

This dataset is suitable for training and evaluating machine learning models aimed at distinguishing between phishing and legitimate websites. It can be used in a variety of cybersecurity research projects, including URL-based phishing detection, web content analysis, and the development of real-time protection systems.

Usage:

Researchers can leverage this dataset to develop and test algorithms for identifying phishing websites with high accuracy, using features such as URL structure together with the class label. The inclusion of both phishing and legitimate URLs provides a comprehensive basis for creating robust models capable of detecting phishing attempts in diverse online environments.

Feature descriptions:
URL: The full URL string.
url_length: Total number of characters in the URL.
has_ip_address: Binary flag (1/0): whether the URL contains an IP address.
dot_count: Number of . characters in the URL.
https_flag: Binary flag (1/0): whether the URL uses HTTPS.
url_entropy: Shannon entropy of the URL string; higher values indicate more randomness.
token_count: Number of tokens/words in the URL.
subdomain_count: Number of subdomains in the URL.
query_param_count: Number of query parameters (after ?).
tld_length: Length of the top-level domain (e.g., "com" = 3).
path_length: Length of the path part after the domain.
has_hyphen_in_domain: Binary flag (1/0): whether the domain contains a hyphen (-).
number_of_digits: Total number of numeric characters in the URL.
tld_popularity: Binary flag (1/0): whether the TLD is popular.
suspicious_file_extension: Binary flag (1/0): whether the URL ends with a suspicious extension (e.g., .exe, .zip).
domain_name_length: Length of the domain name.
percentage_numeric_chars: Percentage of numeric characters in the URL.
ClassLabel: Target label: 1 = Legitimate, 0 = Phishing.
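Most of the lexical features above can be computed from the URL string alone. The sketch below reimplements a subset with the Python standard library (urllib.parse, ipaddress, math); it is an illustration, not the dataset author's extraction code, and several definitions are assumptions: subdomain_count naively treats every host label before the last two as a subdomain (so it miscounts suffixes such as .co.uk), domain_name_length is taken as the full host length, url_entropy is Shannon entropy in bits per character, and token_count, tld_popularity, and suspicious_file_extension are omitted.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def is_ip_address(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Compute a subset of LegitPhish-style lexical features for one URL.

    Definitions are illustrative assumptions, not the dataset's own code.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, without port or credentials
    ip_host = is_ip_address(host)
    labels = [] if ip_host or not host else host.split(".")
    digits = sum(ch.isdigit() for ch in url)
    return {
        "url_length": len(url),
        "has_ip_address": int(ip_host),
        "dot_count": url.count("."),
        "https_flag": int(parts.scheme.lower() == "https"),
        "url_entropy": shannon_entropy(url),
        # Naive: every label before "<domain>.<tld>" counts as a subdomain.
        "subdomain_count": max(len(labels) - 2, 0),
        "query_param_count": len(parse_qsl(parts.query, keep_blank_values=True)),
        "tld_length": len(labels[-1]) if labels else 0,
        "path_length": len(parts.path),
        "has_hyphen_in_domain": int("-" in host),
        "number_of_digits": digits,
        "domain_name_length": len(host),
        "percentage_numeric_chars": 100.0 * digits / len(url) if url else 0.0,
    }


if __name__ == "__main__":
    for u in ("https://www.example.com/login?user=a&id=2", "http://192.168.0.1/admin"):
        print(u, extract_features(u))
```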
Spear Phishing Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 6, 2025
Cite
Data Insights Market (2025). Spear Phishing Report [Dataset]. https://www.datainsightsmarket.com/reports/spear-phishing-1951598
Available download formats: doc, ppt, pdf
Dataset updated
Jun 6, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The spear phishing market is experiencing robust growth, driven by the increasing sophistication of cyberattacks and the expanding digital landscape. Precise market sizing data is unavailable. Given the substantial investments in cybersecurity and the consistent rise in reported phishing incidents, however, a reasonable estimate for the 2025 market size would be in the range of $5-7 billion. This figure reflects the rising costs associated with data breaches, regulatory fines, and the increasing demand for advanced threat detection and response solutions. A compound annual growth rate (CAGR) of 12-15% over the forecast period (2025-2033) is plausible, given ongoing advances in spear phishing techniques and the corresponding need for robust countermeasures. Key drivers include the growth of remote work, increasing reliance on cloud services, and the evolving tactics cybercriminals use to target specific individuals and organizations. Trends point toward a greater focus on artificial intelligence (AI) and machine learning (ML) in threat detection, as well as a shift toward proactive security measures and employee training programs to mitigate the impact of spear phishing attacks. Restraints include the ever-evolving nature of spear phishing techniques, the persistent skills gap among cybersecurity professionals, and the potential for false positives in automated detection systems. The market is likely segmented by solution type (e.g., email security, security awareness training), deployment model (cloud, on-premises), and target industry (financial services, healthcare, government). Companies such as BAE Systems, Check Point Software Technologies, Cisco Systems, and Proofpoint are key players actively innovating and competing within this dynamic market. The market's expansion is further fueled by the high financial stakes involved in successful spear phishing campaigns.
The impact of successful attacks, including data breaches, financial losses, and reputational damage, encourages organizations to invest heavily in comprehensive security solutions. The proliferation of sophisticated spear phishing techniques, such as personalized phishing emails and the use of social engineering, necessitates advanced detection and prevention technologies. The market's competitive landscape is characterized by both established cybersecurity vendors and emerging players who are constantly developing new solutions to combat the threat of spear phishing. The competitive dynamics will likely lead to further innovation and drive market growth in the coming years, enhancing the overall sophistication of spear phishing detection and prevention solutions.
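The report gives only a 2025 size range and a growth-rate range. Compounding the ends of those ranges shows the 2033 market size they imply. This is arithmetic on the report's estimates, not a figure the report publishes. It assumes annual compounding over exactly eight years (2025 to 2033).

```latex
\begin{align*}
V_{2033} &= V_{2025}\,(1 + r)^{8} \\
V_{2033}^{\text{low}} &= 5 \times 1.12^{8} \approx 5 \times 2.476 \approx 12.4 \ \text{(billion USD)} \\
V_{2033}^{\text{high}} &= 7 \times 1.15^{8} \approx 7 \times 3.059 \approx 21.4 \ \text{(billion USD)}
\end{align*}
```

The width of this range, roughly 12 to 21 billion USD, shows how sensitive an eight-year projection is to the assumed starting size and rate.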
Antiphishing Tools Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 3, 2025
Data Insights Market (2025). Antiphishing Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/antiphishing-tools-524350
Available download formats: ppt, pdf, doc
Dataset updated
Aug 3, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The anti-phishing tools market is experiencing robust growth, driven by the escalating sophistication and frequency of phishing attacks targeting individuals and organizations. The market, estimated at $5 billion in 2025, is projected to expand at a compound annual growth rate (CAGR) of 15% from 2025 to 2033, reaching approximately $15 billion by 2033. This growth is fueled by several key factors. Increased adoption of cloud-based services and remote work models has broadened the attack surface, making organizations more vulnerable. Simultaneously, phishing techniques are becoming increasingly sophisticated, employing AI and social engineering to bypass traditional security measures. Regulations like GDPR and CCPA are also driving demand for robust anti-phishing solutions to ensure data privacy and compliance. The market is segmented by deployment type (cloud, on-premises), organization size (SMB, enterprise), and type of solution (email security, web security, user training). Major players like Barracuda, Mimecast, and Check Point (Avanan) dominate the market, but a competitive landscape exists with numerous smaller vendors offering specialized solutions. Challenges include the constant evolution of phishing tactics, requiring continuous updates and improvements to anti-phishing technologies, along with the need for user education to prevent human error. The market's future trajectory will depend significantly on the development and implementation of advanced AI-powered solutions that can proactively identify and neutralize sophisticated phishing attempts. The increasing adoption of multi-layered security approaches, combining email security, web security, and user training, will be a key trend. While the cost of these solutions can be a barrier for some smaller organizations, the potential financial and reputational damage from successful phishing attacks far outweighs the investment in robust protection. 
Therefore, the market is poised for sustained growth, driven by the ongoing need to safeguard against ever-evolving phishing threats. Further geographic expansion, particularly in emerging markets with increasing internet penetration, will also contribute to market growth.
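The report's headline figures can be cross-checked with the standard compound-growth formula, value × (1 + CAGR)^years. This sketch assumes annual compounding over the eight years from 2025 to 2033. The function names are illustrative, not taken from the report.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` at annual growth `rate` for `years` years."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # $5 billion in 2025 growing at 15% a year through 2033.
    print(f"{project(5.0, 0.15, 2033 - 2025):.1f}")  # 15.3
    # Rate implied by going from $5 billion to exactly $15 billion.
    print(f"{implied_cagr(5.0, 15.0, 8):.1%}")  # 14.7%
```

Both directions agree with the report to within rounding. A 15% CAGR yields about $15.3 billion, and the round figure of $15 billion corresponds to roughly 14.7%.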
Phishing attacks – who is most at risk?
s3.amazonaws.com
gov.uk
Updated Sep 26, 2022
Office for National Statistics (2022). Phishing attacks – who is most at risk? [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/183/1838480.html
Dataset updated
Sep 26, 2022
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Office for National Statistics
Description
Official statistics are produced impartially and free from political influence.
phishing_url_test
figshare.com
txt
Updated Jan 13, 2022
Deepchecks Data (2022). phishing_url_test [Dataset]. http://doi.org/10.6084/m9.figshare.17878589.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.17878589.v2
Dataset updated
Jan 13, 2022
Dataset provided by
figshare
Authors
Deepchecks Data
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The phishing URL dataset contains synthetic URL data, some of it regular and some used for phishing. This is the test dataset. The dataset is based on the project https://github.com/Rohith-2/url_classification_dl by Rohith Ramakrishnan (https://www.linkedin.com/in/rohith-ramakrishnan-54094a1a0/) and others, accompanied by a blog post (https://medium.com/nerd-for-tech/url-feature-engineering-and-classification-66c0512fb34d). The dataset is released under Creative Commons Zero v1.0 Universal (CC0 1.0).
‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-phishing-dataset-for-machine-learning-9439/53570f2e/?iid=130-479&v=presentation
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 29 August 2021.

--- Dataset description provided by original source is as follows ---

Context

Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime in which attackers pose as known or trusted entities and contact individuals by email, text, or telephone, asking them to share sensitive information. In a typical phishing email attack, the message suggests one of several things: there is a problem with an invoice, there has been suspicious activity on an account, or the user must log in to verify an account or password. Users may also be prompted to enter credit card information, bank account details, or other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.

Content

This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.
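The claim that parser-based extraction is more robust than regular-expression parsing can be illustrated on a small scale. Selenium WebDriver needs a running browser, so this sketch swaps in Python's standard-library html.parser.HTMLParser. That parser builds no rendered DOM and executes no JavaScript, so it is not equivalent to the dataset's Selenium pipeline. It only shows why a real parser beats a naive regex. The HTML snippet, PAGE_HOST, and the external-link-share feature are hypothetical illustrations, not the dataset's actual extraction code.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

PAGE_HOST = "mysite.test"  # hypothetical host of the page being analysed

HTML = (
    '<a href="https://example.com/login">Sign in</a>'
    '<!-- <a href="http://evil.test/">old link</a> -->'
    "<A HREF='/help'>Help</A>"
)


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; comments never reach handle_starttag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names and strips quotes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def regex_links(html: str) -> list[str]:
    # Naive pattern: lower-case tag, double-quoted href only, blind to comments.
    return re.findall(r'<a\s+href="([^"]+)"', html)


def parser_links(html: str) -> list[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def pct_external(links: list[str]) -> float:
    """Share of links pointing to another host; relative links count as internal."""
    if not links:
        return 0.0
    external = [u for u in links if urlparse(u).netloc not in ("", PAGE_HOST)]
    return len(external) / len(links)


if __name__ == "__main__":
    print(regex_links(HTML), pct_external(regex_links(HTML)))
    print(parser_links(HTML), pct_external(parser_links(HTML)))
```

The regex picks up the commented-out link and misses the upper-case, single-quoted tag. As a result, the two approaches report different external-link shares (1.0 versus 0.5) for the same page.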

Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.

Acknowledgements

Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1 Source of the Dataset.

--- Original source retains full ownership of the source dataset ---
