data-phishing-detection
A dataset to test methods to detect phishing emails. The file data.parquet contains the dataset: 400 emails, of which 200 are synthetic phishing attempts and 200 are synthetic regular emails.
Schema
input - an email, synthesized by an LLM, that is either a phishing attempt or a regular email.
output - 'Yes' if the email is a phishing attempt, 'No' otherwise.
Prompt
The prompt.md file contains a prompt that can be used with an LLM as a starting… See the full description on the dataset page: https://huggingface.co/datasets/RevaHQ/data-phishing-detection.
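To test a detection method on this dataset, its verdicts can be compared with the `output` column (loaded, for example, with `pandas.read_parquet("data.parquet")`). The following is a minimal scoring sketch. It assumes the gold labels are exactly 'Yes'/'No' as the schema states. The `evaluate` function is hypothetical, and so is the rule that a free-form LLM answer starting with "yes" counts as a phishing verdict. Neither is part of the dataset.

```python
from typing import Sequence


def is_phishing_verdict(answer: str) -> bool:
    # Assumption: an answer counts as "phishing" if it starts with "yes"
    # (case-insensitive), so "Yes." or "yes, this is phishing" both qualify.
    return answer.strip().lower().startswith("yes")


def evaluate(labels: Sequence[str], predictions: Sequence[str]) -> dict:
    """Score predictions against gold 'Yes'/'No' labels ('Yes' = phishing)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = fn = tn = 0
    for gold, answer in zip(labels, predictions):
        actual = gold == "Yes"
        predicted = is_phishing_verdict(answer)
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Because the dataset is balanced (200/200), accuracy is a meaningful headline number. Precision and recall show whether a method errs toward false alarms or missed phishing.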
This API provides information from press releases issued by the authorized institutions, and from similar press releases previously issued by the HKMA, regarding fraudulent bank websites, phishing e-mails and similar scams.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data consist of a collection of legitimate and phishing website instances. Each website is represented by a set of features that indicate whether the website is legitimate or not. The data can serve as input for a machine learning process.
Two variants of the Phishing Dataset are presented here:
Full variant - dataset_full.csv
Small variant - dataset_small.csv
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of legitimate and phishing websites, along with information on the target brands (brands.csv) being impersonated in the phishing attacks. The dataset includes a total of 10,395 websites, 5,244 of which are legitimate and 5,151 of which are phishing websites. These websites impersonate a total of 86 different target brands.
The phishing websites can be downloaded as zip files with a "phishing" prefix, and the legitimate websites as zip files with a "not-phishing" prefix.
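After download, the archives can be sorted into the two classes by filename prefix. The sketch below assumes the archive names begin literally with "phishing" or "not-phishing" and end in `.zip`. `split_by_prefix` is a hypothetical helper, and the example names in the test are made up.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def split_by_prefix(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group downloaded zip archives into phishing / legitimate by filename prefix."""
    groups: Dict[str, List[str]] = {"phishing": [], "legitimate": [], "other": []}
    for name in filenames:
        base = Path(name).name  # ignore any directory part
        if not base.endswith(".zip"):
            groups["other"].append(name)
        elif base.startswith("not-phishing"):
            groups["legitimate"].append(name)
        elif base.startswith("phishing"):
            groups["phishing"].append(name)
        else:
            groups["other"].append(name)
    return groups
```

The "not-phishing" check comes first only for readability. The two prefixes do not actually overlap, because "not-phishing" does not start with "phishing".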
In addition, the dataset includes features such as screenshots, text, CSS, and HTML structure for each website, as well as domain information (WHOIS data), IP information, and SSL information. Each website is labeled as either legitimate or phishing and includes additional metadata such as the date it was discovered, the target brand being impersonated, and any other relevant information.
The dataset has been curated for research purposes and can be used to analyze the effectiveness of phishing attacks, develop and evaluate anti-phishing solutions, and identify trends and patterns in phishing attacks. It is hoped that this dataset will contribute to the advancement of research in the field of cybersecurity and help improve our understanding of phishing attacks.
In 2022, almost all detected phishing kits attempted to gather the names of targets. Three in **** phishing kits also requested e-mail addresses, while ** percent tried to access home address information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided dataset includes 11,430 URLs with 87 extracted features. It is designed to be used as a benchmark for machine-learning-based phishing detection systems. The features fall into three classes: 56 are extracted from the structure and syntax of the URLs, 24 from the content of the corresponding pages, and 7 by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used to extract the features, for potential replication or extension. The datasets were constructed in May 2020.
dataset_A: contains a list of URLs together with their DOM tree objects. These can be used for replication and for experimenting with new URL- and content-based features, overcoming the short lifetime of phishing web pages.
dataset_B: contains the extracted feature values, which can be used directly as input to classifiers. Note that the data in this dataset are indexed by URL, so the index must be removed before experimentation.
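Because the dataset_B rows are indexed by URL, the URL column has to be dropped before the values go to a classifier. The sketch below does this with the standard `csv` module. The column names `url` and `status`, and the label value `phishing`, are assumptions to check against the header of your copy of the file. The two-row sample is invented.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_features(
    fh: TextIO, url_col: str = "url", label_col: str = "status"
) -> Tuple[List[List[float]], List[int]]:
    """Read feature rows, drop the URL index column and split off the label.

    url_col, label_col and the "phishing" label value are assumed names.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for row in csv.DictReader(fh):
        row.pop(url_col, None)  # the URL is an index, not a feature
        label = row.pop(label_col)
        y.append(1 if label == "phishing" else 0)
        X.append([float(v) for v in row.values()])  # remaining columns, header order
    return X, y


# Invented two-row sample in the assumed layout.
sample = io.StringIO(
    "url,length_url,nb_dots,status\n"
    "http://a.com,12,1,legitimate\n"
    "http://b.xyz/login,18,2,phishing\n"
)
features_demo, labels_demo = load_features(sample)
```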
Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections delivered through e-mail were reported by 32 percent of respondents in 2023. Credential or account compromise occurred in 27 percent of organizations in 2023, a decrease of 25 percent compared with the previous year.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A partial dataset and document-term matrix of phishing emails targeting an institution of higher education and an associated script used for data analysis.
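A document-term matrix has one row per document and one column per vocabulary term. Each cell counts how often that term occurs in that document. Below is a minimal sketch that uses lowercase word tokenization. The tokenization is an assumption, and this is not the analysis script shipped with the dataset.

```python
import re
from collections import Counter
from typing import List, Tuple


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in docs[i]."""
    tokenized = [re.findall(r"[a-z0-9']+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


emails = ["Verify your account now", "Your account password expires, verify now"]
vocab, dtm = document_term_matrix(emails)
```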
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises phishing and legitimate web pages, which have been used for experiments on early phishing detection.
Detailed information on the dataset and data collection is available in:
Bram van Dooremaal, Pavlo Burda, Luca Allodi, and Nicola Zannone. 2021. Combining Text and Visual Features to Improve the Identification of Cloned Webpages for Early Phishing Detection. In ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security. ACM.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Email Phishing Dataset is designed for phishing email detection using machine learning.
2) Data Utilization
(1) The Email Phishing Dataset has the following characteristics:
• All emails were refined and passed through a custom NLP feature-extraction pipeline focused on phishing metrics.
• The dataset contains no raw text or headers, only features engineered for model training and testing.
(2) The Email Phishing Dataset can be used for:
• Developing an email detection model: training and evaluating AI models that classify normal and phishing mail using characteristics such as the email body, subject, and sender.
• E-mail security policy and threat analysis research: analyzing real phishing cases and normal email data to derive the characteristics of phishing attacks, and using them to establish effective email security policies and develop threat-response strategies.
https://www.datainsightsmarket.com/privacy-policy
The phishing protection market is experiencing robust growth, driven by the escalating sophistication and frequency of phishing attacks targeting individuals and organizations across diverse sectors. The increasing reliance on digital platforms for banking, e-commerce, and communication has created a fertile ground for cybercriminals, necessitating robust security measures. The market's expansion is fueled by several key factors, including the rising adoption of cloud-based security solutions, the proliferation of mobile devices, and the increasing awareness of phishing threats among both individuals and businesses. Furthermore, stringent government regulations concerning data privacy and security are compelling organizations to invest heavily in advanced phishing protection technologies. While the BFSI (Banking, Financial Services, and Insurance) sector remains a significant adopter, growth is also observed across other sectors like healthcare, government, and telecommunications, due to their sensitive data assets and heightened regulatory scrutiny. The market is segmented by application (BFSI, Government, Healthcare, Telecommunications and IT, Transportation, Education, Retail) and type of phishing (email-based, non-email-based), each presenting unique opportunities for specialized solutions. Companies like Cyren, BAE Systems, Microsoft, FireEye, Symantec, Proofpoint, GreatHorn, Cisco, Phishlabs, Intel, and Mimecast are key players, constantly innovating to counter evolving phishing tactics. The market's growth trajectory is projected to remain positive over the forecast period (2025-2033), although challenges remain. These include the ever-evolving nature of phishing techniques, the difficulty in detecting sophisticated attacks, and the ongoing skills gap in cybersecurity expertise. 
Despite these obstacles, the market’s future looks promising, spurred by continuous advancements in artificial intelligence (AI) and machine learning (ML) technologies which enhance threat detection and response capabilities. The increasing adoption of multi-layered security solutions, incorporating phishing protection alongside other security measures, further contributes to the overall market growth. The geographical distribution of the market indicates strong growth in North America and Europe, while the Asia-Pacific region is poised for significant expansion in the coming years, driven by increasing internet penetration and digitalization.
In December 2023, around 9.45 million phishing e-mails were detected worldwide, up from 5.59 million in September 2023. The figure has risen continuously since January 2022, and the increase is partly associated with the launch of ChatGPT in November 2022.
Domain: The URL itself.
Ranking: Page ranking.
isIp: Whether there is an IP address in the web link.
valid: Fetched from Google's WHOIS API; indicates the current registration status of the URL.
activeDuration: Also from the WHOIS API; the time elapsed from registration until now.
urlLen: The length of the URL.
is@: 1 if the link contains an '@' character.
isredirect: 1 if multiple dashes appear together in the link; double dashes suggest a possible redirect.
haveDash: Whether there are any dashes in the domain name.
domainLen: The length of the domain name alone.
noOfSubdomain: The number of subdomains present in the URL.
Labels: 0 -> legitimate website, 1 -> phishing link / spam link
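The lexical features in this list can be computed from the URL string alone. Ranking, valid and activeDuration need external lookups, so they are left out. The sketch below follows the prose definitions, which means some details are assumptions and may not match the dataset's own extraction code. Two examples: subdomains are counted as hostname labels beyond the last two, and "--" is treated as the redirect signal. `url_features` is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute the string-only features described above for a single URL."""
    # Add a scheme if missing so urlparse puts the host in .hostname.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        is_ip = 1
    except ValueError:
        is_ip = 0
    labels = host.split(".") if host else []
    return {
        "isIp": is_ip,
        "urlLen": len(url),
        "is@": int("@" in url),
        "isredirect": int("--" in url),  # "double dashes", as described above
        "haveDash": int("-" in host),
        "domainLen": len(host),
        # Assumption: subdomains = host labels beyond registered name + TLD.
        "noOfSubdomain": 0 if is_ip else max(len(labels) - 2, 0),
    }
```

For example, `url_features("http://secure-login.paypal.example.com/verify--account")` reports two subdomains, a dash in the domain, and a redirect flag of 1.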
https://academictorrents.com/nolicensespecified
A BitTorrent file to download data with the title 'Phishing corpus'
https://www.datainsightsmarket.com/privacy-policy
The phishing simulation market is experiencing robust growth, driven by the escalating sophistication of phishing attacks and the increasing regulatory pressure on organizations to enhance their cybersecurity posture. The market, currently valued at approximately $1.5 billion in 2025 (estimated based on typical market sizes for cybersecurity segments with similar growth rates), is projected to experience a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This expansion is fueled by several key factors. Firstly, the rising frequency and success rate of phishing campaigns targeting both large enterprises and SMEs necessitate proactive security measures like simulation training. Secondly, evolving attack vectors and techniques demand continuous adaptation and improvement in security awareness programs, creating a sustained demand for advanced phishing simulation solutions. Thirdly, stringent data privacy regulations like GDPR and CCPA are imposing significant penalties for data breaches resulting from successful phishing attacks, motivating organizations to invest heavily in preventative measures including simulation-based training. The market segmentation reveals a significant share held by software-based solutions, owing to their scalability, ease of deployment, and cost-effectiveness. However, the service segment is also experiencing strong growth due to the increasing need for expert guidance and managed services in designing and implementing effective phishing simulation programs. Geographically, North America currently dominates the market, followed by Europe, reflecting the high level of cybersecurity awareness and regulatory compliance in these regions. However, the Asia-Pacific region is expected to exhibit the highest growth rate over the forecast period, driven by increasing digital adoption and rising awareness of cybersecurity threats in developing economies. 
While the market faces certain restraints, such as the need for specialized expertise and the potential for high implementation costs, the overall growth trajectory remains positive, driven by the overwhelming need to combat the ever-evolving threat landscape of phishing attacks.
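The report's estimate and growth rate together imply a market size at the end of the forecast period. Assuming the 15% CAGR compounds annually over the eight years from 2025 to 2033, the worked arithmetic is:

```latex
V_{2033} = V_{2025}\,(1 + r)^{2033 - 2025} = 1.5 \times 1.15^{8} \approx 1.5 \times 3.059 \approx 4.59 \ \text{(billion USD)}
```

This is only the arithmetic consequence of the stated estimate and rate, not an independent forecast.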
https://dataintelo.com/privacy-and-policy
The global phishing protection market size was valued at approximately USD 900 million in 2023 and is projected to reach USD 2.4 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5% from 2024 to 2032. The growth of this market is fueled by the escalating volume and sophistication of phishing attacks, coupled with increasing awareness about cybersecurity among organizations across various industries.
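The three figures in this paragraph (start value, end value and CAGR) can be checked against each other with the standard compound-growth formula. The sketch below treats 2023 as the base year, which gives nine annual compounding periods to 2032; that choice is an assumption. `cagr` and `project` are illustrative helper names.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# USD 0.9 billion (2023) -> USD 2.4 billion (2032): nine compounding years.
implied_rate = cagr(0.9, 2.4, 2032 - 2023)
projected_2032 = project(0.9, 0.115, 2032 - 2023)
```

Under that assumption, the implied rate comes out at about 11.5%, and compounding USD 0.9 billion at 11.5% gives roughly USD 2.4 billion. The stated figures are therefore mutually consistent.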
One of the significant growth factors driving the phishing protection market is the increasing number of cyberattacks targeting both individuals and organizations. Phishing attacks have become more sophisticated, making it crucial for businesses to invest in advanced protection measures. The rise in spear-phishing, where attackers target specific individuals within an organization, has heightened the need for robust phishing protection solutions. Moreover, the financial and reputational damage caused by successful phishing attacks is pushing organizations to adopt comprehensive security solutions, thereby driving market growth.
Another critical factor contributing to the market's expansion is the growing regulatory landscape around data protection and cybersecurity. Governments and regulatory bodies across the globe are implementing stringent regulations to ensure data security and protect consumer information. Compliance with regulations such as GDPR in Europe, CCPA in California, and other data protection laws worldwide necessitates the deployment of advanced phishing protection solutions. Organizations must adhere to these regulations to avoid hefty fines and legal repercussions, further propelling the adoption of phishing protection services and software.
The increasing adoption of digital transformation strategies by enterprises is also a significant driver of market growth. As businesses migrate their operations to cloud platforms and adopt new technologies, they become more vulnerable to cyber threats, including phishing attacks. The shift towards remote work and the integration of Bring Your Own Device (BYOD) policies have expanded the attack surface for cybercriminals. Consequently, organizations are prioritizing investments in phishing protection solutions to safeguard their digital assets and maintain business continuity in a highly digitized environment.
In addition to phishing attacks, organizations are increasingly facing threats from credential stuffing, a type of cyberattack where attackers use automated tools to try multiple username and password combinations to gain unauthorized access to user accounts. This has led to a growing demand for Credential Stuffing Protection solutions, which are designed to detect and block such attempts. These solutions often employ advanced techniques such as behavioral analytics and machine learning to identify suspicious login activities and prevent unauthorized access. As businesses continue to digitize their operations and store sensitive data online, the need for robust Credential Stuffing Protection measures becomes even more critical. By implementing these solutions, organizations can safeguard their user accounts and maintain trust with their customers.
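As an illustration only, and not a description of how any commercial product works, the following is a minimal behavioral heuristic for the pattern described above. It flags a source IP that fails logins against more than a threshold number of distinct usernames within a sliding time window. The `LoginAttempt` record, the 60-second window and the threshold of 5 are all hypothetical choices.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Set, Tuple


@dataclass
class LoginAttempt:
    timestamp: float  # seconds since some epoch
    source_ip: str
    username: str
    success: bool


def flag_stuffing_sources(
    attempts: Iterable[LoginAttempt], window_s: float = 60.0, max_distinct_users: int = 5
) -> Set[str]:
    """Return IPs whose failed logins hit too many distinct usernames in one window."""
    flagged: Set[str] = set()
    # Per IP: failed attempts (timestamp, username) inside the current window.
    recent: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
    for a in sorted(attempts, key=lambda x: x.timestamp):
        if a.success:
            continue
        window = recent[a.source_ip]
        window.append((a.timestamp, a.username))
        while window and a.timestamp - window[0][0] > window_s:
            window.popleft()  # drop failures older than the window
        if len({user for _, user in window}) > max_distinct_users:
            flagged.add(a.source_ip)
    return flagged


demo = [LoginAttempt(float(i), "10.0.0.1", f"user{i}", False) for i in range(10)]
demo += [LoginAttempt(float(i), "10.0.0.2", "alice", False) for i in range(3)]
demo.append(LoginAttempt(5.0, "10.0.0.3", "bob", True))
flagged_demo = flag_stuffing_sources(demo)
```

Counting distinct usernames rather than raw failures separates stuffing (many accounts, one source) from a single user mistyping a password.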
Regionally, North America is anticipated to dominate the phishing protection market during the forecast period, owing to the high incidence of cyberattacks and the presence of leading cybersecurity companies. Europe is also expected to witness significant growth, driven by stringent data protection regulations and increasing cyber threats. The Asia Pacific region is projected to exhibit the highest CAGR, fueled by rapid digitalization, increasing internet penetration, and growing awareness about cybersecurity threats. Latin America, the Middle East, and Africa are also expected to contribute to the market's growth, albeit at a slower pace compared to other regions.
The phishing protection market is segmented by components into software and services. The software segment is expected to hold a significant share of the market, as organizations increasingly rely on advanced software solutions to detect and prevent phishing attacks. Software solutions typically include email filtering, URL filtering, and anti-phishing tools that help identify and block malicious content. Moreover, the continuous advancements in machine learning and artificial intelligence are enhancing the capabilities of phishing protection software, making them more effective in ide
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We have curated 11 datasets. The Nazario and Nigerian Fraud datasets contain only phishing emails.
Cite this dataset:
A. I. Champa, M. F. Rabbi, and M. F. Zibran, “Why phishing emails escape detection: A closer look at the failure points,” in 12th International Symposium on Digital Forensics and Security (ISDFS), 2024, pp. 1–6 (to appear).
or
@inproceedings{champa2024why,
  title={Why Phishing Emails Escape Detection: A Closer Look at the Failure Points},
  author={Champa, Arifa I and Rabbi, Md Fazle and Zibran, Minhaz F},
  booktitle={12th International Symposium on Digital Forensics and Security (ISDFS)},
  pages = {1--6 (to appear)},
  year={2024}
}
https://www.archivemarketresearch.com/privacy-policy
The global anti-phishing tools and services market is experiencing robust growth, driven by the escalating sophistication and frequency of phishing attacks targeting both small and large enterprises. The market, estimated at $5 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This significant expansion is fueled by several key factors. The increasing adoption of cloud-based services and remote work models expands the attack surface, making organizations more vulnerable. Furthermore, the evolution of phishing techniques, including deepfakes and sophisticated social engineering tactics, necessitates more advanced and comprehensive security solutions. The market segmentation reveals strong demand across both SMEs, seeking cost-effective solutions, and large enterprises, requiring advanced features and scalable deployments. Leading vendors like Avanan, Barracuda, and Mimecast are actively innovating to meet these evolving needs, incorporating AI and machine learning into their offerings to enhance phishing detection and prevention capabilities. The market's growth is also influenced by stringent data privacy regulations (like GDPR and CCPA) that mandate robust security measures. Companies face significant financial and reputational risks from successful phishing attacks, leading to increased investment in preventative technologies. However, the market faces certain restraints, including the high cost of implementing advanced anti-phishing solutions and the persistent challenge of staying ahead of evolving phishing techniques. Despite these challenges, the market outlook remains positive, driven by a growing awareness of cybersecurity threats and a continuous need for more effective and adaptive anti-phishing solutions. 
The regional breakdown reveals North America and Europe currently hold the largest market shares due to high technological adoption and stringent regulatory frameworks, but significant growth is expected in the Asia-Pacific region driven by increasing digitalization and rising cybersecurity awareness.
https://www.datainsightsmarket.com/privacy-policy
The email-based phishing protection market is experiencing robust growth, projected to reach a substantial size. The market's Compound Annual Growth Rate (CAGR) of 20.3% from 2019-2033 indicates a significant upward trajectory driven by the increasing sophistication of phishing attacks and the rising adoption of cloud-based email services. Businesses of all sizes are facing escalating cybersecurity threats, making robust email security a critical investment. The market is further fueled by the growing awareness of data privacy regulations and the resulting need for compliance, pushing organizations to proactively implement advanced phishing protection measures. Key players like Cyren, BAE Systems, Microsoft, and others are investing heavily in research and development to enhance their offerings, creating a competitive landscape that fosters innovation and drives market expansion. The segment breakdown (while not provided) likely includes solutions categorized by deployment (cloud, on-premise), size of organization (SMB, enterprise), and specific features (anti-spoofing, machine learning-based detection). The forecast period (2025-2033) will see continued expansion, primarily driven by the increasing adoption of advanced threat detection techniques, including AI and machine learning. However, market restraints might include the rising costs associated with implementing and maintaining these solutions, especially for smaller organizations. Despite this, the overall market outlook remains positive due to the ever-increasing threat landscape and the escalating consequences of successful phishing attacks, leading to substantial financial and reputational damage. The geographical distribution is expected to show strong growth in regions with rapidly developing digital economies and increasing internet penetration. Understanding these dynamics is crucial for businesses seeking to capitalize on the opportunities presented within this growing market.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Phishing website Detector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/eswarchandt/phishing-website-detector on 12 November 2021.
--- Dataset description provided by original source is as follows ---
The data set is provided as both a text file and a CSV file, which provide the following resources that can be used as inputs for model building:
A collection of website URLs for 11000+ websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).
The code template contains these code blocks:
a. Import modules (Part 1)
b. Load data function + input/output field descriptions
The data set also serves as an input for project scoping and tries to specify the functional and non-functional requirements for it.
You are expected to write the code for a binary classification model (phishing website or not) using Python Scikit-Learn that trains on the data and calculates the accuracy score on the test data. You have to use one or more of the classification algorithms to train a model on the phishing website data set.
--- Original source retains full ownership of the source dataset ---
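Below is a minimal sketch of the scikit-learn workflow requested above: split the data into train and test sets, fit a model, and report accuracy on the test split. The column names of the downloaded file are not listed here, so the demo runs on synthetic stand-in data from `make_classification`, with 30 features and labels mapped to 1/-1. Logistic regression is one arbitrary choice of classifier. To use the real data, replace `X_demo` with the 30 parameter columns and `y_demo` with the class-label column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(X, y, test_size: float = 0.2, seed: int = 42) -> float:
    """Fit a classifier on a train split and return accuracy on the held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in with the described shape: 30 parameters per sample,
# class labels 1 / -1 (which value means "phishing" depends on the real file).
X_demo, y01 = make_classification(n_samples=1000, n_features=30, random_state=0)
y_demo = np.where(y01 == 1, 1, -1)
demo_accuracy = train_and_score(X_demo, y_demo)
```

Stratifying the split keeps the class ratio the same in the train and test sets, so the test accuracy is not skewed by an unlucky split.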