Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises phishing and legitimate web pages, which have been used for experiments on early phishing detection.
Detailed information on the dataset and data collection is available at
Bram van Dooremaal, Pavlo Burda, Luca Allodi, and Nicola Zannone. 2021. Combining Text and Visual Features to Improve the Identification of Cloned Webpages for Early Phishing Detection. In ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security. ACM.
In the third quarter of 2024, over 932 thousand unique phishing sites were detected worldwide, a slight increase from the preceding quarter. By far the most significant jump in the number of unique phishing sites occurred between the second and third quarters of 2020, from nearly 147 thousand to approximately 572 thousand. These figures are based on the number of unique base URLs of phishing sites.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was obtained from the UCI Machine Learning Repository in 2019. It consists of 11,055 instances with 31 attributes and contains no missing values. It has two class labels: phishing (-1) and non-phishing (1). Of the 11,055 instances, 4,898 belong to the phishing class and 6,157 to the non-phishing class.
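The -1/1 label encoding differs from the 0/1 convention many classifiers expect. A minimal sketch, assuming the labels have already been read into a Python list; the helper names `remap_labels` and `class_shares`, and the choice of phishing as the positive class, are hypothetical:

```python
from collections import Counter

# Encoding stated in the dataset description: -1 = phishing, 1 = non-phishing.
# Mapping phishing to the positive class (1) is an assumption, not part of the dataset.
LABEL_MAP = {-1: 1, 1: 0}


def remap_labels(labels):
    """Convert -1/1 labels to 1 (phishing) / 0 (non-phishing)."""
    return [LABEL_MAP[y] for y in labels]


def class_shares(labels):
    """Return the fraction of instances per class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Rebuild the documented class counts: 4,898 phishing + 6,157 non-phishing = 11,055.
raw = [-1] * 4898 + [1] * 6157
y = remap_labels(raw)
shares = class_shares(y)
print(len(y), round(shares[1], 3), round(shares[0], 3))  # prints: 11055 0.443 0.557
```

The documented counts work out to about 44% phishing and 56% non-phishing instances.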
During the third quarter of 2024, 30.5 percent of phishing attacks worldwide targeted social media. Web-based software services and webmail followed, accounting for around 21.2 percent of registered phishing attacks, while financial institutions accounted for 13 percent.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided dataset includes 11,430 URLs with 87 extracted features. It is designed to be used as a benchmark for machine-learning-based phishing detection systems. The features fall into three classes: 56 are extracted from the structure and syntax of URLs, 24 from the content of the corresponding pages, and 7 by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used to extract the features, for potential replication or extension.
dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL- and content-based features, overcoming the short lifespan of phishing web pages.
dataset_B: contains the extracted feature values, which can be used directly as input to classifiers. Note that the data in this dataset are indexed by URL, so the index needs to be removed before experimentation.
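As a minimal sketch of the note on dataset_B, the snippet below drops the URL index column and splits each row into numeric features and a label using only the standard library. The column names `url` and `status`, the label strings `phishing`/`legitimate`, the feature names in the sample, and the helper name `load_dataset_b` are all assumptions about the file layout; check them against the actual CSV header.

```python
import csv
import io

# Assumed column names and label strings; verify against the real dataset_B header.
INDEX_COLUMN = "url"
LABEL_COLUMN = "status"
LABEL_MAP = {"legitimate": 0, "phishing": 1}


def load_dataset_b(fh):
    """Read dataset_B rows, drop the URL index, and return (X, y, feature_names)."""
    reader = csv.DictReader(fh)
    feature_names = [c for c in reader.fieldnames if c not in (INDEX_COLUMN, LABEL_COLUMN)]
    X, y = [], []
    for row in reader:
        # The URL only serves as an index, so it is discarded rather than used as a feature.
        X.append([float(row[name]) for name in feature_names])
        y.append(LABEL_MAP[row[LABEL_COLUMN]])
    return X, y, feature_names


# Tiny in-memory sample with two hypothetical feature columns.
sample = io.StringIO(
    "url,length_url,nb_dots,status\n"
    "http://example.com/login,24,1,phishing\n"
    "https://example.org,19,1,legitimate\n"
)
X, y, names = load_dataset_b(sample)
print(names, X, y)
```

If the real header matches the description above, `names` should list the 87 documented features.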
The datasets were constructed in May 2020. Due to the large size of dataset A, only a sample of it is provided; I will try to divide it into sample files and upload them one by one. For a full copy, please contact the author directly at any time at: hannousse.abdelhakim@univ-guelma.dz
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime in which attackers pose as known or trusted entities, contact individuals through email, text, or telephone, and ask them to share sensitive information. Typically, in a phishing email attack, the message will suggest that there is a problem with an invoice, that there has been suspicious activity on an account, or that the user must log in to verify an account or password. Users may also be prompted to enter credit card information or bank account details as well as other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.
This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.
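To illustrate why extracting features from the parsed document is more robust than matching raw text with regular expressions, the sketch below counts hyperlinks both ways. It uses Python's standard `html.parser` as a lightweight stand-in for a browser; unlike Selenium WebDriver it does not render the page or execute JavaScript. The sample page and the function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Naive regex: matches only the exact lowercase text '<a href=', wherever it appears.
NAIVE_LINK_RE = re.compile(r"<a href=")


def count_links_regex(html: str) -> int:
    """Count links by raw text matching (brittle)."""
    return len(NAIVE_LINK_RE.findall(html))


class LinkCounter(HTMLParser):
    """Count <a> elements that carry an href attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; comments never reach this method.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def count_links_parser(html: str) -> int:
    """Count links by parsing the markup."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page fragment: uppercase markup, reordered attributes, a commented-out link.
page = """
<A HREF="https://example.com/login">Log in</A>
<a class="btn" href="/reset">Reset password</a>
<!-- <a href="https://old.example.com">stale link</a> -->
"""

print(count_links_regex(page), count_links_parser(page))  # prints: 1 2
```

The regex misses both live links and counts the commented-out one, while the parser counts exactly the two real links.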
Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.
Source of the dataset: Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1.
--- Original source retains full ownership of the source dataset ---
https://choosealicense.com/licenses/gemma/
Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed] Funded by [optional]: [More Information Needed] Shared by [optional]: [More Information Needed] Language(s) (NLP): [More Information Needed] License: [More Information Needed]
Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/itsprofarul/dataset-phishing.
In December 2023, around 9.45 million phishing e-mails were detected worldwide, up from 5.59 million in September 2023. This figure has increased continuously since January 2022, a rise that has been partially associated with the launch of ChatGPT in November 2022.
This dataset was created by Kunal Raut
This dataset was created by Ashish Goraniya
In 2023, over 298 thousand individuals in the United States reported encountering phishing attacks. This represents a 0.5 percent decrease from the previous year, when over 300 thousand individuals nationwide reported phishing attacks. In 2020 and 2019, the number was considerably lower, at around 241 thousand and 114 thousand, respectively.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
safehousetech/phishing dataset hosted on Hugging Face and contributed by the HF Datasets community
In 2023, users in Vietnam were the most frequently targeted by phishing attacks, with a phishing attack rate of 18.91 percent among internet users in the country. Peru ranked second, with an attack rate of nearly 17 percent, followed by Taiwan at 15.59 percent.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the Global market for phishing protection is expected to have a market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
The Asia-Pacific region has the largest market share with an expected market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
North America is the fastest growing with an expected market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
The solution segment has the largest market share with an expected market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
The cloud segment has the largest market share with an expected market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
Email-Phishing has the largest market share with an expected market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
BFSI has the largest market share with an expected market size of XX million in 2024 with a growing CAGR of XX% during the forecast period.
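The projections above are expressed through a compound annual growth rate (CAGR). Since the actual figures are elided (XX), the sketch below only shows the standard relationship, CAGR = (end / start)^(1 / years) - 1, and its inverse for projecting a market size forward. The function names and example values are hypothetical and do not come from the report.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding `start_value` at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Illustrative numbers only (not from the report): 100 growing to 121 over 2 years.
rate = cagr(100.0, 121.0, 2)
print(round(rate, 4))                     # prints: 0.1
print(round(project(100.0, rate, 2), 6))  # prints: 121.0
```

Because `project` inverts `cagr`, a stated base-year size and CAGR are enough to recover the implied size at the end of the forecast period.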
Market Dynamics
Key Drivers
The rise in the number of phishing attacks globally is favoring the market growth
Demand for strong phishing protection solutions is rising as phishing attacks cause growing concern among both individuals and enterprises. Phishing is a type of cybercrime that has become more common and sophisticated; in it, attackers pose as reputable organizations in an attempt to trick consumers into disclosing personal information. This, together with the need for more stringent security and regulatory compliance, is driving rapid growth in the phishing prevention industry. According to the FBI's Internet Crime Complaint Center (IC3), phishing incidents have increased significantly in recent years: with over 300,000 complaints, phishing was the most frequently reported cybercrime in the 2022 Internet Crime Report, a noticeable increase over earlier years that highlights the expanding threat landscape. Phishing attacks require sophisticated defenses because they have progressed from straightforward email scams to intricate multi-phase operations that exploit human behavior. Businesses are realizing that outdated security solutions cannot defeat the newest phishing techniques. As a result, phishing prevention systems are becoming more comprehensive, combining features such as email filtering, multi-factor authentication, user education, and threat intelligence. The growing threat of business email compromise (BEC), in which attackers pose as company officials to initiate fraudulent activities, has further spurred demand for these solutions. Government organizations and cybersecurity specialists emphasize the importance of effective defenses against phishing. Organizations can reduce their exposure to phishing threats by regularly drawing on the information and guidance offered by the Cybersecurity and Infrastructure Security Agency (CISA).
CISA advises adopting a zero-trust security paradigm, setting up technical controls to identify and block suspicious emails, and training staff to recognize phishing attempts. These best practices highlight the need for a multi-layered approach to phishing prevention, which is fueling the market's expansion. Furthermore, the spread of remote work, coupled with digital transformation programs, has increased the attack surface available to cybercriminals. Remote workers may not have the same security measures in place as those in an office setting, which leaves them more susceptible to phishing scams. This trend has pushed businesses to invest in cloud-based phishing prevention technologies that can protect a dispersed workforce. Therefore, the escalating frequency and complexity of phishing attacks are propelling the growth of the phishing protection market.
Regulations and compliance requirements around the globe are favoring market growth
As cyber risks continue to grow, phishing protection has become a key concern for both individuals and enterprises. The laws and regulatory requirements that are propelling the phishing protection market's expansion are intended to strengthen cybersecurity defenses and shield private information from nefarious individuals. The Ge...
https://www.datainsightsmarket.com/privacy-policy
The global phishing attack simulation training market is estimated to be worth USD XXX million in 2023 and is projected to grow at a CAGR of XX% from 2023 to 2033. The growth of the market is attributed to the increasing number of phishing attacks, rising awareness of the importance of cybersecurity, and growing adoption of cloud-based solutions. Key drivers include the increasing adoption of phishing simulation training by enterprises to improve employee awareness and reduce the risk of successful phishing attacks. The growing sophistication of phishing campaigns and the need for organizations to comply with regulatory requirements are also contributing to market growth. The market is segmented by application (SMEs, large enterprises), type (online simulation, offline simulation), and region (North America, South America, Europe, Middle East & Africa, Asia Pacific). North America is expected to dominate the market during the forecast period, driven mainly by the presence of a large number of enterprises and high awareness of cybersecurity in the region. Europe is expected to be the second-largest market for phishing attack simulation training, supported by the growing adoption of cloud-based solutions and the large number of SMEs in the region. Asia Pacific is expected to be the fastest-growing market during the forecast period, driven by the increasing adoption of smartphones and the growing number of internet users in the region.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Phishing is an attack in which a scammer calls, texts, or emails you, or uses social media, to trick you into clicking a malicious link, downloading malware, or sharing sensitive information. Phishing attempts are often generic mass messages, but they appear to be legitimate and to come from a trusted source (e.g., a bank or courier company).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used in "Informing, simulating experience, or both: A field experiment on phishing risks".
This dataset was created by Samvsam
A comprehensive model for phishing website detection is proposed in this study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Learning rate: performance comparison with the PhishTank dataset.