Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections that occurred through e-mail were common for 32 percent of the respondents in 2023. Cases of a credential or account compromise occurred in 27 percent of the organizations in 2023, a decrease of 25 percent compared to the year prior.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided dataset includes 11430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine learning based phishing detection systems. Features are from three different classes: 56 extracted from the structure and syntax of URLs, 24 extracted from the content of their corresponding pages, and 7 extracted by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used for the extraction of the features, for potential replication or extension.
dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL and content-based features, overcoming the short lifetime of phishing web pages.
dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed with URLs, so one needs to remove the index before experimentation.
The datasets were constructed in May 2020. Due to the huge size of dataset A, only a sample of the dataset is provided. I will try to divide it into sample files and upload them one by one; for a full copy, please contact the author directly at any time at: hannousse.abdelhakim@univ-guelma.dz
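The note about removing the URL index from dataset_B can be sketched as follows. This is a minimal illustration using only the Python standard library. The column names (`url`, `length_url`, `nb_dots`, `status`) and the two sample rows are assumptions made for the example, not taken from the dataset description.

```python
import csv
import io

# Hypothetical two-row excerpt; the real column names and values may differ.
SAMPLE_CSV = """url,length_url,nb_dots,status
http://example.com/login,24,1,legitimate
http://example.org/verify/account,33,1,phishing
"""

def load_features(handle, index_column="url", label_column="status"):
    """Read feature rows, dropping the URL index and splitting off the label."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        row.pop(index_column)                 # remove the URL index before experimentation
        labels.append(row.pop(label_column))  # assumed name of the class label column
        features.append({name: float(value) for name, value in row.items()})
    return features, labels

features, labels = load_features(io.StringIO(SAMPLE_CSV))
print(features[0], labels)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle.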
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from the actual CESNET network traffic, 164,425 phishing domains from PhishTank and OpenPhish services, and 100,809 malware domains from various sources like ThreatFox, The Firebog, MISP threat intelligence platform, and other sources. The ground truth for the phishing dataset was double-checked with the VirusTotal (VT) service. Domain names not considered malicious by VT have been removed from phishing and malware datasets. Similarly, benign domain names that were considered risky by VT have been removed from the benign datasets. The data was collected between March 2023 and July 2024. The final assessment of the data was conducted in August 2024.
The dataset is useful for cybersecurity research, such as statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing and malware website detection.
The dataset was created using software available in the associated GitHub repository nesfit/domainradar-dib.
The data is located in the following individual files:
Both files contain a JSON array of records generated using mongoexport (in the MongoDB Extended JSON (v2) format in Relaxed Mode). The following table documents the structure of a record. Please note that:
Field name | Field type | Nullable | Description
domain_name | String | No | The evaluated domain name
url | String | No | The source URL for the domain name
evaluated_on | Date | No | Date of last collection attempt
source | String | No | An identifier of the source
sourced_on | Date | No | Date of ingestion of the domain name
dns | Object | Yes | Data from DNS scan
rdap | Object | Yes | Data from RDAP or WHOIS
tls | Object | Yes | Data from TLS handshake
ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name
malware_type | String | No | The malware type/family or “unknown” (only present in malware.json)
DNS data (dns field)
A | Array of Strings | No | Array of IPv4 addresses
AAAA | Array of Strings | No | Array of IPv6 addresses
TXT | Array of Strings | No | Array of raw TXT values
CNAME | Object | No | The CNAME target and related IPs
MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs
NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs
SOA | Object | No | All the SOA fields, present if found at the target domain name
zone_SOA | Object | No | The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
dnssec | Object | No | Flags describing the DNSSEC validation result for each record type
ttls | Object | No | The TTL values for each record type
remarks | Object | No | The zone domain name and DNSSEC flags
RDAP data (rdap field)
copyright_notice | String | No | RDAP/WHOIS data usage copyright notice
dnssec | Bool | No | DNSSEC presence flag
entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
expiration_date | Date | Yes | The current date of expiration
handle | String | No | RDAP handle
last_changed_date | Date | Yes | The date when the domain was last changed
name | String | No |
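As a rough illustration of how records in this format might be read, the sketch below parses a small in-memory sample with the Python standard library. In the Relaxed Mode of MongoDB Extended JSON (v2), dates in the usual range are written as `{"$date": "<ISO-8601 string>"}`, so an `object_hook` can convert them to `datetime` objects. The sample record, its values, and the helper name `decode_extended_json` are assumptions made for the example. Only the top-level field names come from the table above.

```python
import json
from datetime import datetime

def decode_extended_json(obj):
    """object_hook: turn {"$date": "<ISO-8601>"} wrappers into datetime objects."""
    if set(obj) == {"$date"} and isinstance(obj["$date"], str):
        # Map a trailing "Z" to an explicit UTC offset so older Pythons can parse it.
        return datetime.fromisoformat(obj["$date"].replace("Z", "+00:00"))
    # Anything else (including non-string $date forms) is left untouched.
    return obj

# Made-up record for illustration; values do not come from the dataset.
SAMPLE = """
[
  {
    "domain_name": "example.com",
    "url": "http://example.com/login",
    "evaluated_on": {"$date": "2024-07-01T12:00:00Z"},
    "source": "hypothetical-source",
    "sourced_on": {"$date": "2024-06-30T08:30:00Z"},
    "dns": {"A": ["192.0.2.10"], "AAAA": [], "TXT": []},
    "rdap": null,
    "tls": null,
    "ip_data": null
  }
]
"""

records = json.loads(SAMPLE, object_hook=decode_extended_json)

for record in records:
    # dns, rdap, tls and ip_data are nullable, so guard before indexing into them.
    dns = record.get("dns") or {}
    print(record["domain_name"], record["evaluated_on"].year, dns.get("A", []))
```

For the real files, `json.load` on an open file handle with the same `object_hook` would play the role of `json.loads` here.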
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises phishing and legitimate web pages, which have been used for experiments on early phishing detection.
Detailed information on the dataset and data collection is available at
Bram van Dooremaal, Pavlo Burda, Luca Allodi, and Nicola Zannone. 2021. Combining Text and Visual Features to Improve the Identification of Cloned Webpages for Early Phishing Detection. In ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security. ACM.
In December 2023, around 9.45 million phishing e-mails were detected worldwide, up from 5.59 million in September 2023. This figure has seen a continuous increase since January 2022. It is partially associated with the launch of ChatGPT in November 2022.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a lab-in-the-field experiment to demonstrate how individuals' behavior in the lab predicts their ability to identify phishing attempts. Using business and finance staff members from a large public university in the U.S., we find that participants who are intolerant of risk, more curious, and less trusting commit significantly more errors when evaluating interfaces. We also replicate prior results on demographic correlates of phishing vulnerability, including age, gender, and education level. Our results suggest that behavioral characteristics such as risk attitude, curiosity, and trust can be used to predict individual ability to identify phishing interfaces.
Provides ad hoc query and standard report data on the measure for preventing the issuance of SSN cards to non-existent children.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Phishing is an attack where a scammer calls you, texts or emails you, or uses social media to trick you into clicking a malicious link, downloading malware, or sharing sensitive information. Phishing attempts are often generic mass messages, but the message appears to be legitimate and from a trusted source (e.g. from a bank, courier company).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime where attackers pose as known or trusted entities and contact individuals through email, text or telephone and ask them to share sensitive information. Typically, in a phishing email attack, the message will suggest that there is a problem with an invoice, that there has been suspicious activity on an account, or that the user must log in to verify an account or password. Users may also be prompted to enter credit card information or bank account details as well as other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.
This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.
Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.
Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1 Source of the Dataset.
--- Original source retains full ownership of the source dataset ---
In 2023, the most common type of cyber crime reported to the United States Internet Crime Complaint Center (IC3) was phishing and spoofing, affecting approximately 298 thousand individuals. In addition, over 55 thousand cases of personal data breaches were reported to the IC3 during that year.
Dynamic of phishing attacks
Over the past few years, phishing attacks have increased significantly. In 2023, almost 300 thousand individuals fell victim to such attacks. The highest number of phishing scam victims since 2018 was recorded in 2021, at approximately 324 thousand. Phishing attacks can take many shapes. Bulk phishing, smishing, and business e-mail compromise (BEC) are the most common types. In 2023, 76 percent of the surveyed worldwide organizations reported encountering bulk phishing attacks, while roughly three in four were targeted by smishing scams.
Impact of phishing attacks
Among the industries most targeted by cybercriminals are healthcare, finance, manufacturing, and education. An observation carried out in the first quarter of 2023 found that social media was most likely to encounter phishing attacks. According to the reports, almost a quarter of them stated being targeted by a phishing scam in the measured period. Phishing e-mails often pose a serious risk to the organization. Almost three in ten worldwide organizations that have experienced phishing attacks suffered a customer or client data breach as a consequence. Phishing scams that delivered ransomware infections were also common for the surveyed organizations.
This dataset was created by Kunal Raut
During the third quarter of 2024, 30.5 percent of phishing attacks worldwide targeted social media. Web-based software services and webmail followed, with around 21.2 percent of registered phishing attacks. Furthermore, financial institutions accounted for 13 percent of attacks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Phishing website Detector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/eswarchandt/phishing-website-detector on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The data set is provided as both a text file and a CSV file, and provides the following resources that can be used as inputs for model building:
A collection of website URLs for 11000+ websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).
The code template containing these code blocks: a. Import modules (Part 1) b. Load data function + input/output field descriptions
The data set also serves as an input for project scoping and tries to specify the functional and non-functional requirements for it.
You are expected to write the code for a binary classification model (phishing website or not) using Python Scikit-Learn that trains on the data and calculates the accuracy score on the test data. You have to use one or more of the classification algorithms to train a model on the phishing website data set.
--- Original source retains full ownership of the source dataset ---
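The task described above (train a binary classifier on samples with 30 parameters and a 1 or -1 label, then report accuracy on held-out data) can be sketched as follows. The source asks for Scikit-Learn; to stay dependency-free, this sketch swaps in a small hand-written perceptron instead. The synthetic data, the labelling rule, and every function name are assumptions made for illustration, not part of the original template.

```python
import random

def make_sample(rng, n_features=30):
    """Synthetic stand-in for one website: parameters take values in {-1, 0, 1}."""
    x = [rng.choice((-1, 0, 1)) for _ in range(n_features)]
    y = 1 if sum(x[:5]) > 0 else -1  # hypothetical labelling rule, for illustration only
    return x, y

def train_perceptron(samples, epochs=20):
    """Classic perceptron rule: update the weights only on misclassified samples."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # wrong side of (or on) the decision boundary
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
    return weights, bias

def predict(model, x):
    weights, bias = model
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else -1

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in samples) / len(samples)

rng = random.Random(0)                  # fixed seed so runs are repeatable
data = [make_sample(rng) for _ in range(500)]
train, test = data[:400], data[400:]    # simple 80/20 hold-out split
model = train_perceptron(train)
test_accuracy = accuracy(model, test)
print(f"test accuracy: {test_accuracy:.3f}")
```

With Scikit-Learn available, the same train, predict, and score steps would map onto one of its classifiers and its accuracy metric; the hold-out structure stays the same.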
In 2022, almost all detected phishing kits attempted to gather the names of targets. Three in four phishing kits also requested e-mail addresses, while 66 percent tried accessing home address information.
Details of fraud referrals relating to war pensions & compensation
This dataset was created by Samvsam
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Information about Counter Fraud work at Lincolnshire County Council, including use of powers, employees and fraud cases. This information is published as part of the Local Government Transparency Code. Please note that a fraud referral may be made (shown in the dataset under Fraud identified) but not investigated (see Fraud Investigated). For instance, there may be a lack of evidence, a reasonable explanation provided or management action may be taken following preliminary enquiries. Therefore, fraud investigated will usually be less than fraud identified. Also, figures for fraud and figures for irregularities are likely to be identical, as they are in practice not categorised any differently. This dataset is updated annually each June. For any enquiries about this publication please contact counterfraud@lincolnshire.gov.uk.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Vikash Bhaskar
Released under CC0: Public Domain
This dataset was created by José Henrique Gaspar
https://dataintelo.com/privacy-and-policy
The global spear phishing email solution market size was valued at USD 1.2 billion in 2023 and is expected to reach USD 4.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 15.5% during the forecast period. This impressive growth can be attributed to the rising number of phishing attacks targeting enterprises and the increasing need for robust email security solutions. With the proliferation of digital communication in business operations, the risk of falling prey to sophisticated spear phishing attacks has significantly heightened, driving the demand for specialized email security solutions.
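The growth figures quoted above can be cross-checked with the standard compound annual growth rate formula. The sketch below assumes nine compounding years (2023 to 2032); the report's exact base year and forecast period are not stated here, so that count is an assumption, as are the function names.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value and year count."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value reached after compounding `rate` for `years` years."""
    return start_value * (1 + rate) ** years

YEARS = 2032 - 2023  # assumed number of compounding periods

cagr = implied_cagr(1.2, 4.5, YEARS)    # USD billions, as quoted in the text
projected = project(1.2, 0.155, YEARS)  # apply the quoted 15.5% rate

print(f"implied CAGR: {cagr:.1%}")
print(f"projected 2032 value at 15.5%: USD {projected:.2f} billion")
```

Under this assumption the implied rate is roughly 15.8 percent, and compounding at 15.5 percent gives about USD 4.4 billion, so the quoted figures are approximately consistent with each other.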
One of the primary growth factors in the spear phishing email solution market is the increasing sophistication of phishing attacks. Cybercriminals are employing more advanced tactics and technologies to execute highly targeted attacks that are difficult to detect with traditional security measures. This has led to a growing awareness among organizations about the necessity of implementing advanced spear phishing solutions to safeguard sensitive information and maintain business continuity. Additionally, regulatory requirements and compliance mandates across various industries are compelling organizations to adopt comprehensive email security measures, further propelling market growth.
Another significant driver for the market is the rising adoption of cloud-based email solutions. As businesses continue to migrate their operations to the cloud, the need for cloud-native security solutions that can effectively protect against phishing threats has surged. Cloud-based spear phishing email solutions offer several benefits, including scalability, flexibility, and reduced costs, making them an attractive option for organizations of all sizes. Furthermore, the integration of artificial intelligence (AI) and machine learning (ML) technologies into these solutions enhances their ability to detect and mitigate phishing attempts in real-time, thereby boosting their adoption across various sectors.
The increasing frequency of high-profile data breaches and cyber-attacks has also underscored the importance of robust email security. Organizations are becoming more proactive in their approach to cybersecurity, investing in advanced solutions to prevent potential financial losses and reputational damage. The financial services, healthcare, and government sectors, in particular, have emerged as significant contributors to the market's growth due to the critical nature of the data they handle. These sectors are increasingly deploying spear phishing email solutions to protect their sensitive information from malicious actors.
Regionally, North America is expected to dominate the spear phishing email solution market during the forecast period, owing to the early adoption of advanced cybersecurity solutions and the presence of key market players in the region. Europe and the Asia Pacific are also anticipated to witness substantial growth, driven by increasing digitalization, the rising number of cyber threats, and stringent regulatory requirements. The increasing awareness and adoption of spear phishing solutions in Latin America and the Middle East & Africa are also expected to contribute to the overall market growth.
The spear phishing email solution market can be segmented by component into software and services. The software segment includes solutions designed to detect, prevent, and respond to spear phishing attacks. These software solutions leverage advanced technologies such as artificial intelligence (AI), machine learning (ML), and behavioral analysis to identify and mitigate phishing threats. The growing sophistication of phishing attacks has necessitated the adoption of advanced email security software, making this segment a significant contributor to the market's growth.
The services segment, on the other hand, encompasses various professional and managed services aimed at enhancing an organization's email security posture. Professional services include consulting, training, and implementation services provided by cybersecurity experts to help organizations effectively deploy and manage spear phishing email solutions. Managed services involve outsourcing the management and monitoring of email security to third-party service providers, allowing organizations to focus on their core business operations while ensuring robust protection against phishing threats.
Within the software segment, the integration of AI and ML technologies has significantly