100+ datasets found
  1. Most targeted industry sectors worldwide targeted by phishing Q4 2024

    • statista.com
    Updated Feb 2, 2026
    Statista (2026). Most targeted industry sectors worldwide targeted by phishing Q4 2024 [Dataset]. https://www.statista.com/statistics/266161/websites-most-affected-by-phishing/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    During the fourth quarter of 2024, nearly 23 percent of phishing attacks worldwide targeted social media. Web-based software services and webmail were targeted by over 23 percent of registered phishing attacks. Furthermore, financial institutions accounted for 12 percent of attacks.

  2. Phishing Email Statistics 2026: The Growing Threat and How to Protect Your...

    • sqmagazine.co.uk
    Updated Oct 3, 2025
    SQ Magazine (2025). Phishing Email Statistics 2026: The Growing Threat and How to Protect Your Organization [Dataset]. https://sqmagazine.co.uk/phishing-email-statistics/
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    SQ Magazine
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2024 - Dec 31, 2026
    Area covered
    Earth, Worldwide
    Description

    Explore key phishing email stats, including attack frequency, success rates, target industries, user vulnerability, and cybersecurity impact!

  3. Number of global phishing attacks Q3 2013- Q4 2024

    • statista.com
    Updated Feb 2, 2026
    Statista (2026). Number of global phishing attacks Q3 2013- Q4 2024 [Dataset]. https://www.statista.com/statistics/266155/number-of-phishing-attacks-worldwide/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the 4th quarter of 2024, over 989,000 unique phishing attacks were detected worldwide, representing a slight increase from the preceding quarter. The number of unique phishing sites saw by far its most significant jump between the second and third quarters of 2020, from nearly 147,000 to approximately 572,000. This figure is based on the number of unique base URLs of the phishing sites.

  4. Phishing: distribution of attacks 2023, by region

    • statista.com
    Updated Feb 2, 2026
    Statista (2026). Phishing: distribution of attacks 2023, by region [Dataset]. https://www.statista.com/statistics/266362/phishing-attacks-country/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, users in Vietnam were most frequently targeted by phishing attacks. The phishing attack rate among internet users in the country was ***** percent. In the examined year, Peru was the second region, with an attack rate of nearly ** percent, while Taiwan followed with ***** percent.

  5. 🕵️ Phishing Websites Data

    • kaggle.com
    zip
    Updated Feb 24, 2025
    Sairaj Adhav (2025). 🕵️ Phishing Websites Data [Dataset]. https://www.kaggle.com/datasets/sai10py/phishing-websites-data
    Available download formats: zip (89566 bytes)
    Dataset updated
    Feb 24, 2025
    Authors
    Sairaj Adhav
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Phishing Websites Dataset

    Overview

    This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.

    Dataset Information

    • Total Columns: 31 (30 Features + 1 Target)
    • Target Variable: Result (Indicates whether a website is phishing or legitimate)

    Features Description

    URL-Based Features

    • Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.
    • double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.
    • having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.
    • Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).
    • URL_Length – Measures the length of the URL; phishing URLs tend to be longer.
    • having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.
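
The URL-based checks listed above can be sketched in a few lines of Python. This is only an illustration of the described indicators, not the dataset author's extraction code: the shortener list, the IPv4-only pattern, and the choice to look for the hyphen in the hostname are all assumptions, and the dataset itself stores encoded values rather than these raw booleans.

```python
import re
from urllib.parse import urlparse

# Dotted-quad IPv4 pattern; a simplification (no IPv6, no hex-encoded hosts).
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

# Hypothetical, non-exhaustive shortener hosts taken from the examples above.
_SHORTENERS = {"bit.ly", "tinyurl.com"}


def url_features(url: str) -> dict:
    """Compute raw URL indicators loosely matching the feature names above."""
    host = urlparse(url).hostname or ""
    # Everything after the scheme separator, so "http://" itself is not counted.
    after_scheme = url.split("://", 1)[-1]
    return {
        "Prefix_Suffix": "-" in host,  # assumption: hyphen checked in the hostname
        "double_slash_redirecting": "//" in after_scheme,
        "having_At_Symbol": "@" in url,
        "Shortining_Service": host in _SHORTENERS,
        "URL_Length": len(url),
        "having_IP_Address": bool(_IPV4.match(host)),
    }
```

Calling `url_features("http://192.168.0.1/login")` flags the IP-address indicator, while a hyphenated hostname sets `Prefix_Suffix`.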

    Domain-Based Features

    • having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.
    • SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).
    • Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.
    • age_of_domain – The age of the domain in days; older domains are usually more trustworthy.
    • DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.

    Webpage-Based Features

    • Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).
    • port – Identifies if the site is using suspicious or non-standard ports.
    • HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.
    • Request_URL – Measures the percentage of external resources loaded from different domains.
    • URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.
    • Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.
    • SFH (Server Form Handler) – Determines if form actions are handled suspiciously.
    • Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.
    • Abnormal_URL – Identifies if the website's URL structure is inconsistent with common patterns.
    • Redirect – Counts the number of redirects; phishing websites may have excessive redirects.

    Behavior-Based Features

    • on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).
    • RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).
    • popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.
    • Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.

    Traffic & Search Engine Features

    • web_traffic – Measures the website's Alexa ranking; phishing sites tend to have low traffic.
    • Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.
    • Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).
    • Links_pointing_to_page – Counts the number of backlinks pointing to the website.
    • Statistical_report – Uses external sources to verify if the website has been reported for phishing.

    Target Variable

    • Result – The classification label (1: Legitimate, -1: Phishing)
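
Many classifiers and metrics expect a 0/1 target with the positive class set to 1. A minimal sketch of remapping the `Result` encoding stated above (1: Legitimate, -1: Phishing) follows; the helper name `to_binary_label` and the choice of phishing as the positive class are assumptions made for illustration.

```python
def to_binary_label(result: int) -> int:
    """Map the Result column to 1 for phishing and 0 for legitimate."""
    mapping = {-1: 1, 1: 0}  # -1 (phishing) -> 1, 1 (legitimate) -> 0
    if result not in mapping:
        raise ValueError(f"unexpected Result value: {result!r}")
    return mapping[result]
```

Rejecting any other value makes unexpected or missing labels visible early instead of silently treating them as one class.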

    Usage

    This dataset is valuable for:
    ✅ Machine Learning Models – Developing classifiers for phishing detection.
    ✅ Cybersecurity Research – Understanding patterns in phishing attacks.
    ✅ Browser Security Extensions – Enhancing anti-phishing tools.

  6. Phishing Website Dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Jul 3, 2023
    I Kadek Agus Ariesta Putra; I Kadek Agus Ariesta Putra (2023). Phishing Website Dataset [Dataset]. http://doi.org/10.5281/zenodo.8041387
    Available download formats: zip, csv
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    I Kadek Agus Ariesta Putra; I Kadek Agus Ariesta Putra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of legitimate and phishing websites, along with information on the target brands (brands.csv) being impersonated in the phishing attacks. The dataset includes a total of 10,395 websites, 5,244 of which are legitimate and 5,151 of which are phishing websites. These websites impersonate a total of 86 different target brands.

    For phishing datasets, the files can be downloaded in a zip file with a "phishing" prefix, while for legitimate websites, the files can be downloaded in a zip file with a "not-phishing" prefix.

    In addition, the dataset includes features such as screenshots, text, CSS, and HTML structure for each website, as well as domain information (WHOIS data), IP information, and SSL information. Each website is labeled as either legitimate or phishing and includes additional metadata such as the date it was discovered, the target brand being impersonated, and any other relevant information.

    The dataset has been curated for research purposes and can be used to analyze the effectiveness of phishing attacks, develop and evaluate anti-phishing solutions, and identify trends and patterns in phishing attacks. It is hoped that this dataset will contribute to the advancement of research in the field of cybersecurity and help improve our understanding of phishing attacks.

  7. StealthPhisher Phishing Attack Dataset

    • data.mendeley.com
    • kaggle.com
    Updated Nov 7, 2025
    + more versions
    Tanmay Jha (2025). StealthPhisher Phishing Attack Dataset [Dataset]. http://doi.org/10.17632/m2479kmybx.2
    Dataset updated
    Nov 7, 2025
    Authors
    Tanmay Jha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The StealthPhisher Phishing Attack Dataset, generated at the Cybersecurity Lab, GLA University, Mathura, is a large, diverse, and recent phishing attack dataset developed to address the evolving nature of phishing attacks. It comprises 336,749 records, including 160,943 legitimate URLs and 175,806 phishing URLs, collected from reliable sources such as PhishTank. Reflecting the most recent phishing tactics, this dataset serves as a valuable resource for training and evaluating AI-based phishing detection systems.

    Key features include URL-based attributes (e.g., length, TLD type, IP presence), statistical metrics (e.g., Shannon Entropy, Kolmogorov Complexity, Fractal Dimension), and HTML/interaction-based features (e.g., popups, redirects, forms). These multidimensional attributes provide comprehensive insights into phishing behavior, enabling accurate and robust threat detection. Designed to capture real-world scenarios, the dataset equips AI models to recognize both traditional and emerging phishing strategies effectively.
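
As an illustration of one of the statistical metrics named above, the character-level Shannon entropy of a URL string can be computed as below. This is a sketch of the textbook formula; the description does not say exactly how the dataset computes its entropy feature, so the character-level choice and the function name `shannon_entropy` are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Strings made of one repeated character score 0 bits, while strings with many distinct, evenly used characters (as in randomly generated hostnames) score higher.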

    This dataset was generated as part of the research work presented in the article “StealthPhisher: A Defensive Framework against Phishing Attack using Hybrid Deep Learning and GenAI,” published in Expert Systems with Applications (https://doi.org/10.1016/j.eswa.2025.130205). Researchers using this dataset in their research work are kindly requested to cite this article.

  8. Outcomes of successful phishing attacks in companies worldwide 2021-2023

    • statista.com
    Updated Feb 2, 2026
    Statista (2026). Outcomes of successful phishing attacks in companies worldwide 2021-2023 [Dataset]. https://www.statista.com/statistics/1350723/consequences-phishing-attacks/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections that occurred through e-mail were common for 32 percent of the respondents in 2023. Cases of a credential or account compromise occurred in 27 percent of the organizations in 2023, a decrease of 25 percent compared to the year prior.

  9. Phishing URL dataset

    • data.mendeley.com
    Updated Apr 2, 2024
    JISHNU K S KAITHOLIKKAL (2024). Phishing URL dataset [Dataset]. http://doi.org/10.17632/vfszbj9b36.1
    Dataset updated
    Apr 2, 2024
    Authors
    JISHNU K S KAITHOLIKKAL
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Phishing URL dataset exclusively contains 54,807 URLs identified as phishing, providing a focused resource for studying and combating malicious online activities. Meanwhile, the URL dataset comprises 450,176 URLs sourced from various platforms, including PhishTank, the Majestic Million, and other pertinent sources. Each URL in the dataset is meticulously categorized as either "phishing" or "legitimate." Among these URLs, 104,438 have been flagged as phishing URLs, indicating malicious intent, while the remaining 345,738 URLs are classified as legitimate, denoting non-malicious or benign activity. This extensive dataset, drawn from multiple reputable sources, serves as a crucial asset for cybersecurity researchers and practitioners, facilitating the development and validation of advanced techniques for effectively detecting and mitigating phishing attacks.

  10. Phishing Email & SMS Dataset with NLP Categories

    • kaggle.com
    zip
    Updated Mar 25, 2025
    Ahmad Tijjani (2025). Phishing Email & SMS Dataset with NLP Categories [Dataset]. https://www.kaggle.com/datasets/ahmadtijjani/phishing-urgency-authority-persuasion
    Available download formats: zip (5103 bytes)
    Dataset updated
    Mar 25, 2025
    Authors
    Ahmad Tijjani
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Phishing Message Dataset (1000 Samples)

    This dataset comprises 1,000 phishing messages, categorized based on NLP-based deception techniques commonly used in social engineering attacks.

    Phishing Categories:

    Urgency – Messages that create a sense of immediate action.
    Authority – Messages impersonating trusted figures or organizations.
    Persuasion – Messages using manipulative language to convince the recipient.

    Dataset Structure:

    Each record contains the following fields:
    - text – The phishing message (email or SMS).
    - category – The type of phishing attack (urgency, authority, persuasion).
    - label – A classification label ("phishing") for machine learning tasks.

    Potential Applications:

    Natural Language Processing (NLP) – Analyze linguistic patterns in phishing messages.
    Cybersecurity Research – Identify deceptive techniques used in phishing attacks.
    Phishing Detection Models – Train AI models to classify and detect phishing messages.
    AI-driven Threat Analysis – Improve automated cybersecurity threat detection.

    This dataset serves as a valuable resource for developing AI-powered solutions in cybersecurity and NLP-based phishing detection.

  11. Zieni dataset for Phishing detection

    • data.mendeley.com
    • kaggle.com
    Updated Sep 4, 2024
    Rasha Zieni (2024). Zieni dataset for Phishing detection [Dataset]. http://doi.org/10.17632/8mcz8jsgnb.1
    Dataset updated
    Sep 4, 2024
    Authors
    Rasha Zieni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used for training machine learning models to detect phishing attacks and for studying the explainability of these models. It was published in 2024. The dataset refers to phishing and legitimate websites. Phishing samples have been collected from two sources, namely, PhishTank and Tranco, whereas legitimate samples were collected from Alexa. The dataset is balanced and contains 5,000 phishing and 5,000 legitimate samples, each described by 74 features extracted from the entire URL as well as from the Fully Qualified Domain Name, pathname, filename, and parameters. Of these features, 70 are numerical and four binary. The target variable is also binary.

  12. Malicious URL Detection Dataset (Enhanced 2026)

    • kaggle.com
    zip
    Updated Feb 17, 2026
    moutasm tamimi (2026). Malicious URL Detection Dataset (Enhanced 2026) [Dataset]. https://www.kaggle.com/datasets/moutasmtamimi/malicious-url-detection-dataset-enhanced-2026
    Available download formats: zip (94275271 bytes)
    Dataset updated
    Feb 17, 2026
    Authors
    moutasm tamimi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset is a large, feature-rich collection of URLs designed for research and development in malicious URL detection. It contains a total of 651,000 samples, each enriched with detailed lexical, structural, statistical, and phishing-related attributes. The goal of the dataset is to support the development of machine learning models capable of identifying harmful URLs before they lead to security incidents such as phishing attacks, data theft, and malware infections. By combining raw URL properties with engineered features, the dataset offers a comprehensive foundation for both traditional and advanced cybersecurity classification models.

    The dataset consists of 65 features, including URL length, character frequencies, entropy measures, domain and subdomain statistics, and multiple phishing-specific indicators. These features capture a wide range of behavioral and structural patterns commonly found in malicious URLs. The final column, label, assigns each entry to one of four categories: benign, defacement, phishing, or malware. This multi-class structure allows the dataset to be used not only for malicious vs. benign classification but also for more detailed threat type identification.

    The class distribution contains 428,103 benign URLs, 96,457 defacement URLs, 93,920 phishing URLs, and 32,520 malware URLs. While the dataset is naturally imbalanced, it remains representative of real-world cyber environments where benign traffic far exceeds malicious activity. This realistic distribution makes the dataset valuable for evaluating model robustness and handling class imbalance through techniques such as sampling or weighted training. Overall, the dataset provides a solid and versatile benchmark for cybersecurity machine learning tasks.
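
The weighted-training idea mentioned above can be made concrete with the class counts given in the description. The sketch below uses the common "balanced" heuristic, weight = total / (number of classes × class count), which is the same formula scikit-learn applies for `class_weight="balanced"`. Choosing this particular heuristic is an assumption; the dataset description does not prescribe one.

```python
# Class counts as stated in the dataset description.
CLASS_COUNTS = {
    "benign": 428_103,
    "defacement": 96_457,
    "phishing": 93_920,
    "malware": 32_520,
}


def balanced_class_weights(counts: dict) -> dict:
    """Weight each class inversely to its frequency (the 'balanced' heuristic)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these weights the rare malware class contributes as much to a weighted loss as the benign majority, since each class's weight multiplied by its count equals the same value.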

    65 features

    The final dataset contains 65 features, including both raw URL characteristics and a wide range of engineered attributes. These features cover lexical patterns, special-character counts, entropy measures, subdomain and path statistics, phishing-specific indicators, and various statistical ratios. Together, they provide a comprehensive representation of each URL, making the dataset suitable for building strong and reliable machine learning models for malicious URL detection.

  13. Phishing attacks – who is most at risk?

    • gov.uk
    • s3.amazonaws.com
    Updated Sep 26, 2022
    Office for National Statistics (2022). Phishing attacks – who is most at risk? [Dataset]. https://www.gov.uk/government/statistics/phishing-attacks-who-is-most-at-risk
    Dataset updated
    Sep 26, 2022
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Office for National Statistics
    Description

    Official statistics are produced impartially and free from political influence.

  14. QR code phishing attacks global 2025

    • statista.com
    Updated Feb 2, 2026
    Statista (2026). QR code phishing attacks global 2025 [Dataset]. https://www.statista.com/statistics/1622377/global-qr-phishing-attacks/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2024 - Feb 2025
    Area covered
    Worldwide
    Description

    Between February 2024 and February 2025, nearly 21 percent of employees at global organizations stated they experienced a QR code phishing attack. Additionally, over 21 percent of customers of managed service providers (MSPs) stated encountering such attacks.

  15. Phishing Website Detection Dataset

    • kaggle.com
    zip
    Updated Aug 25, 2024
    Md. Hasibur Rahman (2024). Phishing Website Detection Dataset [Dataset]. https://www.kaggle.com/datasets/hasibur013/url-data-for-phishing-website-detection
    Available download formats: zip (83149 bytes)
    Dataset updated
    Aug 25, 2024
    Authors
    Md. Hasibur Rahman
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The "Phishing Data" dataset is a comprehensive collection of information specifically curated for analyzing and understanding phishing attacks. Phishing attacks involve malicious attempts to deceive individuals or organizations into disclosing sensitive information such as passwords or credit card details. This dataset comprises 18 distinct features that offer valuable insights into the characteristics of phishing attempts. These features include the URL of the website being analyzed, the length of the URL, the use of URL shortening services, the presence of the "@" symbol, the presence of redirection using "//", the presence of prefixes or suffixes in the URL, the number of subdomains, the usage of secure connection protocols (HTTPS), the length of time since domain registration, the presence of a favicon, the presence of HTTP or HTTPS tokens in the domain name, the URL of requested external resources, the presence of anchors in the URL, the number of hyperlinks in HTML tags, the server form handler used, the submission of data to email addresses, abnormal URL patterns, and estimated website traffic or popularity. Together, these features enable the analysis and detection of phishing attempts in the "Phishing Data" dataset, aiding in the development of models and algorithms to combat phishing attacks.

  16. Phishing Websites Dataset

    • data.mendeley.com
    Updated Nov 16, 2021
    Subhash Ariyadasa (2021). Phishing Websites Dataset [Dataset]. http://doi.org/10.17632/n96ncsr5g4.1
    Dataset updated
    Nov 16, 2021
    Authors
    Subhash Ariyadasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of a collection of legitimate as well as phishing website instances. Each instance contains the URL and the relevant HTML page. The index.sql file is the root file, and it can be used to map the URLs with the relevant HTML pages. The dataset can serve as an input for the machine learning process.

    Highlights:
    • Total number of instances: 80,000 (83,275 instances in the dataset due to the existence of some removed SQL records in the preprocessing stage)
    • Number of legitimate website instances (labelled as 0 in the SQL file): 50,000
    • Number of phishing website instances (labelled as 1 in the SQL file): 30,000

    Structure: The index.sql file is the root file. It consists of five fields:
    1). rec_id - record number
    2). url - URL of the webpage
    3). website - Filename of the webpage (e.g. 1635698138155948.html)
    4). result - Indicates whether a given URL is phishing or not (0 for legitimate and 1 for phishing)
    5). created_date - Webpage downloaded date

    Sources:
    • Legitimate Data [50,000] - These data were collected from two sources.
      1). Google search - A simple keyword search on the Google search engine was used, and the top 5 URLs of each search were collected. Domain restrictions limited collection to a maximum of 10 URLs per domain to keep the final collection diverse.
      2). Ebbu2017 Phishing Dataset [1] - Nearly 25,874 active URLs were collected from this repository.
    • Phishing Data [30,000] - Three sources were used.
      1). PhishTank - From 01 December 2020 to 31 October 2021
      2). OpenPhish - From 29 September 2021 to 31 October 2021
      3). PhishRepo [2] - From 29 September 2021 to 31 October 2021

    Data Collection Process:
    • Legitimate Data:
      - The URLs were collected from the above sources, and the relevant webpages were fetched separately.
      - The URLs are of different lengths to minimize the URL length issue mentioned by Verma et al. [3].
    • Phishing Data:
      - The URLs were collected from the above sources, and at the same time, the relevant web pages were fetched.
      - An automated script continuously monitored PhishTank and OpenPhish to collect the latest phishing URLs.
      - The collected URLs were fetched as they were collected to minimize resource-unavailable issues, since phishing pages do not stay on the web for long.
      - PhishRepo provides all the resources relevant to a phishing webpage; therefore, its download function was simply used to download PhishRepo data.
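
The five index.sql fields listed under Structure suggest a simple lookup from URL to saved HTML file and label. The sketch below models one record with those field names and builds that lookup. It assumes the rows have already been read out of the SQL file (the loading step is not shown, since the file's table layout is not described), and the sample URLs, the second filename, and the dates are made up for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class IndexRecord(NamedTuple):
    """One row of index.sql, using the field names from the description."""
    rec_id: int
    url: str
    website: str        # filename of the saved HTML page
    result: int         # 0 for legitimate, 1 for phishing
    created_date: str   # webpage download date


def map_urls_to_pages(records: Iterable[IndexRecord]) -> Dict[str, Tuple[str, bool]]:
    """Map each URL to (HTML filename, is_phishing)."""
    return {r.url: (r.website, r.result == 1) for r in records}


# Hypothetical sample rows; only the first filename appears in the description.
sample = [
    IndexRecord(1, "https://example.com/", "1635698138155948.html", 0, "2021-10-31"),
    IndexRecord(2, "http://login.example.net/verify", "1635698138155949.html", 1, "2021-10-31"),
]
pages = map_urls_to_pages(sample)
```

Keeping the label next to the filename lets a loader read each HTML page and its class in one pass.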
    

    References:
    [1]. Ebbu2017 Phishing Dataset. Accessed 31 October 2021. Available: https://github.com/ebubekirbbr/pdd/tree/master/input.
    [2]. PhishRepo. Accessed 31 October 2021. Available: https://moraphishdet.projects.uom.lk/phishrepo/.
    [3]. Verma, Rakesh M., Victor Zeng, and Houtan Faridi. "Data quality for security challenges: Case studies of phishing, malware and intrusion detection datasets.", 2019.

  17. Phishing Email Dataset

    • kaggle.com
    zip
    Updated May 24, 2024
    Naser Abdullah Alam (2024). Phishing Email Dataset [Dataset]. https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset
    Available download formats: zip (80864554 bytes)
    Dataset updated
    May 24, 2024
    Authors
    Naser Abdullah Alam
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    PHISHING EMAIL DATASET

    This dataset was compiled by researchers to study phishing email tactics. It combines emails from a variety of sources to create a comprehensive resource for analysis.

    Initial Datasets:

    • Enron and Ling Datasets: These datasets focus on the core content of phishing emails, containing subject lines, email body text, and labels indicating whether the email is spam (phishing) or legitimate.

    • CEAS, Nazario, Nigerian Fraud, and SpamAssassin Datasets: These datasets provide broader context for the emails, including sender information, recipient information, date, and labels for spam/legitimate classification.

    Final Dataset:

    The final dataset combines the information from the initial datasets into a single resource for analysis. This dataset contains:

    • Approximately 82,500 emails
    • 42,891 spam emails
    • 39,595 legitimate emails

    This dataset allows researchers to study the content of phishing emails and the context in which they are sent to improve detection methods.

    Please cite the following two articles if you are using this dataset:

    • Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. ArXiv.org. https://arxiv.org/abs/2405.11619
  18. Web page phishing detection

    • data.mendeley.com
    • kaggle.com
    Updated Jun 25, 2021
    + more versions
    Abdelhakim Hannousse (2021). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.3
    Dataset updated
    Jun 25, 2021
    Authors
    Abdelhakim Hannousse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The provided dataset includes 11,430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine learning-based phishing detection systems. Features are from three different classes: 56 extracted from the structure and syntax of URLs, 24 extracted from the content of their corresponding pages, and 7 extracted by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used to extract the features, for potential replication or extension. The datasets were constructed in May 2020.

    dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replicating and experimenting with new URL and content-based features, overcoming the short lifetime of phishing web pages.

    dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed with URLs, so one needs to remove the index before experimentation.
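
Removing the URL index before experimentation, as the note on dataset_B advises, might look like the sketch below. The column names (`url`, `length_url`, `nb_dots`, `status`) and the two sample rows are hypothetical stand-ins; the real file's header should be checked before reusing this.

```python
import csv
import io

# Hypothetical two-row excerpt standing in for dataset_B.
SAMPLE_CSV = """url,length_url,nb_dots,status
http://example.com/,19,1,legitimate
http://login.example.net/verify,31,2,phishing
"""


def split_features(csv_text: str, index_col: str = "url", label_col: str = "status"):
    """Drop the URL index column and separate numeric features from labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        labels.append(row[label_col])
        # Keep every column except the URL index and the label.
        features.append(
            [float(v) for k, v in row.items() if k not in (index_col, label_col)]
        )
    return features, labels


X, y = split_features(SAMPLE_CSV)
```

Leaving the URL out keeps a classifier from memorizing individual addresses instead of learning from the extracted features.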

  19. Most reported cybercrime in the U.S. 2024, by number of individuals affected...

    • statista.com
    Updated Feb 2, 2026
    Statista (2026). Most reported cybercrime in the U.S. 2024, by number of individuals affected [Dataset]. https://www.statista.com/statistics/184083/commonly-reported-types-of-cyber-crime-us/
    Dataset updated
    Feb 2, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    United States
    Description

    In 2024, the most common type of cybercrime reported to the United States Internet Crime Complaint Center (IC3) was phishing, with its variation, spoofing, affecting approximately 193,000 individuals. In addition, over 86,000 cases of extortion were reported to the IC3 during that year.

    Dynamic of phishing attacks

    Over the past few years, phishing attacks have increased significantly. In 2024, over 193,000 individuals fell victim to such attacks. The highest number of phishing scam victims since 2018 was recorded in 2021, approximately 324 thousand. Phishing attacks can take many shapes. Bulk phishing, smishing, and business e-mail compromise (BEC) are the most common types. With the recent development of generative AI, it has become easier to craft a believable phishing e-mail. This is currently among the top concerns of organization leaders.

    Impact of phishing attacks

    Among the industries most targeted by cybercriminals are healthcare, financial, manufacturing, and education institutions. An observation carried out in the fourth quarter of 2024 found that software-as-a-service (SaaS) and webmail were most likely to encounter phishing attacks. According to the reports, almost a quarter of them stated being targeted by a phishing scam in the measured period.

  20. Cybersecurity Statistics in the UK for 2024

    • privacyengine.io
    Updated Dec 30, 2024
    PrivacyEngine (2024). Cybersecurity Statistics in the UK for 2024 [Dataset]. https://www.privacyengine.io/cybersecurity-statistics-uk-2024/
    Dataset updated
    Dec 30, 2024
    Dataset authored and provided by
    PrivacyEngine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2024
    Area covered
    United Kingdom
    Description

    This study analyzes cybersecurity issues across the UK, using data from an independent sample of 850,000 individuals and organizations during the 12 months leading up to July 2024. With cybersecurity breaches affecting 50% of small businesses, 70% of medium businesses, 74% of large businesses, and up to 66% of charities, the findings highlight the increasing threat of ransomware (45%), phishing attacks (30%), and challenges related to GDPR compliance (15%). These insights provide a comprehensive view of how cybersecurity challenges impact businesses and individuals across the region.

Most targeted industry sectors worldwide targeted by phishing Q4 2024

17 scholarly articles cite this dataset.