100+ datasets found
  1. Phishing-Dataset-for-Machine-Learning

    • kaggle.com
    Updated Nov 5, 2019
    Cite
    (2019). Phishing-Dataset-for-Machine-Learning [Dataset]. https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning
    Dataset updated
    Nov 5, 2019
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identify phishing using machine learning algorithms.

  2. Phishing Websites Dataset - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 22, 2023
    + more versions
    Cite
    (2023). Phishing Websites Dataset - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/bd3aa720-4e0c-555a-9475-b5a36dc655ef
    Dataset updated
    Oct 22, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of a collection of legitimate as well as phishing website instances. Each instance contains the URL and the relevant HTML page. The index.sql file is the root file, and it can be used to map the URLs to the relevant HTML pages. The dataset can serve as input for machine learning.

    Highlights:
      • Total number of instances: 80,000 (83,275 instances appear in the dataset because some SQL records were removed in the preprocessing stage)
      • Number of legitimate website instances (labelled as 0 in the SQL file): 50,000
      • Number of phishing website instances (labelled as 1 in the SQL file): 30,000

    Structure:
    The index.sql file is the root file. It consists of five fields:
      1. rec_id - record number
      2. url - URL of the webpage
      3. website - filename of the webpage (e.g. 1635698138155948.html)
      4. result - indicates whether a given URL is phishing or not (0 for legitimate and 1 for phishing)
      5. created_date - date the webpage was downloaded

    Sources:
      • Legitimate data [50,000] - collected from two sources:
        1. Google search - a simple keyword search on the Google search engine was used, and the top 5 URLs of each search were collected. Domain restrictions limited collection to a maximum of 10 URLs per domain so that the final collection stays diverse.
        2. Ebbu2017 Phishing Dataset [1] - nearly 25,874 active URLs were collected from this repository.
      • Phishing data [30,000] - three sources were used:
        1. PhishTank - from 01 December 2020 to 31 October 2021
        2. OpenPhish - from 29 September 2021 to 31 October 2021
        3. PhishRepo [2] - from 29 September 2021 to 31 October 2021

    Data Collection Process:
      • Legitimate data:
        - The URLs were collected from the above sources, and the relevant webpages were fetched separately.
        - The URLs vary in length to minimize the URL-length issue mentioned by Verma et al. [3].
      • Phishing data:
        - The URLs were collected from the above sources, and the relevant webpages were fetched at the same time.
        - An automated script continuously monitored PhishTank and OpenPhish to collect the latest phishing URLs.
        - The collected URLs were fetched as soon as they were collected to minimize resource unavailability, since phishing pages do not stay on the web for long.
        - PhishRepo provides all the resources relevant to a phishing webpage, so its download function was used to obtain the PhishRepo data.

    References:
    [1] Ebbu2017 Phishing Dataset. Accessed 31 October 2021. Available: https://github.com/ebubekirbbr/pdd/tree/master/input
    [2] PhishRepo. Accessed 31 October 2021. Available: https://moraphishdet.projects.uom.lk/phishrepo/
    [3] Verma, Rakesh M., Victor Zeng, and Houtan Faridi. "Data quality for security challenges: Case studies of phishing, malware and intrusion detection datasets." 2019.
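    The index.sql structure above can be illustrated with a minimal Python sketch. It assumes the rows of index.sql have already been extracted into tuples (for example by loading the dump into a database and selecting the five fields); the IndexRecord type, both helper functions and any directory name passed as html_dir are hypothetical, not part of the dataset.

```python
from collections import Counter
from pathlib import Path
from typing import NamedTuple


class IndexRecord(NamedTuple):
    """One row of index.sql, using the five field names from the description."""
    rec_id: int
    url: str
    website: str       # filename of the saved HTML page, e.g. "1635698138155948.html"
    result: int        # 0 = legitimate, 1 = phishing
    created_date: str  # download date, kept as an unparsed string here


def url_to_html_path(records, html_dir):
    """Map each URL to the path of its saved HTML page inside html_dir."""
    base = Path(html_dir)
    return {record.url: base / record.website for record in records}


def label_counts(records):
    """Count legitimate (result == 0) and phishing (result == 1) records."""
    counts = Counter(record.result for record in records)
    return {"legitimate": counts[0], "phishing": counts[1]}
```

    Keeping the URL-to-file mapping separate from the label counts means the HTML pages are only opened when a feature extractor actually needs them.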

  3. Website Phishing Dataset

    • kaggle.com
    • data.world
    Updated May 4, 2019
    Cite
    Ahmad Noor (2019). Website Phishing Dataset [Dataset]. https://www.kaggle.com/ahmednour/website-phishing-data-set/tasks
    Dataset updated
    May 4, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ahmad Noor
    License

    GNU General Public License v2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    ISSR CS602 Machine Learning - Project

    Website Phishing Data Set

    Abstract:

    Data Set Characteristics: Multivariate
    Number of Instances: 1353
    Attribute Characteristics: Integer
    Number of Attributes: 10
    Associated Tasks: Classification
    Number of Web Hits: 54880

    Source: Neda Abdelhamid, Auckland Institute of Studies, nedah '@' ais.ac.nz

    Data Set Information:

    The phishing problem is considered a vital issue in the ".COM" industry, especially e-banking and e-commerce, given the number of online transactions involving payments. We have identified different features related to legitimate and phishy websites and collected 1353 different websites from different sources. Phishing websites were collected from the PhishTank data archive (www.phishtank.com), which is a free community site where users can submit, verify, track and share phishing data. The legitimate websites were collected from Yahoo and starting point directories using a web script developed in PHP. The PHP script was plugged into a browser, and we collected 548 legitimate websites out of the 1353 websites. There are 702 phishing URLs and 103 suspicious URLs.

    When a website is considered SUSPICIOUS, it can be either phishy or legitimate, meaning the website held some legitimate and some phishy features.

    Attribute Information:

    URL Anchor
    Request URL
    SFH
    URL Length
    Having '@'
    Prefix/Suffix
    IP
    Sub Domain
    Web traffic
    Domain age
    Class

    The collected features hold the categorical values "Legitimate", "Suspicious" and "Phishy"; these values have been replaced with the numerical values 1, 0 and -1, respectively. Details of each feature are given in the research paper mentioned below.
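    The categorical-to-numeric replacement described above amounts to a fixed lookup table. A minimal Python sketch; the dictionary and function names are illustrative assumptions, not anything shipped with the dataset.

```python
# Mapping stated in the description: Legitimate -> 1, Suspicious -> 0, Phishy -> -1
CATEGORY_TO_VALUE = {"Legitimate": 1, "Suspicious": 0, "Phishy": -1}


def encode_features(row):
    """Replace each categorical value in a feature row with its numeric code."""
    return [CATEGORY_TO_VALUE[value] for value in row]


def decode_value(code):
    """Reverse lookup: turn a numeric code back into its category name."""
    for name, value in CATEGORY_TO_VALUE.items():
        if value == code:
            return name
    raise ValueError(f"unknown code: {code}")
```

    Because the codes are ordered (legitimate > suspicious > phishy), they can also be fed directly to models that treat features as ordinal numbers.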

  4. Phishing Websites Dataset

    • kaggle.com
    zip
    Updated Mar 23, 2024
    + more versions
    Cite
    Arnav Samal (2024). Phishing Websites Dataset [Dataset]. https://www.kaggle.com/datasets/arnavs19/phishing-websites-dataset
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 23, 2024
    Authors
    Arnav Samal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data consist of a collection of legitimate as well as phishing website instances. Each website is represented by a set of features that denote whether the website is legitimate or not. The data can serve as input for a machine learning process.

    Here, the two variants of the Phishing Dataset are presented.

    1. Full variant - dataset_full.csv

      • Total number of instances: 88,647
      • Number of legitimate website instances (labeled as 0): 58,000
      • Number of phishing website instances (labeled as 1): 30,647
      • Total number of features: 111
    2. Small variant - dataset_small.csv

      • Total number of instances: 58,645
      • Number of legitimate website instances (labeled as 0): 27,998
      • Number of phishing website instances (labeled as 1): 30,647
      • Total number of features: 111
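    To check the class balance of either variant, the label counts can be tallied directly from the CSV. This is a minimal standard-library sketch; the label column name "phishing" and the feature names in the in-memory sample are assumptions, so check the actual header of dataset_full.csv or dataset_small.csv before relying on it.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="phishing"):
    """Count label values (0 = legitimate, 1 = phishing) in a CSV with a header row.

    label_column is an assumed name; adjust it to the real header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(int(row[label_column]) for row in reader)


def phishing_share(counts):
    """Fraction of instances labeled 1 (phishing)."""
    total = counts[0] + counts[1]
    return counts[1] / total if total else 0.0


# Tiny in-memory stand-in for a real file; the feature columns are made up.
sample = io.StringIO("feature_a,feature_b,phishing\n3,25,0\n5,80,1\n2,19,0\n")
counts = count_labels(sample)
```

    Applied to the published counts, the phishing share is about 34.6% for the full variant (30,647 / 88,647) and about 52.3% for the small variant (30,647 / 58,645), so the small variant is close to balanced while the full variant is not.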
  5. PhishingWebsites

    • openml.org
    • data.world
    Updated Feb 16, 2016
    Cite
    Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk), Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae) (2016). PhishingWebsites [Dataset]. https://www.openml.org/d/4534
    Dataset updated
    Feb 16, 2016
    Authors
    Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk), Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
    Description

    Author: Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk), Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
    Source: UCI
    Please cite: Please refer to the Machine Learning Repository's citation policy

    Source:

    Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk), Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)

    Data Set Information:

    One of the challenges faced by our research was the unavailability of reliable training datasets. In fact, this challenge faces any researcher in the field. Although plenty of articles about predicting phishing websites have been disseminated these days, no reliable training dataset has been published publicly, perhaps because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.

    Attribute Information:

    For further information about the features, see the features file in the UCI data folder.

    Relevant Papers:

    Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi (2012) An Assessment of Features Related to Phishing Websites using an Automated Technique. In: International Conference For Internet Technology And Secured Transactions. ICITST 2012. IEEE, London, UK, pp. 492-497. ISBN 978-1-4673-5325-0

    Mohammad, Rami, Thabtah, Fadi Abdeljaber and McCluskey, T.L. (2014) Predicting phishing websites based on self-structuring neural network. Neural Computing and Applications, 25 (2). pp. 443-458. ISSN 0941-0643

    Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi Abdeljaber (2014) Intelligent Rule based Phishing Websites Classification. IET Information Security, 8 (3). pp. 153-160. ISSN 1751-8709

    Citation Request:

    Please refer to the Machine Learning Repository's citation policy

  6. Ethereum Phishing Transaction Network

    • kaggle.com
    zip
    Updated Mar 23, 2020
    Cite
    XBlock (2020). Ethereum Phishing Transaction Network [Dataset]. https://www.kaggle.com/datasets/xblock/ethereum-phishing-transaction-network
    Available download formats: zip (410,821,864 bytes)
    Dataset updated
    Mar 23, 2020
    Authors
    XBlock
    Description

    Cryptocurrency, as blockchain's most famous implementation, suffers huge economic losses due to phishing scams. In our work, accounts and transactions in Ethereum are treated as nodes and edges, so the detection of phishing accounts can be modeled as a node classification problem.

    In this work, we collected phishing nodes in Ethereum that were reported in Etherscan's label cloud. Starting from these phishing nodes, we crawled a large Ethereum transaction network via second-order BFS. The dataset contains 2,973,489 nodes, 13,551,303 edges and 1,165 labeled nodes.

    MulDiGraph.pkl: This dataset is stored in pickle format as a networkx object. Each node is an address with an attribute called isp indicating whether it is a phishing node. Each edge has two attributes, amount and timestamp, which represent the amount and the timestamp of the transaction, respectively. In this dataset, the total number of nodes is 2,973,489, the number of transactions is 13,551,303, and the average degree is 4.5574.
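    The reported average degree of 4.5574 equals the number of edges divided by the number of nodes (13,551,303 / 2,973,489). A minimal standard-library sketch of that summary follows, assuming the graph has been reduced to an isp value per address plus a list of (from, to, amount, timestamp) tuples; loading MulDiGraph.pkl itself requires unpickling it with networkx installed, which this sketch deliberately avoids. The function name and tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (from_address, to_address, amount, timestamp): a plain-tuple stand-in for the
# edges of the directed multigraph described above.
Edge = Tuple[str, str, float, int]


def graph_summary(isp: Dict[str, int], edges: List[Edge]) -> Dict[str, float]:
    """Summarize a directed multigraph given per-node isp values and an edge list.

    Parallel edges are kept, since repeated transfers between the same two
    addresses are separate transactions.
    """
    num_nodes = len(isp)
    num_edges = len(edges)
    return {
        "nodes": num_nodes,
        "edges": num_edges,
        "phishing_nodes": sum(1 for flag in isp.values() if flag == 1),
        # Edges per node, the convention that reproduces the reported 4.5574.
        # Counting in-degree plus out-degree per node would double this value.
        "avg_degree": num_edges / num_nodes if num_nodes else 0.0,
    }
```

    Treating the task as node classification means each address's label comes from isp, while features would be derived from its incident edges (amounts and timestamps).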

  7. phishing-dataset

    • huggingface.co
    Updated Feb 18, 2024
    Cite
    Esteban Alvarado (2024). phishing-dataset [Dataset]. https://huggingface.co/datasets/ealvaradob/phishing-dataset
    Dataset updated
    Feb 18, 2024
    Authors
    Esteban Alvarado
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    A dataset designed for phishing classification tasks across various data types.

  8. Phishing website dataset

    • kaggle.com
    zip
    Updated Jan 22, 2018
    Cite
    Akash Kumar (2018). Phishing website dataset [Dataset]. https://www.kaggle.com/akashkr/phishing-website-dataset
    Available download formats: zip (112,393 bytes)
    Dataset updated
    Jan 22, 2018
    Authors
    Akash Kumar
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Akash Kumar

    Released under CC0: Public Domain

  9. Phishing Statistics By Types, Country and Age Group

    • enterpriseappstoday.com
    Updated Aug 4, 2023
    Cite
    EnterpriseAppsToday (2023). Phishing Statistics By Types, Country and Age Group [Dataset]. https://www.enterpriseappstoday.com/stats/phishing-statistics.html
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Phishing Statistics: Phishing is a popular trick used by online criminals. They send harmful messages through emails, texts, and even phone calls. The idea is to make you click a bad link or download harmful software. It's the top cyber crime, affecting 83% of UK businesses that had a cyber attack in 2022. In 2021, about 323,972 people worldwide were tricked by phishing. This made up half of all victims of cyber crimes, despite Google blocking almost all phishing attempts. Each phishing attack cost the victim around $136 on average, leading to a whopping $44.2 million stolen by criminals in 2021. Most phishing happens through emails. For every 100 internet users, about 16.5 had their emails exposed in 2021. These stolen emails are sold on the internet's black market, where criminals buy them to use in their attacks. With 1 billion emails leaked, phishing remains a big threat. It's crucial for businesses, especially those in sensitive industries like finance and law, to protect themselves. A 2019 study found that spear phishing, a targeted form of phishing, was the main attack method for 65% of criminal groups, used mostly for collecting information. In 2022, the most common phishing emails included links to '.com' websites, making up 54% of the total. The next common was '.net', at just 8.9%. Top '.com' domain names involved were Adobe, Google, Myportfolio, Backblazeb2, and Weebly. Phishing can cause massive damage. For instance, a data breach affecting 10 million records can cost a business $50 million. If it impacts 50 million records, the cost could rise to $392 million. These days, as people struggle with high living costs, scammers are taking advantage. In the UK, they pretended to be Ofgem, the energy regulator, to get personal financial details. Ofgem responded by asking energy companies to warn customers about these scams on their websites.

    Editor’s Choice

    • In 2022, phishing attacks doubled from the previous year, with more than 500 million incidents recorded.
    • Email phishing scams in the U.S. saw a steep rise of 48% in 2022.
    • Young adults, particularly Gen-Z and Millennials, were the primary victims of phishing attacks.
    • By 2023, Nevada was the most affected U.S. state by phishing, while Kansas experienced the least phishing attacks.
    • The District of Columbia saw 25 phishing victims per 10,000 residents, leading to a financial loss of $25,562.
    • Arkansas suffered the highest financial loss due to phishing, with more than $80,000 lost per 10,000 residents.
    • Phishing scams in Delaware spiked by 71% in 2022.
    • Wisconsin recorded the largest number of phishing victims in the past two years, with a 38% increase.
    • The U.S. Internet Crime Complaint Center (IC3) received 300,497 reports from phishing victims.
    • Businesses in the U.S. faced over $2.7 billion in losses from email scams by the end of 2022.
    • According to IC3, financial losses from phishing surpassed $10.3 billion in 2022, with 800,944 reports in the U.S.
    • In an effort to decrease phishing, 84% of U.S. organizations started regular security awareness training for employees in 2023, significantly reducing phishing incidents.
    • Phishing remains the top cybercrime, with a daily estimate of 3.4 billion spam emails.
    • Stolen credentials are the leading cause of data breaches.
    • Google manages to block about 100 million phishing emails each day.
    • Almost half of all emails sent in 2022, 48%, were spam.
    • Russia is responsible for more than a fifth of all phishing emails.
    • Millennials and Gen-Z internet users are most likely to fall for phishing scams.
    • In the UK, 83% of businesses that experienced a cyber attack in 2022 identified phishing as the attack method.
    • Asian organizations reported phishing as the most common form of cyber attack in 2021.
    • A data breach can cost an organization more than $4 million on average.
    • A single whaling attack, a type of targeted phishing, can cost a business as much as $47 million.
  10. LLM Generated Spear Phishing Emails Dataset

    • paperswithcode.com
    Updated Feb 12, 2024
    Cite
    Daniel Nahmias; Gal Engelberg; Dan Klein; Asaf Shabtai (2024). LLM Generated Spear Phishing Emails Dataset [Dataset]. https://paperswithcode.com/dataset/llm-generated-spear-phishing-emails
    Dataset updated
    Feb 12, 2024
    Authors
    Daniel Nahmias; Gal Engelberg; Dan Klein; Asaf Shabtai
    Description

    This dataset comprises high-quality, targeted spear-phishing emails created using a proprietary system that harnesses the power of LLMs and knowledge graphs. The primary purpose of releasing this dataset is to promote and facilitate further research in the field of spear-phishing detection.

    We anticipate that LLM-generated spear-phishing attacks will soon gain prominence and potentially surpass traditional phishing campaigns, which current detection solutions are designed to identify.

  11. Failure rates for phishing simulations in companies worldwide 2021-2022, by...

    • statista.com
    • stelinmart.com
    • +1 more
    Updated Jan 10, 2024
    Cite
    Statista Research Department (2024). Failure rates for phishing simulations in companies worldwide 2021-2022, by industry [Dataset]. https://www.statista.com/topics/8385/phishing/
    Dataset updated
    Jan 10, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    A 2022 survey of working adults and IT security professionals worldwide found that electronics manufacturers showed the highest failure rate for phishing attack simulations, at 14 percent. Aerospace and mining companies followed, with a 13 percent failure rate. Legal companies showed the lowest failure rate, down from 11 percent in 2021.

  12. Phishing website Detector

    • gutcredit.com
    • sasender.com
    • +2 more
    zip
    Updated Feb 28, 2020
    + more versions
    Cite
    Eswar Chand (2020). Phishing website Detector [Dataset]. https://gutcredit.com/problem-statement-for-phishing-detection
    Available download formats: zip (201,800 bytes)
    Dataset updated
    Feb 28, 2020
    Authors
    Eswar Chand
    Description

    The dataset is provided both as a text file and as a CSV file, which provide the following resources that can be used as inputs for model building:

    1. A collection of website URLs for 11000+ websites. Each sample has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).

    2. The code template contains these code blocks: a. Import modules (Part 1) b. Load data function + input/output field descriptions

    The dataset also serves as an input for task scoping and tries to specify the functional and non-functional requirements for it.

    Background to Problem Statement :

    You are expected to write the code for a binary classification model (phishing website or not) using Python Scikit-Learn that trains on the data and calculates an accuracy score on the test data. You have to use one or more of the classification algorithms to train a model on the phishing website dataset.
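    The task above (train on part of the data, report accuracy on the held-out rest) can be sketched without third-party packages. The helper names below are hypothetical; in scikit-learn the corresponding tools are sklearn.model_selection.train_test_split and sklearn.metrics.accuracy_score. The majority-class baseline is included only as a reference point that any trained classifier should beat.

```python
import random
from collections import Counter


def split_holdout(features, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then cut off a test portion."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, features), pick(test_idx, features),
            pick(train_idx, labels), pick(test_idx, labels))


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most frequent training label n times."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n
```

    With this dataset's 1/-1 labels, majority_baseline simply predicts whichever class is more common in the training split; a real model's test accuracy is only meaningful relative to that floor.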

    Dataset Description:

    1. The dataset in the '.txt' file has no headers and contains only the column values.
    2. The actual column-wise header is listed below; if needed, you can add the header manually when using the '.txt' file. If you are using the '.csv' file, the column names are already included.
    3. The header list (column names) is as follows: [ 'UsingIP', 'LongURL', 'ShortURL', 'Symbol@', 'Redirecting//', 'PrefixSuffix-', 'SubDomains', 'HTTPS', 'DomainRegLen', 'Favicon', 'NonStdPort', 'HTTPSDomainURL', 'RequestURL', 'AnchorURL', 'LinksInScriptTags', 'ServerFormHandler', 'InfoEmail', 'AbnormalURL', 'WebsiteForwarding', 'StatusBarCust', 'DisableRightClick', 'UsingPopupWindow', 'IframeRedirection', 'AgeofDomain', 'DNSRecording', 'WebsiteTraffic', 'PageRank', 'GoogleIndex', 'LinksPointingToPage', 'StatsReport', 'class' ]

    Brief account of the features in the dataset:
    ● UsingIP (categorical - signed numeric): { -1,1 }
    ● LongURL (categorical - signed numeric): { 1,0,-1 }
    ● ShortURL (categorical - signed numeric): { 1,-1 }
    ● Symbol@ (categorical - signed numeric): { 1,-1 }
    ● Redirecting// (categorical - signed numeric): { -1,1 }
    ● PrefixSuffix- (categorical - signed numeric): { -1,1 }
    ● SubDomains (categorical - signed numeric): { -1,0,1 }
    ● HTTPS (categorical - signed numeric): { -1,1,0 }
    ● DomainRegLen (categorical - signed numeric): { -1,1 }
    ● Favicon (categorical - signed numeric): { 1,-1 }
    ● NonStdPort (categorical - signed numeric): { 1,-1 }
    ● HTTPSDomainURL (categorical - signed numeric): { -1,1 }
    ● RequestURL (categorical - signed numeric): { 1,-1 }
    ● AnchorURL (categorical - signed numeric):
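    Because the '.txt' file has no header row, the column list from item 3 has to be attached when the file is read. A minimal standard-library sketch, assuming the values are comma-separated integers with the class label in the last column; load_headerless is a hypothetical helper name.

```python
import csv
import io

# Column names from item 3 (30 features followed by the class label).
COLUMNS = [
    'UsingIP', 'LongURL', 'ShortURL', 'Symbol@', 'Redirecting//',
    'PrefixSuffix-', 'SubDomains', 'HTTPS', 'DomainRegLen', 'Favicon',
    'NonStdPort', 'HTTPSDomainURL', 'RequestURL', 'AnchorURL',
    'LinksInScriptTags', 'ServerFormHandler', 'InfoEmail', 'AbnormalURL',
    'WebsiteForwarding', 'StatusBarCust', 'DisableRightClick',
    'UsingPopupWindow', 'IframeRedirection', 'AgeofDomain', 'DNSRecording',
    'WebsiteTraffic', 'PageRank', 'GoogleIndex', 'LinksPointingToPage',
    'StatsReport', 'class',
]


def load_headerless(text_file):
    """Read comma-separated rows without a header; return (features, labels)."""
    features, labels = [], []
    for row in csv.reader(text_file):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values, got {len(row)}")
        values = [int(v) for v in row]
        features.append(dict(zip(COLUMNS[:-1], values[:-1])))
        labels.append(values[-1])
    return features, labels


# Usage with a one-row in-memory file (values made up).
demo = io.StringIO(",".join(["1"] * 30 + ["-1"]) + "\n")
demo_features, demo_labels = load_headerless(demo)
```

    Rejecting rows with the wrong number of values catches a mismatched delimiter early instead of silently shifting columns.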
  13. ‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-phishing-dataset-for-machine-learning-9439/53570f2e/?iid=130-479&v=presentation
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 29 August 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime where attackers pose as known or trusted entities and contact individuals through email, text or telephone and ask them to share sensitive information. Typically, in a phishing email attack, the message will suggest that there is a problem with an invoice, that there has been suspicious activity on an account, or that the user must log in to verify an account or password. Users may also be prompted to enter credit card information or bank account details as well as other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.

    Content

    This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.

    Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.

    Acknowledgements

    Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1

    --- Original source retains full ownership of the source dataset ---

  14. Phishing_Dataset

    • kaggle.com
    zip
    Updated May 10, 2023
    Cite
    Me_Rahul_K (2023). Phishing_Dataset [Dataset]. https://www.kaggle.com/datasets/merahulk/phishing-dataset
    Available download formats: zip (11,581,972 bytes)
    Dataset updated
    May 10, 2023
    Authors
    Me_Rahul_K
    Description

    Dataset

    This dataset was created by Me_Rahul_K

  15. Outcomes of successful phishing attacks in companies worldwide 2021-2023

    • statista.com
    Updated Sep 1, 2021
    Cite
    Statista Research Department (2021). Outcomes of successful phishing attacks in companies worldwide 2021-2023 [Dataset]. https://www.statista.com/study/102216/phishing/
    Dataset updated
    Sep 1, 2021
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections that occurred through e-mail were reported by 32 percent of the respondents in 2023. Cases of a credential or account compromise occurred in 27 percent of the organizations in 2023, a decrease of 25 percent compared to the year prior.

  16. Phishing and Benign Websites

    • kaggle.com
    Updated Dec 28, 2021
    Cite
    Peya Mowar (2021). Phishing and Benign Websites [Dataset]. https://www.kaggle.com/peyamowar/phishing-and-benign-websites
    Dataset updated
    Dec 28, 2021
    Dataset provided by
    Kaggle
    Authors
    Peya Mowar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    Phishing is a cybercrime in which deceitful websites lure naive users and trick them into disclosing confidential information, such as social media passwords or financial data. This phishing dataset can be used for training supervised or semi-supervised phishing detection models.

    Content

    The dataset contains 38,800 URLs that have been classified as either phishing or benign.

    Citation

    Mowar, Peya, & Jain, Mini. (2021, December 28). Phishing and Benign Websites Dataset. 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), Dublin, Ireland. https://doi.org/10.5281/zenodo.5807622

  17. Phishing Dataset for Machine Learning: Feature Evaluation

    • data.mendeley.com
    Updated Mar 24, 2018
    Cite
    Choon Lin Tan (2018). Phishing Dataset for Machine Learning: Feature Evaluation [Dataset]. http://doi.org/10.17632/h3cgnj8hft.1
    Dataset updated
    Mar 24, 2018
    Authors
    Choon Lin Tan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions. This dataset is WEKA-ready.

    Phishing webpage source: PhishTank, OpenPhish
    Legitimate webpage source: Alexa, Common Crawl

    Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.
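    Since the dataset is described as WEKA-ready, it is presumably distributed in WEKA's ARFF format. Assuming that, a minimal standard-library reader for a simple dense ARFF file might look like the sketch below; read_arff is a hypothetical helper, and the attribute names in the usage example are made up. For anything beyond simple files, a dedicated parser such as scipy.io.arff.loadarff is a safer choice.

```python
import io


def read_arff(text_file):
    """Minimal dense ARFF reader: returns (attribute_names, rows of strings).

    Handles @relation, @attribute and @data lines, '%' comments and
    comma-separated data rows. It does NOT handle sparse rows, quoted values
    containing commas, or attribute names that contain spaces.
    """
    names, rows, in_data = [], [], False
    for raw in text_file:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                names.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue  # @relation and other header lines are ignored
        rows.append([value.strip() for value in line.split(",")])
    return names, rows


# Usage with a tiny made-up file.
sample = io.StringIO(
    "% toy file\n@relation toy\n@attribute NumDots numeric\n"
    "@attribute CLASS_LABEL {0,1}\n\n@data\n3,1\n1,0\n"
)
names, rows = read_arff(sample)
```

    Values are returned as strings so that numeric and nominal attributes can be converted separately once the attribute types are known.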

  18. Phishing Website Data Set

    • impactcybertrust.org
    Updated Mar 26, 2015
    Cite
    External Data Source (2015). Phishing Website Data Set [Dataset]. http://doi.org/10.23721/100/1478806
    Dataset updated
    Mar 26, 2015
    Authors
    External Data Source
    Description

    Although many articles about predicting phishing websites have been disseminated, no reliable training dataset had previously been published publicly, perhaps because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. This dataset was collected mainly from the PhishTank archive, the MillerSmiles archive and Google's search operators.
    Data Set Characteristics: N/A
    Number of Instances: 2456
    Area: Computer Security
    Attribute Characteristics: Integer
    Number of Attributes: 30
    Date Donated: 2015-03-26
    Associated Tasks: Classification
    Missing Values? N/A
    ml-repository@ics.uci.edu
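    The listing specifies 30 integer attributes for a classification task. A minimal sketch for checking an export against that description before training might look like the following. It assumes a headerless comma-separated file whose last (31st) column is the class label, which the listing itself does not state, and the function name is hypothetical.

```python
import csv
import io

N_ATTRIBUTES = 30  # "Number of Attributes: 30" in the listing


def load_integer_rows(text, n_attributes=N_ATTRIBUTES):
    """Parse headerless CSV records of integers; assumes the last column is the class label."""
    features, labels = [], []
    for record, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != n_attributes + 1:
            raise ValueError(
                f"record {record}: expected {n_attributes + 1} fields, got {len(row)}"
            )
        values = [int(v) for v in row]  # int() raises ValueError on empty or non-integer fields
        features.append(values[:-1])
        labels.append(values[-1])
    return features, labels


# Usage with two synthetic records (not real data from the dataset).
row = ",".join(["1"] * N_ATTRIBUTES + ["-1"])
X, y = load_integer_rows(row + "\n" + row + "\n")
print(len(X), len(X[0]), y)  # 2 30 [-1, -1]
```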

  19. Evaluating the cognitive mechanisms of phishing detection with PEST, an...

    • dataverse.harvard.edu
    text/x-fixed-field
    Updated Jul 20, 2020
    Harvard Dataverse (2020). Evaluating the cognitive mechanisms of phishing detection with PEST, an ecologically valid lab-based measure of phishing susceptibility [Dataset]. http://doi.org/10.7910/DVN/DB56VY
    Available download formats: text/x-fixed-field
    Dataset updated
    Jul 20, 2020
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data and code to generate the figures from Hakim et al., "Evaluating the cognitive mechanisms of phishing detection with PEST, an ecologically valid lab-based measure of phishing susceptibility".

    NOTE: Figure 4 requires data from the original PHIT task. These are available online at XXX.

    The data files are CSV files. File names have the form scamdata_SUBJECTNUMBER_DATETIME_AGE_GENDER.dat, e.g. scamdata_1_10Oct2018090103_18_F.dat.

    Each data file has 7 columns:
    - userId: subject response (1 = safe with high confidence, 2 = safe with low confidence, 3 = scam with low confidence, 4 = scam with high confidence)
    - reactTime: reaction time in seconds
    - category: PHIT email category (and custom categories for pooled scam/safe emails)
    - type: weapon of influence (PHIT emails only)
    - hasAtt: binary flag indicating whether the email has an attachment
    - realID: real email identifier (scam or safe)
    - emailCode: unique ID of each email, used to locate specific emails within the Excel files
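    The file naming scheme and response scale above can be decoded mechanically. The following is a minimal sketch under stated assumptions: the DATETIME field is read as day, abbreviated month, year, hours, minutes and seconds, which is inferred from the single example file name rather than documented; the response codes follow the 1 to 4 scale listed above; and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches names such as scamdata_1_10Oct2018090103_18_F.dat
FILENAME_RE = re.compile(
    r"^scamdata_(?P<subject>\d+)_(?P<stamp>\d{1,2}[A-Za-z]{3}\d{10})"
    r"_(?P<age>\d+)_(?P<gender>[A-Za-z]+)\.dat$"
)


def parse_filename(name):
    """Split a PEST data file name into subject number, timestamp, age and gender."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return {
        "subject": int(match["subject"]),
        # Assumed layout: day, abbreviated month, year, hours, minutes, seconds.
        "started": datetime.strptime(match["stamp"], "%d%b%Y%H%M%S"),
        "age": int(match["age"]),
        "gender": match["gender"],
    }


def decode_response(code):
    """Map the 1-4 response scale to (judged_scam, high_confidence)."""
    if code not in (1, 2, 3, 4):
        raise ValueError(f"response code out of range: {code}")
    judged_scam = code >= 3           # 3 and 4 mean "scam"
    high_confidence = code in (1, 4)  # the ends of the scale are high confidence
    return judged_scam, high_confidence


info = parse_filename("scamdata_1_10Oct2018090103_18_F.dat")
print(info["subject"], info["started"], info["age"], info["gender"])
# 1 2018-10-10 09:01:03 18 F
print(decode_response(4))  # (True, True)
```

    Splitting the 1 to 4 scale into a scam/safe judgement and a confidence flag makes it straightforward to score detection accuracy separately from confidence.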

  20. Web page phishing detection

    • data.mendeley.com
    Updated Jun 25, 2021
    Abdelhakim Hannousse (2021). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.3
    Dataset updated
    Jun 25, 2021
    Authors
    Abdelhakim Hannousse
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The provided dataset includes 11430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine-learning-based phishing detection systems. Features come from three different classes: 56 are extracted from the structure and syntax of URLs, 24 from the content of their corresponding pages, and 7 by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used to extract the features, for potential replication or extension. The datasets were constructed in May 2020.

    dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL- and content-based features, overcoming the short lifetime of phishing web pages.

    dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed by URL, so the index must be removed before experimentation.
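    Because dataset_B is indexed by URL, the URL column has to be dropped before the feature values reach a classifier. Below is a minimal sketch using only the standard library. It assumes the CSV has a header row, that the URL column is named url and the label column status (both are assumptions; check the actual header), and it uses a hypothetical two-row sample with made-up column names and values.

```python
import csv
import io
from collections import Counter


def load_dataset_b(text, index_col="url", label_col="status"):
    """Drop the URL index column, then split the remaining columns into features and labels.

    index_col and label_col are assumed column names, not confirmed by the dataset description.
    """
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        row.pop(index_col, None)           # remove the URL index before experimentation
        labels.append(row.pop(label_col))  # class label, e.g. "phishing" or "legitimate"
        features.append([float(v) for v in row.values()])  # remaining numeric features
    return features, labels


# Hypothetical sample: column names and values are illustrative only.
SAMPLE_CSV = """url,length_url,nb_dots,status
http://example.com/login,24,1,legitimate
http://203.0.113.7/secure.verify.account,40,5,phishing
"""

X, y = load_dataset_b(SAMPLE_CSV)
print(X)           # [[24.0, 1.0], [40.0, 5.0]]
print(Counter(y))  # on the full dataset, both classes should have the same count
```

    Dropping the index by column name rather than by position keeps the sketch working even if the column order differs from the assumed one.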

(2019). Phishing-Dataset-for-Machine-Learning [Dataset]. https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning

Phishing-Dataset-for-Machine-Learning

74 scholarly articles cite this dataset.
