100+ datasets found

Phishing-Dataset-for-Machine-Learning
kaggle.com
Updated Nov 5, 2019
Cite
(2019). Phishing-Dataset-for-Machine-Learning [Dataset]. https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning
Dataset updated
Nov 5, 2019
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identify Phishing using Machine learning Algorithms
Phishing Websites Dataset - Dataset - B2FIND
b2find.dkrz.de
Updated Oct 22, 2023
+ more versions
Cite
(2023). Phishing Websites Dataset - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/bd3aa720-4e0c-555a-9475-b5a36dc655ef
Dataset updated
Oct 22, 2023
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of a collection of legitimate as well as phishing website instances. Each instance contains the URL and the relevant HTML page. The index.sql file is the root file, and it can be used to map the URLs with the relevant HTML pages. The dataset can serve as an input for the machine learning process.

Highlights:
- Total number of instances: 80,000 (83,275 instances in the dataset due to the existence of some removed SQL records in preprocessing stage)
- Number of legitimate website instances (labelled as 0 in the SQL file): 50,000
- Number of phishing website instances (labelled as 1 in the SQL file): 30,000

Structure:
The index.sql file is the root file. It consists of five fields.
1). rec_id - record number
2). url - URL of the webpage
3). website - Filename of the webpage (i.e. 1635698138155948.html)
4). result - Indicates whether a given URL is phishing or not (0 for legitimate and 1 for phishing).
5). created_date - Webpage downloaded date

Sources:
- Legitimate Data [50,000] - These data were collected from two sources.
  1). Google search - Simple keyword search on the google search engine was used, and the top 5 URLs of each search were collected. Domain restrictions were used and limited a maximum of 10 collections from a domain to have a diverse collection at the end.
  2). Ebbu2017 Phishing Dataset [1] - Nearly 25,874 active URLs were collected from this repository
- Phishing Data [30,000] - Three sources were used.
  1). PhishTank - From 01 December 2020 to 31 October 2021
  2). OpenPhish - From 29 September 2021 to 31 October 2021
  3). PhishRepo [2] - From 29 September 2021 to 31 October 2021

Data Collection Process:
- Legitimate Data:
  - The URLs were collected from the above sources and fetched the relevant webpages separately.
  - The URLs are in different lengths to minimize the URL lengths issue mentioned by Verma et al. [3].
- Phishing Data:
  - The URLs were collected from the above sources, and at the same time, the relevant web pages were fetched.
  - An automated script continuously monitored PhishTank and OpenPhish to collect the latest phishing URLs.
  - The collected URLs were fetched simultaneously to minimize the resource unavailable issue since the phishing pages do not exist for a longer period on the web.
  - PhishRepo provides all the resources relevant to a phishing webpage; therefore, simply use their download function to download PhishRepo data.

References:
[1]. Ebbu2017 Phishing Dataset. Accessed 31 October 2021. Available: https://github.com/ebubekirbbr/pdd/tree/master/input.
[2]. PhishRepo. Accessed 31 October 2021. Available: https://moraphishdet.projects.uom.lk/phishrepo/.
[3]. Verma, Rakesh M., Victor Zeng, and Houtan Faridi. "Data quality for security challenges: Case studies of phishing, malware and intrusion detection datasets.", 2019.
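
The five-field index described above could be queried roughly as follows. This is only a sketch: the table name `index_records`, the column types, and the sample rows are assumptions made for illustration (the description names the fields but not the table or its types), and the rows are invented rather than taken from the dataset.

```python
import sqlite3

# Hypothetical schema: the five field names come from the description,
# but the table name "index_records" and the column types are assumptions.
SCHEMA = """
CREATE TABLE index_records (
    rec_id INTEGER PRIMARY KEY,
    url TEXT,
    website TEXT,
    result INTEGER,
    created_date TEXT
)
"""

# Made-up rows for illustration only; they are not real dataset records.
SAMPLE_ROWS = [
    (1, "https://example.org/", "1635698138155948.html", 0, "2021-10-31"),
    (2, "http://phish.example/login", "1635698138155949.html", 1, "2021-10-31"),
    (3, "https://example.com/docs", "1635698138155950.html", 0, "2021-10-30"),
]


def build_demo_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO index_records VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def url_to_html(conn):
    """Map each URL to its (HTML filename, label) pair."""
    rows = conn.execute("SELECT url, website, result FROM index_records")
    return {url: (website, result) for url, website, result in rows}


def label_counts(conn):
    """Count instances per label (0 = legitimate, 1 = phishing)."""
    rows = conn.execute("SELECT result, COUNT(*) FROM index_records GROUP BY result")
    return dict(rows.fetchall())


conn = build_demo_db()
mapping = url_to_html(conn)
counts = label_counts(conn)
print(mapping["https://example.org/"])
print(counts)
```

With the real index loaded into a database, the same two queries would give the URL-to-page mapping and the per-label instance counts.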
Website Phishing Dataset
kaggle.com
data.world
Updated May 4, 2019
Cite
Ahmad Noor (2019). Website Phishing Dataset [Dataset]. https://www.kaggle.com/ahmednour/website-phishing-data-set/tasks
Dataset updated
May 4, 2019
Dataset provided by
Kaggle http://kaggle.com/
Authors
Ahmad Noor
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
ISSR CS602 Machine Learning - Project

Website Phishing Data Set Download: Data Folder, Data Set Description

Abstract:

Data Set Characteristics : Multivariate
Number of Instances : 1353
Attribute Characteristics : Integer
Number of Attributes : 10
Associated Tasks : Classification
Number of Web Hits : 54880

Source: Dataset url

Neda Abdelhamid Auckland Institute of Studies nedah '@' ais.ac.nz

Data Set Information:

The phishing problem is considered a vital issue in the “.COM” industry, especially e-banking and e-commerce, given the number of online transactions involving payments. We have identified different features related to legitimate and phishy websites and collected 1353 different websites from different sources. Phishing websites were collected from the Phishtank data archive (www.phishtank.com), which is a free community site where users can submit, verify, track and share phishing data. The legitimate websites were collected from Yahoo and starting point directories using a web script developed in PHP. The PHP script was plugged with a browser and we collected 548 legitimate websites out of 1353 websites. There are 702 phishing URLs and 103 suspicious URLs.

When a website is considered SUSPICIOUS that means it can be either phishy or legitimate, meaning the website held some legit and phishy features.

Attribute Information:

URL Anchor
Request URL
SFH
URL Length
Having ’@’
Prefix/Suffix
IP
Sub Domain
Web traffic
Domain age
Class

The collected features hold the categorical values “Legitimate”, “Suspicious” and “Phishy”; these values have been replaced with the numerical values 1, 0 and -1 respectively. Details of each feature are given in the research paper mentioned below.
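
As a small illustration of the encoding and class sizes quoted in this description, the sketch below decodes the numeric values back to their category names and computes each class's share. The function names are hypothetical; only the mapping (1, 0, -1) and the counts (548, 103, 702) come from the text.

```python
# Numeric encoding stated in the description: 1, 0 and -1.
LABEL_NAMES = {1: "Legitimate", 0: "Suspicious", -1: "Phishy"}

# Class sizes quoted in the description.
CLASS_SIZES = {"Legitimate": 548, "Suspicious": 103, "Phishy": 702}


def decode(values):
    """Translate encoded values back to their category names."""
    return [LABEL_NAMES[v] for v in values]


def class_shares(sizes):
    """Return each class's fraction of the whole, rounded to 3 places."""
    total = sum(sizes.values())
    return {name: round(count / total, 3) for name, count in sizes.items()}


total = sum(CLASS_SIZES.values())
shares = class_shares(CLASS_SIZES)
print(total)
print(decode([1, 0, -1]))
print(shares)
```

The three class sizes add up to the 1353 instances stated in the abstract, which is a quick consistency check worth running before training on the data.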
Phishing Websites Dataset
kaggle.com
zip
Updated Mar 23, 2024
+ more versions
Cite
Arnav Samal (2024). Phishing Websites Dataset [Dataset]. https://www.kaggle.com/datasets/arnavs19/phishing-websites-dataset
Available download formats: zip (0 bytes)
Dataset updated
Mar 23, 2024
Authors
Arnav Samal
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data consist of a collection of legitimate as well as phishing website instances. Each website is represented by a set of features which denote whether the website is legitimate or not. The data can serve as an input for a machine learning process.

Here, the two variants of the Phishing Dataset are presented.

Full variant - dataset_full.csv

Total number of instances: 88,647

Number of legitimate website instances (labeled as 0): 58,000

Number of phishing website instances (labeled as 1): 30,647

Total number of features: 111

Small variant - dataset_small.csv

Total number of instances: 58,645

Number of legitimate website instances (labeled as 0): 27,998

Number of phishing website instances (labeled as 1): 30,647

Total number of features: 111
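
The class balance of the two variants follows directly from the counts listed above. The sketch below checks that the per-class counts add up to the stated totals and computes the phishing share of each file; the dictionary layout and function name are illustrative choices, not part of the dataset.

```python
# Counts copied from the description of the two variants.
VARIANTS = {
    "dataset_full.csv": {"legitimate": 58_000, "phishing": 30_647, "total": 88_647},
    "dataset_small.csv": {"legitimate": 27_998, "phishing": 30_647, "total": 58_645},
}


def phishing_share(variant):
    """Fraction of instances labeled 1 (phishing)."""
    return variant["phishing"] / (variant["legitimate"] + variant["phishing"])


for name, variant in VARIANTS.items():
    # The per-class counts should add up to the stated total.
    assert variant["legitimate"] + variant["phishing"] == variant["total"]
    print(f"{name}: {phishing_share(variant):.1%} phishing")

full_share = phishing_share(VARIANTS["dataset_full.csv"])
small_share = phishing_share(VARIANTS["dataset_small.csv"])
```

The full variant is roughly one-third phishing, while the small variant is close to balanced, which matters when choosing between accuracy and class-sensitive metrics.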
PhishingWebsites
openml.org
data.world
Updated Feb 16, 2016
Cite
Rami Mustafa A Mohammad ( University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae) (2016). PhishingWebsites [Dataset]. https://www.openml.org/d/4534
Dataset updated
Feb 16, 2016
Authors
Rami Mustafa A Mohammad ( University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
Description
Author: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk) Fadi Thabtah (Canadian University of Dubai, fadi '@' cud.ac.ae)
Source: UCI
Please cite: Please refer to the Machine Learning Repository's citation policy

Source:

Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai,fadi '@' cud.ac.ae)

Data Set Information:

One of the challenges faced by our research was the unavailability of reliable training datasets. In fact, this challenge faces any researcher in the field. Although plenty of articles about predicting phishing websites have been disseminated these days, no reliable training dataset has been published publicly, maybe because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.

Attribute Information:

For Further information about the features see the features file in the data folder of UCI.

Relevant Papers:

Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi (2012) An Assessment of Features Related to Phishing Websites using an Automated Technique. In: International Conference For Internet Technology And Secured Transactions. ICITST 2012 . IEEE, London, UK, pp. 492-497. ISBN 978-1-4673-5325-0

Mohammad, Rami, Thabtah, Fadi Abdeljaber and McCluskey, T.L. (2014) Predicting phishing websites based on self-structuring neural network. Neural Computing and Applications, 25 (2). pp. 443-458. ISSN 0941-0643

Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi Abdeljaber (2014) Intelligent Rule based Phishing Websites Classification. IET Information Security, 8 (3). pp. 153-160. ISSN 1751-8709

Citation Request:

Please refer to the Machine Learning Repository's citation policy
Ethereum Phishing Transaction Network
kaggle.com
zip
Updated Mar 23, 2020
Cite
XBlock (2020). Ethereum Phishing Transaction Network [Dataset]. https://www.kaggle.com/datasets/xblock/ethereum-phishing-transaction-network
Available download formats: zip (410821864 bytes)
Dataset updated
Mar 23, 2020
Authors
XBlock
Description
Cryptocurrency, as blockchain’s most famous implementation, suffers a huge economic loss due to phishing scams. In our work, accounts and transactions in Ethereum are treated as nodes and edges, thus detection of phishing accounts can be modeled as a node classification problem.

In this work, we collected phishing nodes from Ethereum that were reported in the Etherscan label cloud. Starting from the phishing nodes, we crawled a huge Ethereum transaction network via second-order BFS. The dataset contains 2,973,489 nodes, 13,551,303 edges and 1,165 labeled nodes.

MulDiGraph.pkl: This dataset is stored in pickle format, and it is a networkx object. Each node is an address with an attribute called isp indicating whether it is a phishing node. Each edge has two attributes, amount and timestamp, which represent the balance of the transaction and the timestamp of the transaction, respectively. In this data set, the total number of nodes is 2,973,489, the number of transactions is 13,551,303, and the average degree is 4.5574.
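
The quoted average degree can be reproduced from the node and edge counts. The sketch below assumes that "average degree" here means edges divided by nodes, which is an inference: that ratio rounds to the stated 4.5574. The toy graph is a plain-Python stand-in with invented addresses and values, shaped like the described attributes (`isp` on nodes, `amount` and `timestamp` on edges); it is not the networkx object itself.

```python
# Counts quoted in the description.
NODE_COUNT = 2_973_489
EDGE_COUNT = 13_551_303

# Assumption: "average degree" is edges per node, since this ratio
# reproduces the quoted figure.
avg_degree = EDGE_COUNT / NODE_COUNT
print(round(avg_degree, 4))

# Toy stand-in for the graph structure; addresses and values are invented.
toy_nodes = {
    "0xaaa": {"isp": 1},  # labeled phishing node
    "0xbbb": {"isp": 0},
    "0xccc": {"isp": 0},
}
toy_edges = [
    ("0xbbb", "0xaaa", {"amount": 1.5, "timestamp": 1500000000}),
    ("0xccc", "0xaaa", {"amount": 0.2, "timestamp": 1500000100}),
    ("0xaaa", "0xccc", {"amount": 1.0, "timestamp": 1500000200}),
]


def phishing_nodes(nodes):
    """Addresses whose isp attribute marks them as phishing."""
    return [address for address, attrs in nodes.items() if attrs["isp"] == 1]


def received_total(edges, address):
    """Sum of edge amounts flowing into the given address."""
    return sum(attrs["amount"] for _, dst, attrs in edges if dst == address)


flagged = phishing_nodes(toy_nodes)
inflow = received_total(toy_edges, "0xaaa")
```

Framing detection as node classification means features such as inflow per address are computed from the edges and the `isp` attribute serves as the label.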

phishing-dataset
huggingface.co
Updated Feb 18, 2024
Cite
Esteban Alvarado (2024). phishing-dataset [Dataset]. https://huggingface.co/datasets/ealvaradob/phishing-dataset
Dataset updated
Feb 18, 2024
Authors
Esteban Alvarado
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset designed for phishing classification tasks in various data types.
Phishing website dataset
kaggle.com
zip
Updated Jan 22, 2018
Cite
Akash Kumar (2018). Phishing website dataset [Dataset]. https://www.kaggle.com/akashkr/phishing-website-dataset
Available download formats: zip (112393 bytes)
Dataset updated
Jan 22, 2018
Authors
Akash Kumar
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Dataset

This dataset was created by Akash Kumar

Released under CC0: Public Domain

Phishing Statistics By Types, Country and Age Group
enterpriseappstoday.com
Updated Aug 4, 2023
Cite
EnterpriseAppsToday (2023). Phishing Statistics By Types, Country and Age Group [Dataset]. https://www.enterpriseappstoday.com/stats/phishing-statistics.html
Dataset updated
Aug 4, 2023
Dataset authored and provided by
EnterpriseAppsToday
License
https://www.enterpriseappstoday.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction
Phishing Statistics: Phishing is a popular trick used by online criminals. They send harmful messages through emails, texts, and even phone calls. The idea is to make you click a bad link or download harmful software. It's the top cyber crime, affecting 83% of UK businesses that had a cyber attack in 2022. In 2021, about 323,972 people worldwide were tricked by phishing. This made up half of all victims of cyber crimes, despite Google blocking almost all phishing attempts. Each phishing attack cost the victim around $136 on average, leading to a whopping $44.2 million stolen by criminals in 2021. Most phishing happens through emails. For every 100 internet users, about 16.5 had their emails exposed in 2021. These stolen emails are sold on the internet's black market, where criminals buy them to use in their attacks. With 1 billion emails leaked, phishing remains a big threat. It's crucial for businesses, especially those in sensitive industries like finance and law, to protect themselves. A 2019 study found that spear phishing, a targeted form of phishing, was the main attack method for 65% of criminal groups, used mostly for collecting information. In 2022, the most common phishing emails included links to '.com' websites, making up 54% of the total. The next common was '.net', at just 8.9%. Top '.com' domain names involved were Adobe, Google, Myportfolio, Backblazeb2, and Weebly. Phishing can cause massive damage. For instance, a data breach affecting 10 million records can cost a business $50 million. If it impacts 50 million records, the cost could rise to $392 million. These days, as people struggle with high living costs, scammers are taking advantage. In the UK, they pretended to be Ofgem, the energy regulator, to get personal financial details. Ofgem responded by asking energy companies to warn customers about these scams on their websites.
Editor’s Choice

In 2022, phishing attacks doubled from the previous year, with more than 500 million incidents recorded.

Email phishing scams in the U.S. saw a steep rise of 48% in 2022.

Young adults, particularly Gen-Z and Millennials, were the primary victims of phishing attacks.

By 2023, Nevada was the most affected U.S. state by phishing, while Kansas experienced the least phishing attacks.

The District of Columbia saw 25 phishing victims per 10,000 residents, leading to a financial loss of $25,562.

Arkansas suffered the highest financial loss due to phishing, with more than $80,000 lost per 10,000 residents.

Phishing scams in Delaware spiked by 71% in 2022.

Wisconsin recorded the largest number of phishing victims in the past two years, with a 38% increase.

The U.S. Internet Crime Complaint Center (IC3) received 300,497 reports from phishing victims.

Businesses in the U.S. faced over $2.7 billion in losses from email scams by the end of 2022.

According to IC3, financial losses from phishing surpassed $10.3 billion in 2022, with 800,944 reports in the U.S.

In an effort to decrease phishing, 84% of U.S. organizations started regular security awareness training for employees in 2023, significantly reducing phishing incidents.

Phishing remains the top cybercrime, with a daily estimate of 3.4 billion spam emails.

Stolen credentials are the leading cause of data breaches.

Google manages to block about 100 million phishing emails each day.

Almost half of all emails sent in 2022, 48%, were spam.

Russia is responsible for more than a fifth of all phishing emails.

Millennials and Gen-Z internet users are most likely to fall for phishing scams.

In the UK, 83% of businesses that experienced a cyber attack in 2022 identified phishing as the attack method.

Asian organizations reported phishing as the most common form of cyber attack in 2021.

A data breach can cost an organization more than $4 million on average.

A single whaling attack, a type of targeted phishing, can cost a business as much as $47 million.
LLM Generated Spear Phishing Emails Dataset
paperswithcode.com
Updated Feb 12, 2024
Cite
Daniel Nahmias; Gal Engelberg; Dan Klein; Asaf Shabtai (2024). LLM Generated Spear Phishing Emails Dataset [Dataset]. https://paperswithcode.com/dataset/llm-generated-spear-phishing-emails
Dataset updated
Feb 12, 2024
Authors
Daniel Nahmias; Gal Engelberg; Dan Klein; Asaf Shabtai
Description
This dataset comprises high-quality, targeted spear-phishing emails created using a proprietary system that harnesses the power of LLMs and knowledge graphs. The primary purpose of releasing this dataset is to promote and facilitate further research in the field of spear-phishing detection.

We anticipate that LLM-generated spear-phishing attacks will soon gain prominence and potentially surpass traditional phishing campaigns, which current detection solutions are designed to identify.
Failure rates for phishing simulations in companies worldwide 2021-2022, by...
statista.com
stelinmart.com
+1more
Updated Jan 10, 2024
Cite
Statista Research Department (2024). Failure rates for phishing simulations in companies worldwide 2021-2022, by industry [Dataset]. https://www.statista.com/topics/8385/phishing/
Dataset updated
Jan 10, 2024
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Description
A 2022 survey of working adults and IT security professionals worldwide found that electronics manufacturers showed the highest failure rate for phishing attack simulations, 14 percent. The aerospace and mining companies followed, with a 13 percent failure rate. Legal companies showed the lowest failure rate, down from 11 percent in 2021.
Phishing website Detector
gutcredit.com
sasender.com
+2more
zip
Updated Feb 28, 2020
+ more versions
Cite
Eswar Chand (2020). Phishing website Detector [Dataset]. https://gutcredit.com/problem-statement-for-phishing-detection
Available download formats: zip (201800 bytes)
Dataset updated
Feb 28, 2020
Authors
Eswar Chand
Description

The data set is provided both as a text file and as a csv file, which provide the following resources that can be used as inputs for model building:

A collection of website URLs for 11000+ websites. Each example has 30 website parameters and a class label identifying it as a phishing website or not (1 or -1).

The code template contains these code blocks: a. Import modules (Part 1) b. Load data function + input/output field descriptions

The data set also serves as an input for project scoping and tries to specify the functional and non-functional requirements for it.

Background of Problem Statement :

You are expected to write the code for a binary classification model (phishing website or not) using Python Scikit-Learn that trains on the data and calculates the accuracy score on the test data. You have to use one or more of the classification algorithms to train a model on the phishing website data set.

Dataset Description:

The dataset for a “.txt” file is with no headers and has only the column values.

The actual column-wise header is described below and, if needed, you can add the header manually if you are using the '.txt' file. If you are using the '.csv' file then the column names were added and given.

The header list (column names) is as follows : [ 'UsingIP', 'LongURL', 'ShortURL', 'Symbol@', 'Redirecting//', 'PrefixSuffix-', 'SubDomains', 'HTTPS', 'DomainRegLen', 'Favicon', 'NonStdPort', 'HTTPSDomainURL', 'RequestURL', 'AnchorURL', 'LinksInScriptTags', 'ServerFormHandler', 'InfoEmail', 'AbnormalURL', 'WebsiteForwarding', 'StatusBarCust', 'DisableRightClick', 'UsingPopupWindow', 'IframeRedirection', 'AgeofDomain', 'DNSRecording', 'WebsiteTraffic', 'PageRank', 'GoogleIndex', 'LinksPointingToPage', 'StatsReport', 'class' ]

Brief Description of the features in data set

● UsingIP (categorical - signed numeric) : { -1,1 }
● LongURL (categorical - signed numeric) : { 1,0,-1 }
● ShortURL (categorical - signed numeric) : { 1,-1 }
● Symbol@ (categorical - signed numeric) : { 1,-1 }
● Redirecting// (categorical - signed numeric) : { -1,1 }
● PrefixSuffix- (categorical - signed numeric) : { -1,1 }
● SubDomains (categorical - signed numeric) : { -1,0,1 }
● HTTPS (categorical - signed numeric) : { -1,1,0 }
● DomainRegLen (categorical - signed numeric) : { -1,1 }
● Favicon (categorical - signed numeric) : { 1,-1 }
● NonStdPort (categorical - signed numeric) : { 1,-1 }
● HTTPSDomainURL (categorical - signed numeric) : { -1,1 }
● RequestURL (categorical - signed numeric) : { 1,-1 }
● AnchorURL (categorical - signed numeric) :
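
Attaching the header list to the headerless '.txt' file might look like the sketch below, which uses only the standard library. The comma delimiter and the two sample rows are assumptions for illustration; the description does not state the delimiter, and the rows are invented.

```python
import csv
import io

# Column names copied from the header list in the description.
HEADER = [
    "UsingIP", "LongURL", "ShortURL", "Symbol@", "Redirecting//",
    "PrefixSuffix-", "SubDomains", "HTTPS", "DomainRegLen", "Favicon",
    "NonStdPort", "HTTPSDomainURL", "RequestURL", "AnchorURL",
    "LinksInScriptTags", "ServerFormHandler", "InfoEmail", "AbnormalURL",
    "WebsiteForwarding", "StatusBarCust", "DisableRightClick",
    "UsingPopupWindow", "IframeRedirection", "AgeofDomain", "DNSRecording",
    "WebsiteTraffic", "PageRank", "GoogleIndex", "LinksPointingToPage",
    "StatsReport", "class",
]

# Two invented rows standing in for the headerless file.
# Assumption: values are comma-separated.
row_a = ["1"] * 30 + ["-1"]
row_b = ["-1"] * 30 + ["1"]
SAMPLE_TXT = "\n".join(",".join(row) for row in (row_a, row_b))


def load_rows(text):
    """Parse headerless rows, attaching the known column names."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=HEADER)
    return [{name: int(value) for name, value in row.items()} for row in reader]


rows = load_rows(SAMPLE_TXT)

# Split into the 30 feature columns and the class label.
features = [[row[name] for name in HEADER[:-1]] for row in rows]
labels = [row["class"] for row in rows]
print(len(features[0]), labels)
```

The resulting `features` and `labels` lists are in the shape that a classifier's fit method typically expects, so they could be passed on to the model-building step the problem statement asks for.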
‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-phishing-dataset-for-machine-learning-9439/53570f2e/?iid=130-479&v=presentation
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 29 August 2021.

--- Dataset description provided by original source is as follows ---

Context

Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime where attackers pose as known or trusted entities and contact individuals through email, text or telephone and ask them to share sensitive information. Typically, in a phishing email attack, the message will suggest that there is a problem with an invoice, that there has been suspicious activity on an account, or that the user must log in to verify an account or password. Users may also be prompted to enter credit card information or bank account details as well as other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.

Content

This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.

Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.

Acknowledgements

Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1 Source of the Dataset.

--- Original source retains full ownership of the source dataset ---
Phishing_Dataset
kaggle.com
zip
Updated May 10, 2023
Cite
Me_Rahul_K (2023). Phishing_Dataset [Dataset]. https://www.kaggle.com/datasets/merahulk/phishing-dataset
Available download formats: zip (11581972 bytes)
Dataset updated
May 10, 2023
Authors
Me_Rahul_K
Description
Dataset

This dataset was created by Me_Rahul_K

Outcomes of successful phishing attacks in companies worldwide 2021-2023
statista.com
Updated Sep 1, 2021
Cite
Statista Research Department (2021). Outcomes of successful phishing attacks in companies worldwide 2021-2023 [Dataset]. https://www.statista.com/study/102216/phishing/
Dataset updated
Sep 1, 2021
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Description
Surveys of working adults and IT security professionals worldwide conducted in 2021 and 2023 found that the share of organizations experiencing severe consequences due to a successful cyber attack had declined. In 2023, the share of enterprises experiencing a breach of customer or client data was 29 percent, down from 44 percent in 2022. Ransomware infections that occurred through e-mail were common for 32 percent of the respondents in 2023. Cases of a credential or account compromise occurred in 27 percent of the organizations in 2023, a decrease of 25 percent compared to the year prior.
Phishing and Benign Websites
kaggle.com
Updated Dec 28, 2021
Cite
Peya Mowar (2021). Phishing and Benign Websites [Dataset]. https://www.kaggle.com/peyamowar/phishing-and-benign-websites
Dataset updated
Dec 28, 2021
Dataset provided by
Kaggle
Authors
Peya Mowar
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

Phishing is a cybercrime in which deceitful websites lure naive users and trick them into disclosing confidential information, such as social media passwords or financial data. This phishing dataset can be used for training supervised or semi-supervised phishing detection models.

Content

The dataset contains 38,800 URLs that have been classified as either phishing or benign.

Citation

Mowar, Peya, & Jain, Mini. (2021, December 28). Phishing and Benign Websites Dataset. 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA) (CyberSA), Dublin, Ireland. https://doi.org/10.5281/zenodo.5807622
Phishing Dataset for Machine Learning: Feature Evaluation
data.mendeley.com
Updated Mar 24, 2018
Cite
Choon Lin Tan (2018). Phishing Dataset for Machine Learning: Feature Evaluation [Dataset]. http://doi.org/10.17632/h3cgnj8hft.1
Unique identifier
https://doi.org/10.17632/h3cgnj8hft.1
Dataset updated
Mar 24, 2018
Authors
Choon Lin Tan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to parsing approach based on regular expressions. This dataset is WEKA-ready.

Phishing webpage source: PhishTank, OpenPhish Legitimate webpage source: Alexa, Common Crawl

Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.
Phishing Website Data Set
impactcybertrust.org
Updated Mar 26, 2015
Cite
External Data Source (2015). Phishing Website Data Set [Dataset]. http://doi.org/10.23721/100/1478806
Unique identifier
https://doi.org/10.23721/100/1478806
Dataset updated
Mar 26, 2015
Authors
External Data Source
Description
Although many articles about predicting phishing websites have been disseminated, no reliable training dataset has previously been published publicly, maybe because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. This dataset was collected mainly from the PhishTank archive, the MillerSmiles archive, and Google's search operators.
Data Set Characteristics: N/A
Number of Instances:2456
Area:Computer Security
Attribute Characteristics:Integer
Number of Attributes:30
Date Donated 2015-03-26
Associated Tasks: Classification
Missing Values? N/A
; ml-repository@ics.uci.edu
Evaluating the cognitive mechanisms of phishing detection with PEST, an...
dataverse.harvard.edu
text/x-fixed-field
Updated Jul 20, 2020
Cite
Harvard Dataverse (2020). Evaluating the cognitive mechanisms of phishing detection with PEST, an ecologically valid lab-based measure of phishing susceptibility [Dataset]. http://doi.org/10.7910/DVN/DB56VY
Available download formats: text/x-fixed-field (multiple files)
Unique identifier
https://doi.org/10.7910/DVN/DB56VY
Dataset updated
Jul 20, 2020
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data and code to generate figures from Hakim et al., "Evaluating the cognitive mechanisms of phishing detection with PEST, an ecologically valid lab-based measure of phishing susceptibility".
NOTE: Figure 4 requires data from the original PHIT task. These are available online at XXX
Data files are csv files. Naming has the following form: scamdata_SUBJECTNUMBER_DATETIME_AGE_GENDER.dat, e.g. scamdata_1_10Oct2018090103_18_F.dat
Each data file has 7 columns:
 - userId : subject response (1 - safe with high confidence, 2 - safe with low confidence, 3 - scam with low confidence, 4 - scam with high confidence)
 - reactTime : reaction time in seconds
 - category : PHIT Email Category (and custom categories for pooled scam/safe emails)
 - type : weapon of influence (for PHIT emails only)
 - hasAtt : binary indicating whether email has an attachment
 - realID : real email identifier (scam or safe)
 - emailCode : unique ID of each email, used to locate specific emails within excel files
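The file-naming scheme described above can be unpacked programmatically. The following is a minimal sketch, not part of the dataset's own code: the function name `parse_scamdata_name`, the `ScamDataFile` type, and the timestamp layout (day, abbreviated English month, year, then HHMMSS) are assumptions inferred from the single example filename.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ScamDataFile(NamedTuple):
    """Fields encoded in a scamdata_* filename (hypothetical helper type)."""
    subject: int
    recorded_at: datetime
    age: int
    gender: str


# Assumed layout: scamdata_SUBJECTNUMBER_DATETIME_AGE_GENDER.dat
_NAME_PATTERN = re.compile(
    r"^scamdata_(\d+)_(\d{2}[A-Za-z]{3}\d{4}\d{6})_(\d+)_([A-Za-z]+)\.dat$"
)


def parse_scamdata_name(filename: str) -> ScamDataFile:
    """Split a data filename into subject number, timestamp, age and gender."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    subject, stamp, age, gender = match.groups()
    # Assumption: the timestamp reads as DDMonYYYYHHMMSS, e.g. 10Oct2018090103.
    recorded_at = datetime.strptime(stamp, "%d%b%Y%H%M%S")
    return ScamDataFile(int(subject), recorded_at, int(age), gender)


info = parse_scamdata_name("scamdata_1_10Oct2018090103_18_F.dat")
print(info.subject, info.recorded_at.isoformat(), info.age, info.gender)
```

Parsing the metadata from the name rather than from the file contents keeps the demographic fields available even before the 7-column data are loaded.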
Web page phishing detection
data.mendeley.com
Updated Jun 25, 2021
+ more versions
Abdelhakim Hannousse (2021). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.3
Explore at:
Unique identifier
https://doi.org/10.17632/c2gw7fy2j4.3
Dataset updated
Jun 25, 2021
Authors
Abdelhakim Hannousse
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The provided dataset includes 11430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine-learning-based phishing detection systems. Features are from three different classes: 56 extracted from the structure and syntax of URLs, 24 extracted from the content of their corresponding pages, and 7 extracted by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used for feature extraction, for potential replication or extension. The datasets were constructed in May 2020.

dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL- and content-based features, working around the short lifetime of phishing web pages.

dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed by URL, so the index needs to be removed before experimentation.
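
The note about removing the URL index can be illustrated with a short sketch. It uses only the standard library and a tiny made-up sample; the column names `url`, `length_url`, `nb_dots` and `status`, and the function name `split_features`, are hypothetical placeholders and are not taken from the dataset description.

```python
import csv
import io


def split_features(csv_text, index_column="url", label_column="status"):
    """Drop the URL index and separate numeric features from labels.

    Column names are assumptions; adjust them to the actual file header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        row.pop(index_column)                 # the URL is an index, not a feature
        labels.append(row.pop(label_column))  # class label kept separately
        features.append([float(value) for value in row.values()])
    return features, labels


# Made-up rows for illustration only; they are not records from the dataset.
sample = (
    "url,length_url,nb_dots,status\n"
    "http://example.com/a,20,1,legitimate\n"
    "http://example.org/b,35,3,phishing\n"
)

features, labels = split_features(sample)
print(features)
print(labels)
```

Leaving the URL column in place would hand the classifier a unique string per row, which is why it is removed before the feature values are converted to numbers.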

Data Set Characteristics : Multivariate
Number of Instances : 1353
Attribute Characteristics : Integer
Number of Attributes : 10
Associated Tasks : Classification
Number of Web Hits : 54880

(2019). Phishing-Dataset-for-Machine-Learning [Dataset]. https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning

Phishing-Dataset-for-Machine-Learning

Explore at:

74 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Nov 5, 2019

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Identify Phishing using Machine learning Algorithms
