78 datasets found

Pristine and Malicious URLs
ieee-dataport.org
Updated Nov 6, 2023
Cite
Ehsan Nowroozi (2023). Pristine and Malicious URLs [Dataset]. https://ieee-dataport.org/documents/pristine-and-malicious-urls
Dataset updated
Nov 6, 2023
Authors
Ehsan Nowroozi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The goal of our research is to identify malicious advertisement URLs and to apply adversarial attacks on ensembles. We extract lexical and web-scraped features from the URLs using Python code. Four machine learning algorithms are then applied for classification, and K-Means clustering is used for visual understanding. We check the vulnerability of the models using adversarial examples: we applied the Zeroth Order Optimization adversarial attack to the models and computed the attack accuracy.
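As a rough illustration of what extracting lexical features from a URL can look like, here is a minimal Python sketch. The feature names and the example URL are assumptions chosen for illustration; the dataset description does not list the features the authors actually used.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Return a few simple lexical features of a URL.

    The feature set is hypothetical, not the one used by the dataset authors.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        # Number of non-empty path segments, e.g. /account/verify has two.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


# Example URL on the reserved .test TLD, used purely for demonstration.
features = lexical_features("http://login-example.test/account/verify?id=123")
```

Features like these are computed from the URL string alone, so they need no network access, unlike web-scraped features.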
Summary of previous works on malicious URL detection.
plos.figshare.com
xls
Updated May 31, 2024
Cite
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait (2024). Summary of previous works on malicious URL detection. [Dataset]. http://doi.org/10.1371/journal.pone.0302196.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302196.t001
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary of previous works on malicious URL detection.
The performance of the proposed ensemble classifier in classifying four...
plos.figshare.com
xls
Updated May 31, 2024
Cite
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait (2024). The performance of the proposed ensemble classifier in classifying four classes of malicious URLs for testing data. [Dataset]. http://doi.org/10.1371/journal.pone.0302196.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302196.t007
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The performance of the proposed ensemble classifier in classifying four classes of malicious URLs for testing data.
phishing_url_classification
huggingface.co
Updated Jun 8, 2025
Cite
Darshan Patil (2025). phishing_url_classification [Dataset]. https://huggingface.co/datasets/darshan8950/phishing_url_classification
Dataset updated
Jun 8, 2025
Authors
Darshan Patil
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset for Detecting Phishing URLs

This dataset contains URLs labeled as 'Safe' (0) or 'Not Safe' (1) for phishing detection tasks.

Dataset Summary

This dataset contains URLs labeled for phishing detection tasks. It's designed to help train and evaluate models that can identify potentially malicious URLs.

Dataset Creation

The dataset was synthetically generated using a custom script that creates both legitimate and potentially phishing URLs. This approach… See the full description on the dataset page: https://huggingface.co/datasets/darshan8950/phishing_url_classification.

A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large...

zenodo.org
explore.openaire.eu

json

Updated Dec 10, 2024

Cite

Radek Hranický; Adam Horák; Ondřej Ondryáš (2024). A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large Corpus of Benign, Phishing, and Malware Domain Names 2024 [Dataset]. http://doi.org/10.5281/zenodo.13330074

Available download formats: json

Unique identifier

https://doi.org/10.5281/zenodo.13330074

Dataset updated

Dec 10, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Radek Hranický; Adam Horák; Ondřej Ondryáš

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Aug 16, 2024

Description

The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from the actual CESNET network traffic, 164,425 phishing domains from PhishTank and OpenPhish services, and 100,809 malware domains from various sources such as ThreatFox, The Firebog, and the MISP threat intelligence platform. The ground truth for the phishing dataset was double-checked with the VirusTotal (VT) service. Domain names not considered malicious by VT have been removed from the phishing and malware datasets. Similarly, benign domain names that were considered risky by VT have been removed from the benign datasets. The data was collected between March 2023 and July 2024. The final assessment of the data was conducted in August 2024.

The dataset is useful for cybersecurity research, e.g. statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing and malware website detection.

Data Files

The data is located in the following individual files:
- benign_umbrella.json - data for 368,956 benign domains from Cisco Umbrella,
- benign_cesnet.json - data for 461,338 benign domains from the CESNET network,
- phishing.json - data for 164,425 phishing domains, and
- malware.json - data for 100,809 malware domains.

Data Structure

All files contain a JSON array of records generated using mongoexport. The following table documents the structure of a record. Please note that:

- some fields may be missing (they should be interpreted as nulls),
- extra fields may be present (they should be ignored).

Field name	Field type	Nullable	Description
domain_name	String	No	The evaluated domain name
url	String	No	The source URL for the domain name
evaluated_on	Date	No	Date of last collection attempt
source	String	No	An identifier of the source
sourced_on	Date	No	Date of ingestion of the domain name
dns	Object	Yes	Data from DNS scan
rdap	Object	Yes	Data from RDAP or WHOIS
tls	Object	Yes	Data from TLS handshake
ip_data	Array of Objects	Yes	Array of data objects capturing the IP addresses related to the domain name
DNS data (dns field)
A	Array of Strings	No	Array of IPv4 addresses
AAAA	Array of Strings	No	Array of IPv6 addresses
TXT	Array of Strings	No	Array of raw TXT values
CNAME	Object	No	The CNAME target and related IPs
MX	Array of Objects	No	Array of objects with the MX target hostname, priority and related IPs
NS	Array of Objects	No	Array of objects with the NS target hostname and related IPs
SOA	Object	No	All the SOA fields, present if found at the target domain name
zone_SOA	Object	No	The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
dnssec	Object	No	Flags describing the DNSSEC validation result for each record type
ttls	Object	No	The TTL values for each record type
remarks	Object	No	The zone domain name and DNSSEC flags
RDAP data (rdap field)
copyright_notice	String	No	RDAP/WHOIS data usage copyright notice
dnssec	Bool	No	DNSSEC presence flag
entitites	Object	No	An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
expiration_date	Date	Yes	The current date of expiration
handle	String	No	RDAP handle
last_changed_date	Date	Yes	The date when the domain was last changed
name	String	No	The target domain name for which the data in this object are stored
nameservers	Array of Strings	No	Nameserver hostnames provided by RDAP or WHOIS
registration_date	Date	Yes	First registration date
status	Array of Strings
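The two rules stated for the record structure (missing fields are nulls, extra fields are ignored) can be applied when reading the files. The Python sketch below is one possible way to do that under stated assumptions: the helper names `normalize_record` and `load_records` are hypothetical, only the top-level fields from the table are handled, and the inline sample record is invented for demonstration.

```python
import json

# Top-level field names taken from the table above.
TOP_LEVEL_FIELDS = (
    "domain_name", "url", "evaluated_on", "source", "sourced_on",
    "dns", "rdap", "tls", "ip_data",
)


def normalize_record(record: dict) -> dict:
    """Keep only documented fields; fields absent from the record become None."""
    return {name: record.get(name) for name in TOP_LEVEL_FIELDS}


def load_records(path: str) -> list:
    """Load one data file (a JSON array of records) and normalize each record.

    This reads the whole file into memory, which may be impractical for the
    larger files; a streaming parser would be an alternative.
    """
    with open(path, encoding="utf-8") as fh:
        return [normalize_record(rec) for rec in json.load(fh)]


# Invented sample: "dns" is missing and "unexpected" is an extra field.
sample = json.loads(
    '[{"domain_name": "example.test", "url": "http://example.test/",'
    ' "source": "demo", "unexpected": 1}]'
)
rows = [normalize_record(rec) for rec in sample]
```

With the real files, a call such as `load_records("phishing.json")` would return one normalized dictionary per domain.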

Malicious URLs with preprocessing and split
kaggle.com
Updated Dec 8, 2023
Cite
Zihan ZHAO_qq (2023). Malicious URLs with preprocessing and split [Dataset]. https://www.kaggle.com/datasets/zihanzhaoqq/malicious-urls-with-preprocessing-and-split
Dataset updated
Dec 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Zihan ZHAO_qq
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Zihan ZHAO_qq

Released under Apache 2.0

Contents
ISCX-URL-2016
ieee-dataport.org
Updated Dec 22, 2023
Cite
Sajidha S A (2023). ISCX-URL-2016 [Dataset]. https://ieee-dataport.org/documents/iscx-url-2016
Dataset updated
Dec 22, 2023
Authors
Sajidha S A
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Web has long been a major platform for online criminal activities, and URLs are the main vehicle in this domain. To counter this issue, the security community has focused its efforts on developing techniques, mostly for blacklisting malicious URLs.
Common industries for malicious URL redirections South Korea H2 2021
statista.com
Updated Jul 18, 2025
Cite
Statista (2025). Common industries for malicious URL redirections South Korea H2 2021 [Dataset]. https://www.statista.com/statistics/1311491/south-korea-malicious-url-redirect-common-industries/
Dataset updated
Jul 18, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
South Korea
Description
In the second half of 2021, websites regarding manufacturing were the most common websites to be targeted by malicious URL redirections, with ** percent of detected cases being found on these sites. Although manufacturing websites have been a common target for malware attacks before, finds on these sites have largely increased compared to the first half of the year, which recorded around ** percent of cases redirecting through that industry.
malicious-url
kaggle.com
Updated Apr 23, 2024
Cite
kianindeed (2024). malicious-url [Dataset]. https://www.kaggle.com/datasets/kianindeed/malicious-url
Dataset updated
Apr 23, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
kianindeed
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by kianindeed

Released under MIT

Contents
Hyperparameter of Catboost classifier.
figshare.com
xls
Updated May 31, 2024
Cite
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait (2024). Hyperparameter of Catboost classifier. [Dataset]. http://doi.org/10.1371/journal.pone.0302196.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302196.t005
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Web applications are important for various online businesses and operations because of their platform stability and low operation cost. The increasing usage of Internet-of-Things (IoT) devices within a network has contributed to the rise of network intrusion issues due to malicious Uniform Resource Locators (URLs). Generally, malicious URLs are initiated to promote scams, attacks, and frauds which can lead to high-risk intrusion. Several methods have been developed to detect malicious URLs in previous works. There has been a good amount of work done to detect malicious URLs using various methods such as random forest, regression, LightGBM, and more as reported in the literature. However, most of the previous works focused on the binary classification of malicious URLs and are tested on limited URL datasets. Nevertheless, the detection of malicious URLs remains a challenging task that remains open to research. Hence, this work proposed a stacking-based ensemble classifier to perform multi-class classification of malicious URLs on larger URL datasets to justify the robustness of the proposed method. This study focuses on obtaining lexical features directly from the URL to identify malicious websites. Then, the proposed stacking-based ensemble classifier is developed by integrating Random Forest, XGBoost, LightGBM, and CatBoost. In addition, hyperparameter tuning was performed using the Randomized Search method to optimize the proposed classifier. The proposed stacking-based ensemble classifier aims to take advantage of the performance of each machine learning model and aggregate the output to improve prediction accuracy. The classification accuracies of the machine learning model when applied individually are 93.6%, 95.2%, 95.7% and 94.8% for random forest, XGBoost, LightGBM, and CatBoost respectively. 
The proposed stacking-based ensemble classifier has shown significant results in classifying four classes of malicious URLs (phishing, malware, defacement, and benign) with an average accuracy of 96.8% when benchmarked with previous works.
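To make the idea of a stacking-based ensemble concrete, the following Python sketch uses scikit-learn's `StackingClassifier`. It is not the authors' pipeline: the base learners are scikit-learn stand-ins (Gradient Boosting and Extra Trees in place of XGBoost, LightGBM, and CatBoost), the data is synthetic with 22 features and four classes, the logistic-regression meta-learner is an assumption, and no hyperparameter tuning is performed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for lexical URL features: 22 features, 4 classes.
X, y = make_classification(
    n_samples=400, n_features=22, n_informative=8, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; stand-ins for the boosted-tree libraries named in the abstract.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=30, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=30, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

The accuracy obtained on this synthetic data says nothing about the figures reported in the abstract; the sketch only shows how base-model outputs are aggregated by a second-level model.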
malicious and benign websites
ieee-dataport.org
Updated Jun 17, 2025
Cite
Christian Urcuqui (2025). malicious and benign websites [Dataset]. https://ieee-dataport.org/documents/malicious-and-benign-websites
Dataset updated
Jun 17, 2025
Authors
Christian Urcuqui
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
One important topic to work on is creating a good set of malicious website characteristics.
Malicious URL
kaggle.com
Updated Aug 30, 2024
Cite
Shreeshail Chavan (2024). Malicious URL [Dataset]. https://www.kaggle.com/datasets/shreeshailchavan/malicious-url/discussion
Dataset updated
Aug 30, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shreeshail Chavan
Description
Dataset

This dataset was created by Shreeshail Chavan

Contents
Hyperparameter of tuned Random Forest classifier.
figshare.com
xls
Updated May 31, 2024
Cite
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait (2024). Hyperparameter of tuned Random Forest classifier. [Dataset]. http://doi.org/10.1371/journal.pone.0302196.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302196.t003
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Web applications are important for various online businesses and operations because of their platform stability and low operation cost. The increasing usage of Internet-of-Things (IoT) devices within a network has contributed to the rise of network intrusion issues due to malicious Uniform Resource Locators (URLs). Generally, malicious URLs are initiated to promote scams, attacks, and frauds which can lead to high-risk intrusion. Several methods have been developed to detect malicious URLs in previous works. There has been a good amount of work done to detect malicious URLs using various methods such as random forest, regression, LightGBM, and more as reported in the literature. However, most of the previous works focused on the binary classification of malicious URLs and are tested on limited URL datasets. Nevertheless, the detection of malicious URLs remains a challenging task that remains open to research. Hence, this work proposed a stacking-based ensemble classifier to perform multi-class classification of malicious URLs on larger URL datasets to justify the robustness of the proposed method. This study focuses on obtaining lexical features directly from the URL to identify malicious websites. Then, the proposed stacking-based ensemble classifier is developed by integrating Random Forest, XGBoost, LightGBM, and CatBoost. In addition, hyperparameter tuning was performed using the Randomized Search method to optimize the proposed classifier. The proposed stacking-based ensemble classifier aims to take advantage of the performance of each machine learning model and aggregate the output to improve prediction accuracy. The classification accuracies of the machine learning model when applied individually are 93.6%, 95.2%, 95.7% and 94.8% for random forest, XGBoost, LightGBM, and CatBoost respectively. 
The proposed stacking-based ensemble classifier has shown significant results in classifying four classes of malicious URLs (phishing, malware, defacement, and benign) with an average accuracy of 96.8% when benchmarked with previous works.
Malicious_URLs
kaggle.com
Updated Apr 9, 2022
Cite
SethMDoty (2022). Malicious_URLs [Dataset]. https://www.kaggle.com/datasets/sethmdoty/malicious-urls
Dataset updated
Apr 9, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SethMDoty
Description
Dataset

This dataset was created by SethMDoty

Contents
Global share of employees clicking on malicious links 2023
statista.com
ai-chatbox.pro
Updated Jun 23, 2025
Cite
Statista (2025). Global share of employees clicking on malicious links 2023 [Dataset]. https://www.statista.com/statistics/1491680/employees-worldwide-clicking-malicious-links/
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 9, 2023 - Oct 27, 2023
Area covered
Worldwide
Description
An October 2023 phishing simulation carried out at worldwide organizations found that *** percent of employees submitted passwords in the form embedded in the malicious webpage. On the other hand, *** percent of them clicked only the link, and **** percent did not click the link.
Number of malicious websites detected Thailand 2023, by type
statista.com
Updated Oct 14, 2024
Cite
Statista (2024). Number of malicious websites detected Thailand 2023, by type [Dataset]. https://www.statista.com/statistics/1405677/thailand-number-of-malicious-websites-by-type/
Dataset updated
Oct 14, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Sep 2023
Area covered
Thailand
Description
As of September 2023, over 500 gambling websites were detected and settled by the Thailand Computer Emergency Response Team (ThaiCERT) in Thailand. In contrast, only 12 malware websites were detected in that year. ThaiCERT is an internal department of the National Cyber Security Agency (NCSA) tasked with monitoring cyber threats in the country.
The 22 lexical features were extracted using the feature engineering...
plos.figshare.com
xls
Updated May 31, 2024
Cite
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait (2024). The 22 lexical features were extracted using the feature engineering approach. [Dataset]. http://doi.org/10.1371/journal.pone.0302196.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302196.t002
Dataset updated
May 31, 2024
Dataset provided by
PLOS ONE
Authors
Suresh Sankaranarayanan; Arvinthan Thevar Sivachandran; Anis Salwa Mohd Khairuddin; Khairunnisa Hasikin; Abdul Rahman Wahab Sait
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The 22 lexical features were extracted using the feature engineering approach.
PhishingURLsDataset
huggingface.co
Updated Jan 12, 2024
Cite
Semih Güner (2024). PhishingURLsDataset [Dataset]. https://huggingface.co/datasets/semihGuner2002/PhishingURLsDataset
Dataset updated
Jan 12, 2024
Authors
Semih Güner
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
PhishingURLDataset

This dataset was created for training neural networks on phishing website detection. It was generated using this raw template.

Dataset Details

This dataset contains phishing websites, which are labeled with "1" and are called "malignant", and benign websites, which are labeled with "0".

Dataset Sources

Kaggle Dataset on Phishing URLs: https://www.kaggle.com/datasets/siddharthkumar25/malicious-and-benign-urls USOM Phishing… See the full description on the dataset page: https://huggingface.co/datasets/semihGuner2002/PhishingURLsDataset.
Multi-version Benign & Malicious QR codes dataset
kaggle.com
Updated Aug 14, 2022
Cite
Samah Malibari (2022). Multi-version Benign & Malicious QR codes dataset [Dataset]. https://www.kaggle.com/datasets/samahsadiq/multiversion-benign-malicious-qr-codes-dataset
Dataset updated
Aug 14, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Samah Malibari
Description
Dataset Creation

The Multi-version dataset contains a total of 700 QR codes across 7 different versions. Each version consists of 100 QR codes (50 benign and 50 malicious).

The QR codes were created from 100 unique URLs, a subset of a balanced dataset that can be found here: https://www.kaggle.com/datasets/samahsadiq/benign-and-malicious-urls

For each version, two loops in Python were written to create 50 benign and 50 malicious QR codes. Each QR code's path indicates the QR code class type, version, and the id of the encoded URL within the mentioned balanced dataset of URLs.

NOTE: Keep in mind that the malicious QR codes encode REAL malicious URLs; scanning them manually and visiting the encoded websites is not recommended. For more information about the encoded URLs, please refer to the Kaggle dataset mentioned above.
Suspicious File and URL Analysis Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 10, 2025
Cite
Archive Market Research (2025). Suspicious File and URL Analysis Report [Dataset]. https://www.archivemarketresearch.com/reports/suspicious-file-and-url-analysis-55344
Available download formats: doc, ppt, pdf
Dataset updated
Mar 10, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global market for suspicious file and URL analysis is experiencing robust growth, projected to reach $88 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 6.4% from 2025 to 2033. This expansion is driven by the escalating sophistication of cyber threats, the increasing reliance on digital infrastructure across various sectors, and the growing need for proactive security measures to mitigate risks associated with malicious files and URLs. The market's segmentation reveals a strong preference for cloud-based solutions, offering scalability and accessibility to organizations of all sizes. Large enterprises are the primary consumers, reflecting their higher vulnerability to advanced cyberattacks and their greater capacity for investment in robust security solutions. However, the market is also seeing significant adoption among SMEs, driven by the increasing affordability and ease of use of cloud-based solutions and a rising awareness of the risks associated with malicious online content. Several factors contribute to market growth. The development and proliferation of advanced malware necessitates continuous improvement in threat detection and analysis capabilities. Furthermore, the expanding attack surface due to remote work and the increasing use of IoT devices are contributing to a heightened demand for effective file and URL analysis tools. Regulatory compliance requirements, particularly within sensitive industries like finance and healthcare, further incentivize organizations to invest in these solutions. Conversely, challenges such as the emergence of obfuscated malware, the high cost of advanced solutions, and the need for specialized expertise pose some restraints to broader market penetration. The competitive landscape is diverse, with established cybersecurity players and innovative startups offering a range of solutions catering to specific needs and budgets. 
This competitive pressure is ultimately beneficial for consumers, driving innovation and fostering a more efficient and effective market for suspicious file and URL analysis.
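For readers who want to see what the quoted growth rate implies, here is a small Python sketch of the compound-growth arithmetic. The function name `project_value` is hypothetical, and the 2033 figure it produces is only an extrapolation from the two numbers stated above ($88 million in 2025, 6.4% CAGR), not a value taken from the report.

```python
def project_value(base: float, rate: float, years: int) -> float:
    """Compound a base value at a constant annual growth rate."""
    return base * (1.0 + rate) ** years


# $88 million in 2025, compounded at 6.4% per year over 2025-2033 (8 years).
projected_2033 = project_value(88.0, 0.064, 2033 - 2025)
print(f"Implied 2033 market size: ${projected_2033:.1f} million")
```

Under these assumptions the implied 2033 market size is roughly 1.64 times the 2025 value.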
