Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of previous works on malicious URL detection.
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from the actual CESNET network traffic, 164,425 phishing domains from PhishTank and OpenPhish services, and 100,809 malware domains from various sources like ThreatFox, The Firebog, MISP threat intelligence platform, and other sources. The ground truth for the phishing dataset was double-checked with the VirusTotal (VT) service. Domain names not considered malicious by VT have been removed from the phishing and malware datasets. Similarly, benign domain names that were considered risky by VT have been removed from the benign datasets. The data was collected between March 2023 and July 2024. The final assessment of the data was conducted in August 2024.
The dataset is useful for cybersecurity research, such as statistical analysis of domain data or feature extraction for training machine learning-based classifiers, for example for phishing and malware website detection.
The data is located in the following individual files:
Both files contain a JSON array of records generated using mongoexport. The following table documents the structure of a record. Please note that:
Field name | Field type | Nullable | Description
domain_name | String | No | The evaluated domain name
url | String | No | The source URL for the domain name
evaluated_on | Date | No | Date of last collection attempt
source | String | No | An identifier of the source
sourced_on | Date | No | Date of ingestion of the domain name
dns | Object | Yes | Data from DNS scan
rdap | Object | Yes | Data from RDAP or WHOIS
tls | Object | Yes | Data from TLS handshake
ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name
DNS data (dns field)
A | Array of Strings | No | Array of IPv4 addresses
AAAA | Array of Strings | No | Array of IPv6 addresses
TXT | Array of Strings | No | Array of raw TXT values
CNAME | Object | No | The CNAME target and related IPs
MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs
NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs
SOA | Object | No | All the SOA fields, present if found at the target domain name
zone_SOA | Object | No | The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
dnssec | Object | No | Flags describing the DNSSEC validation result for each record type
ttls | Object | No | The TTL values for each record type
remarks | Object | No | The zone domain name and DNSSEC flags
RDAP data (rdap field)
copyright_notice | String | No | RDAP/WHOIS data usage copyright notice
dnssec | Bool | No | DNSSEC presence flag
entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
expiration_date | Date | Yes | The current date of expiration
handle | String | No | RDAP handle
last_changed_date | Date | Yes | The date when the domain was last changed
name | String | No | The target domain name for which the data in this object are stored
nameservers | Array of Strings | No | Nameserver hostnames provided by RDAP or WHOIS
registration_date | Date | Yes | First registration date
status | Array of Strings |
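As a rough illustration of how such records might be read, the sketch below parses a JSON array and pulls out a few of the documented fields. It assumes that dates appear in MongoDB Extended JSON form (a `{"$date": ...}` wrapper, which is how mongoexport usually writes them); the inline sample record, the helper names `parse_date`, `summarize` and `load_records`, and any file path passed to `load_records` are hypothetical and not part of the dataset.

```python
import json
from datetime import datetime, timezone


def parse_date(value):
    """Convert a date value into a datetime, or None if absent.

    Assumption: dates are either plain ISO 8601 strings or Extended JSON
    wrappers such as {"$date": "..."} or {"$date": {"$numberLong": "..."}}.
    """
    if isinstance(value, dict) and "$date" in value:
        value = value["$date"]
    if isinstance(value, dict) and "$numberLong" in value:
        # Canonical Extended JSON: milliseconds since the Unix epoch
        millis = int(value["$numberLong"])
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    if isinstance(value, str):
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return None


def summarize(record):
    """Pick a few documented fields; nullable objects may be None."""
    dns = record.get("dns") or {}
    return {
        "domain_name": record["domain_name"],
        "source": record.get("source"),
        "evaluated_on": parse_date(record.get("evaluated_on")),
        "ipv4": dns.get("A") or [],
        "has_tls": record.get("tls") is not None,
    }


def load_records(path):
    """Load a whole JSON array file into memory (path is hypothetical)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical record shaped like the table above, not taken from the dataset
sample = json.loads("""
[{"domain_name": "example.com",
  "url": "http://example.com/",
  "evaluated_on": {"$date": "2024-08-01T00:00:00Z"},
  "source": "demo",
  "sourced_on": {"$date": "2024-07-01T00:00:00Z"},
  "dns": {"A": ["93.184.216.34"], "AAAA": []},
  "rdap": null, "tls": null, "ip_data": null}]
""")

for rec in sample:
    print(summarize(rec))
```

Because `json.load` reads the entire array at once, files of this size may need a streaming parser instead; that choice is left open here.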
An October 2023 phishing simulation carried out at worldwide organizations found that the highest share of employees clicking on malicious links were working at small organizations, with one to 99 employees. Furthermore, those working at organizations with 100 to 499 employees were more likely to submit their passwords to malicious websites.
In the second half of 2021, websites regarding manufacturing were the most common websites to be targeted by malicious URL redirections, with 39 percent of detected cases being found on these sites. Although manufacturing websites have been a common target for malware attacks before, finds on these sites have largely increased compared to the first half of the year, which recorded around 23 percent of cases redirecting through that industry.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by antonyj
Released under Database: Open Database, Contents: Database Contents
Web applications are important for various online businesses and operations because of their platform stability and low operation cost. The increasing usage of Internet-of-Things (IoT) devices within a network has contributed to the rise of network intrusion issues due to malicious Uniform Resource Locators (URLs). Generally, malicious URLs are used to promote scams, attacks, and frauds, which can lead to high-risk intrusion. Previous works have detected malicious URLs using various methods such as random forest, regression, LightGBM, and more. However, most of the previous works focused on the binary classification of malicious URLs and were tested on limited URL datasets, so the detection of malicious URLs remains a challenging and open research problem. Hence, this work proposes a stacking-based ensemble classifier to perform multi-class classification of malicious URLs on larger URL datasets to demonstrate the robustness of the proposed method. This study focuses on obtaining lexical features directly from the URL to identify malicious websites. Then, the proposed stacking-based ensemble classifier is developed by integrating Random Forest, XGBoost, LightGBM, and CatBoost. In addition, hyperparameter tuning was performed using the Randomized Search method to optimize the proposed classifier. The proposed stacking-based ensemble classifier aims to take advantage of the performance of each machine learning model and aggregate the output to improve prediction accuracy. The classification accuracies of the machine learning models when applied individually are 93.6%, 95.2%, 95.7% and 94.8% for random forest, XGBoost, LightGBM, and CatBoost respectively.
The proposed stacking-based ensemble classifier has shown significant results in classifying four classes of malicious URLs (phishing, malware, defacement, and benign) with an average accuracy of 96.8% when benchmarked with previous works.
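To make the idea of lexical features obtained directly from the URL more concrete, here is a minimal sketch using only the Python standard library. The specific features (lengths, character counts, an IPv4-host flag, character entropy) are illustrative assumptions and are not the feature set used in the study; the function names are hypothetical. A feature dictionary like this would then be fed to the classifiers named above.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Character-level Shannon entropy in bits (0.0 for empty input)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ipv4(host):
    """True if the host part is a literal IPv4 address."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Derive simple lexical features from the URL string alone.

    Assumption: these features are examples only; no page content,
    DNS, or WHOIS lookups are involved.
    """
    # urlsplit needs a scheme to recognise the host part
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special_chars": sum(not ch.isalnum() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        "host_is_ipv4": is_ipv4(host),
        "url_entropy": shannon_entropy(url),
    }


print(lexical_features("http://192.168.0.1/login.php?id=1"))
```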
The global market for suspicious file and URL analysis is experiencing robust growth, driven by the escalating sophistication and frequency of cyberattacks. The market size in 2025 is estimated at $136.7 million. Considering the rapid expansion of digital infrastructure and the increasing reliance on cloud services, coupled with the persistent threat landscape, we project a healthy Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This growth is fueled by several key factors. The rising adoption of cloud-based solutions offers scalability and cost-effectiveness for businesses of all sizes, accelerating market penetration. Moreover, the increasing integration of AI and machine learning into analysis tools enhances detection accuracy and reduces response times. The demand for proactive threat detection and prevention is a significant driver, leading organizations to invest in comprehensive security solutions that include robust suspicious file and URL analysis capabilities. Emerging trends such as the proliferation of IoT devices and the growth of the dark web further amplify the need for advanced threat intelligence and effective analysis techniques. However, market growth is not without its challenges. The complexity of malware and evolving attack techniques pose ongoing difficulties for security professionals. Furthermore, the lack of skilled cybersecurity personnel and the high cost of advanced solutions can present barriers to entry for some organizations, particularly smaller businesses. The market is segmented by deployment type (cloud-based and on-premise) and user type (large enterprises and SMEs). The cloud-based segment is expected to dominate due to its flexibility and scalability. Large enterprises are currently the leading consumers of these solutions, reflecting their greater resources and heightened security needs. 
However, growing awareness and the decreasing cost of entry are expected to drive significant growth within the SME segment during the forecast period. Geographic distribution sees North America currently holding a major market share, followed by Europe and Asia Pacific, with rapid growth anticipated in the Asia-Pacific region due to increasing digitalization and a rising number of cyberattacks.
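The compound growth arithmetic behind a CAGR projection can be sketched as follows. Taking the stated figures at face value (USD 136.7 million in 2025 and a 15% CAGR) and assuming eight compounding years from 2025 to 2033, the end value is the base multiplied by (1 + rate) raised to the number of years. The function names are hypothetical, and the result is plain arithmetic rather than a figure reported in the source.

```python
def project_value(base, cagr, years):
    """Value after compounding `base` at annual rate `cagr` for `years` years."""
    return base * (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Assumption: 2025 -> 2033 is treated as eight compounding periods
projected_2033 = project_value(136.7, 0.15, 8)
print(f"Projected 2033 market size: USD {projected_2033:.1f} million")
```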
Learning rate—Performance comparison with Phishtank dataset.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by Zihan ZHAO_QQ_little
Released under Apache 2.0
In 2023, detections of web-based malware sites in South Korea totaled roughly 12.7 thousand, a slight decrease compared to the previous year. The highest number of detected web-based malware sites in South Korea was 47,703 cases in 2014. The detected sites comprised distribution sites and staging sites.
Initial settings of parameters (training and testing phases).
Secure Web Gateway Market Size 2024-2028
The secure web gateway market size is forecast to increase by USD 19.45 billion at a CAGR of 26.79% between 2023 and 2028. The market is experiencing significant growth due to the increasing number of online security threats targeting web products and websites. Traditional firewalls, once considered a cyber barrier, are no longer sufficient to protect against unauthorized traffic, malicious websites, and data leakage.
To address these challenges, secure web gateways employ a 7-layered traffic inspection approach, providing application-level control and data leakage prevention. As more companies adopt cloud-based solutions and allow remote workers, the need for advanced web security solutions becomes increasingly important. The rising adoption of cloud-based security technologies is a major market growth factor, despite the high implementation costs.
What will be the size of the Secure Web Gateway Market During the Forecast Period?
The cyber threat environment continues to evolve, with system viruses and unknown spyware posing significant risks to both individual and organizational data. Unsecured communication channels and malicious web traffic are common avenues for cyber-attacks, making it crucial for businesses to implement strong security solutions. A secure web gateway acts as a cyber barrier, safeguarding against online security threats and protecting against unauthorized traffic, malware attacks, and data breaches. Malicious web traffic, including viruses, malware, and harmful websites, can infiltrate an internal network through unsecured endpoints.
Moreover, trojan horses, adware, and spyware are common threats that can compromise individual data and organizational data. Remote employees working from home present an additional challenge, as they may access the company network through unsecured Wi-Fi or unprotected devices. A secure web gateway acts as a centralized filtering system, controlling web requests and blocking access to malicious websites. It provides an essential layer of security, preventing unauthorized access to sensitive data and protecting against known and unknown threats. By implementing a secure web gateway, companies can enforce company policy, ensuring that all web traffic is secure and compliant. Online security threats are a constant concern for businesses, with data breaches and malware attacks becoming increasingly common.
Furthermore, a secure web gateway helps mitigate these risks by providing a comprehensive solution for securing end-user data and web security. It blocks malicious web traffic, preventing the spread of viruses, malware, and other harmful software. By implementing a secure web gateway, businesses can protect their valuable data and maintain their online reputation. In conclusion, a secure web gateway is an essential component of any organization's cybersecurity strategy. It provides a critical layer of protection against online security threats, including malicious web traffic, viruses, malware, and harmful websites. By implementing a secure web gateway, businesses can safeguard their data, protect against unauthorized traffic, and maintain compliance with industry regulations.
Market Segmentation
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Deployment
  Cloud
  On-premises
End-user
  BFSI
  IT and telecom
  Government and defense
  Others
Geography
  North America
    US
  Europe
    Germany
    UK
  APAC
    China
    Japan
  Middle East and Africa
  South America
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period. The market is expected to experience substantial expansion in the coming years due to the escalating requirement for comprehensive data and identity security in corporations. With the rise in cybercrime and the increasing number of threats from hackers, the demand for sophisticated security solutions is surging. The adoption of cloud security services is gaining traction among large enterprises and small businesses as they transfer an increasing volume of sensitive and confidential data. The proliferation of mobile devices for both professional and personal use is further fueling the demand for cloud security solutions, as these devices are highly susceptible to attacks.
Furthermore, secure Web Gateways provide advanced functionalities such as antivirus, application control, data loss prevention, and HTTPS inspection to protect enterprises from harmful code and potential leaks. The financial sector, including banks, is a significant end user of Secure Web Gateways due to the sensitive nature of the data they handle. The
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
1. MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
2. VictimAccessSet
We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
3. AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits Authors
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Comparison of F1—Measure.
Market Size and Drivers: The global URL Filtering Tools market size was valued at USD XX million in 2025 and is projected to reach USD XX million by 2033, exhibiting a CAGR of XX% during the forecast period. The market growth is primarily driven by the increasing demand for content filtering solutions due to concerns over online safety, cyber threats, and inappropriate content exposure for children and employees. Organizations and individuals are adopting URL filtering tools to protect their networks from malicious websites, control access to age-inappropriate content, and enhance productivity by blocking distractions. Other major drivers include the rising adoption of cloud-based solutions, the growing prevalence of mobile devices, and the need to comply with data privacy regulations.
Market Segments and Key Players: Based on type, the market is segmented into on-premises and cloud-based solutions. By application, it is categorized into large enterprises, SMEs, and individuals. Regionally, North America held the largest market share, followed by Europe and Asia Pacific. Key players in the URL Filtering Tools market include Qustodio, SafeToNet, Net at Work, Intego, Kaspersky, NortonLifeLock, Mobicip, Meet Circle, Blue Coat Systems, KidLogger, Cisco, Webroot, Salfeld, and others. These companies offer a range of solutions tailored to different user needs, including advanced filtering capabilities, real-time threat protection, and parental control features. The market is highly fragmented, with several niche players catering to specific segments.
In 2023, the worldwide number of malware attacks reached 6.06 billion, an increase of 10 percent compared to the preceding year. In recent years, the highest number of malware attacks was detected in 2018, when 10.5 billion such attacks were reported across the globe.
Malware attacks worldwide
In 2022, worm malware was blocked over 205 million times. Another common malware type during that period, Emotet, primarily targeted the Asia-Pacific region. Overall, websites are the most common vector for malware attacks, and recent industry data found that malware attacks were frequently received via exe files.
Most targeted industries
In 2022, the education sector was heavily targeted by malware, encountering 2,314 weekly attacks on average. Government and military organizations ranked second, followed by healthcare units. Overall, the education sector saw over five million malware attacks in 2022.
In 2023, the most common malware file type received worldwide via the web were Microsoft Windows .exe files followed by .sh files. On the other hand, .html files were the most commonly received malware file type via e-mail.
Network Security Software Market Size 2024-2028
The network security software market size is forecast to increase by USD 27.3 billion at a CAGR of 14.66% between 2023 and 2028. Network security software is a critical component for safeguarding digital assets in the age of increasing Internet penetration and advanced cyber security. The market is witnessing significant growth due to the rising frequency and sophistication of cyberattacks, including unauthorized access, insider threats, and phishing attacks. To counter these threats, organizations are adopting advanced network security solutions that prioritize secure network connections. A key trend in the market is the introduction of zero-trust security architecture, which assumes that all network traffic is potentially harmful and requires verification before granting access. Balancing security and user experience is another crucial factor driving market growth.
The market is a critical component in the modern business landscape, as organizations increasingly rely on digital infrastructure to drive growth and innovation. This market encompasses a range of solutions designed to safeguard network connections and data from various cyber threats. Firewalls represent a fundamental aspect of network security, acting as a barrier between an organization's internal network and the Internet. Firewalls monitor and control incoming and outgoing network traffic based on predetermined security rules, ensuring secure network connections. Antivirus and antimalware software are essential tools in the fight against malicious software.
IDS solutions analyze network traffic to detect intrusions, while IPS solutions go a step further by preventing attacks in real-time. Secure Web Gateways provide an additional layer of security by controlling access to the web and protecting against web-based threats. These solutions use various techniques, including URL filtering, malware scanning, and content analysis, to ensure secure browsing. Internet penetration testing is an essential practice for assessing the security of an organization's network and identifying vulnerabilities. Advanced software tools can simulate cyber attacks to help organizations fortify their defenses against real-world threats. Network security software is essential for large enterprises in various industries, including aerospace and defense, banking, and financial services.
Market Segmentation
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Component
  Solution
  Service
Deployment
  Cloud
  On-premises
Geography
  North America
    US
  APAC
    China
    Japan
  Europe
    UK
    France
  Middle East and Africa
  South America
By Component Insights
The solution segment is estimated to witness significant growth during the forecast period. The Network Security Market in the United States is witnessing significant growth due to the increasing number of cyber threats targeting businesses. This market encompasses various solutions such as firewalls, antivirus software, network access control, data loss prevention tools, and intrusion detection and prevention systems. Among these, firewalls are leading the market, providing advanced capabilities for monitoring and controlling data flow both internally and externally. Their importance is underscored by the growing emphasis on perimeter security. Antivirus and antimalware solutions are indispensable for safeguarding networks against a diverse range of malware. The sophistication of cyber threats necessitates continuous updates and improvements in these solutions.
Moreover, the shift towards remote working and e-learning, fueled by the COVID-19 pandemic, has led to an increased reliance on cloud-based solutions. This trend is further driven by the emergence of 5G technology, which promises faster and more reliable connectivity. Machine Learning (ML) and Artificial Intelligence (AI) are being integrated into network security solutions to enhance threat detection and response capabilities. In summary, the Network Security Market in the US is experiencing substantial growth due to increasing cyber threats, the need for advanced perimeter security, the shift towards remote work and e-learning, and the integration of ML and AI into network security solutions.
The solution segment accounted for USD 11.20 billion in 2018 and showed a gradual increase during the forecast period.
Regional Insights
North America is estimated to contribute 44% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends
The global web filtering service market is projected to reach a value of USD 14.55 billion by 2033, exhibiting a CAGR of 7.4% during the forecast period. The market is primarily driven by the increasing adoption of web filtering solutions by organizations across various industries to protect their networks and data from malicious websites, phishing attacks, and other online threats. Additionally, the growing number of connected devices and the increasing use of mobile devices for accessing the internet are further fueling the demand for web filtering services. The market is segmented based on type into domain name system (DNS) filtering, uniform resource locator (URL) filtering, keyword filtering, file type filtering, and others. Among these, URL filtering holds the largest market share due to its effectiveness in blocking access to malicious websites and preventing phishing attacks. In terms of application, the government, BFSI, manufacturing, and healthcare sectors are the key contributors to market growth. Geographically, North America dominates the market due to the early adoption of web filtering solutions and the presence of major technology companies. However, the Asia Pacific region is expected to witness significant growth over the forecast period due to the rapid digitization and increasing internet penetration in the region.
The global web filtering market is expected to grow at a CAGR above 8% and is anticipated to reach over USD 6,000 million by 2026. A web filter is generally referred to as "content control software". It restricts users from accessing websites that carry malicious code.