55 datasets found

Network Traffic Analysis: Data and Code
data.niaid.nih.gov
zenodo.org
Updated Jun 12, 2024
+ more versions
Cite
Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Ferrell, Nathan
Chan-Tin, Eric
Moran, Madeline
Homan, Sophia
Soni, Shreena
Honig, Joshua
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help    show this help message and exit
-i TXTFILE    input text file
-x X          Add first X number of total packets as features.
-y Y          Add first Y number of negative packets as features.
-z Z          Add first Z number of positive packets as features.
-ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S          Generate samples using size s.
-j
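The usage line above can be mirrored with Python's standard `argparse` module. This is only a sketch reconstructed from the help text, not the authors' actual parser: the integer types, the `txtfile` destination name, and the treatment of `-j` as a required on/off flag (its purpose is not documented in the listing) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented command line; types and flag semantics are assumed."""
    parser = argparse.ArgumentParser(prog="pkt_features.py")
    parser.add_argument("-i", dest="txtfile", metavar="TXTFILE", required=True,
                        help="input text file")
    parser.add_argument("-x", type=int, metavar="X",
                        help="Add first X number of total packets as features.")
    parser.add_argument("-y", type=int, metavar="Y",
                        help="Add first Y number of negative packets as features.")
    parser.add_argument("-z", type=int, metavar="Z",
                        help="Add first Z number of positive packets as features.")
    parser.add_argument("-ml", action="store_true",
                        help="Output all websites as websiteNumber1,feature1,feature2,...")
    parser.add_argument("-s", type=int, metavar="S",
                        help="Generate samples using size s.")
    # Assumption: the usage line shows -j without brackets and without a value,
    # so it is modelled here as a required flag with no documented meaning.
    parser.add_argument("-j", action="store_true", required=True,
                        help="(not described in the listing)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-i", "traffic.txt", "-x", "10", "-ml", "-j"])
    print(args)
```

Options that are not passed keep their default of `None` (or `False` for the `-ml` flag), which makes it easy to skip a feature group that was not requested.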

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5 fold cross validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
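To make the 5 fold cross validation mentioned above concrete, the following standard-library sketch shows how sample indices can be split into five train/test folds. It is an illustration of the general technique only; the function name, the shuffling seed, and the interleaved fold assignment are assumptions, and machineLearning.py may split its data differently.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
        print(fold_number, sorted(test))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives one cross-validated score per configuration being compared.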

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

The first number is a classification number denoting which website, query, or VR action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets
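The line format described above can be read with a few lines of Python. This is a sketch under assumptions: the listing does not state the field separator, so both whitespace and commas are accepted here, and the function names are hypothetical.

```python
from typing import List, Tuple


def parse_trace(line: str) -> Tuple[int, List[int]]:
    """Split one line into (classification number, signed packet sizes)."""
    # Assumption: fields are separated by whitespace and/or commas.
    fields = line.replace(",", " ").split()
    label = int(fields[0])
    packets = [int(value) for value in fields[1:]]
    return label, packets


def split_by_direction(packets: List[int]) -> Tuple[List[int], List[int]]:
    """Return (incoming sizes, outgoing sizes) as positive byte counts."""
    incoming = [abs(p) for p in packets if p < 0]   # negative = incoming
    outgoing = [p for p in packets if p > 0]        # positive = outgoing
    return incoming, outgoing


if __name__ == "__main__":
    label, packets = parse_trace("3 -1500 60 -1200 52")
    print(label, split_by_direction(packets))
```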

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
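The three steps listed above (sort the packet sizes, compute the mean and standard deviation, derive the CDF) can be sketched as follows. This is an illustrative empirical CDF, not the spreadsheet's exact formulas; in particular, the use of the sample standard deviation is an assumption, since the listing does not say which variant was used.

```python
import statistics
from typing import List, Tuple


def summarize(sizes: List[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one packet capture."""
    return statistics.mean(sizes), statistics.stdev(sizes)


def empirical_cdf(sizes: List[int]) -> List[Tuple[int, float]]:
    """Pair each sorted packet size with the fraction of packets at or below it."""
    ordered = sorted(sizes)
    total = len(ordered)
    return [(size, (rank + 1) / total) for rank, size in enumerate(ordered)]


if __name__ == "__main__":
    capture = [300, 100, 200, 400]
    print(summarize(capture))
    for size, fraction in empirical_cdf(capture):
        print(size, fraction)
```

Plotting the second element of each pair against the first gives the step-shaped CDF curve for one capture.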
Share of global mobile website traffic 2015-2024
statista.com
ai-chatbox.pro
Updated Jan 28, 2025
Cite
Statista (2025). Share of global mobile website traffic 2015-2024 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
Explore at:
Dataset updated
Jan 28, 2025
Dataset authored and provided by
Statista http://statista.com/
Area covered
Worldwide
Description
Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones consistently hovered around the 50 percent mark since the beginning of 2017, before surpassing it in 2020.

Mobile traffic

Due to low infrastructure and financial restraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.

Mobile usage

The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
Traffic Site
hub.arcgis.com
data-waikatolass.opendata.arcgis.com
+1more
Updated Sep 9, 2021
Cite
Hamilton City Council (2021). Traffic Site [Dataset]. https://hub.arcgis.com/maps/hcc::traffic-site
Explore at:
Dataset updated
Sep 9, 2021
Dataset authored and provided by
Hamilton City Council
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Attributes of sites in Hamilton City which collect anonymised data from a sample of vehicles. Note: A Link is the section of the road between two sites

Column_Info

Site_Id, int : Unique identifier
Number, int : Asset number. Note: If the site is at a signalised intersection, Number will match 'Site_Number' in the table 'Traffic Signal Site Location'
Is_Enabled, varchar : Site is currently enabled
Disabled_Date, datetime : If currently disabled, the date at which the site was disabled
Site_Name, varchar : Description of the site location
Latitude, numeric : North-south geographic coordinates
Longitude, numeric : East-west geographic coordinates

Relationship

Disclaimer

Hamilton City Council does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data released for public download. Levels, locations and dimensions of works depicted in the data may not be accurate due to circumstances not notified to Council. A physical check should be made on all levels, locations and dimensions before starting design or works. Hamilton City Council shall not be liable for any loss, damage, cost or expense (whether direct or indirect) arising from reliance upon or use of any data provided, or Council's failure to provide this data.

While you are free to crop, export and re-purpose the data, we ask that you attribute the Hamilton City Council and clearly state that your work is a derivative and not the authoritative data source. Please include the following statement when distributing any work derived from this data: 'This work is derived entirely or in part from Hamilton City Council data; the provided information may be updated at any time, and may at times be out of date, inaccurate, and/or incomplete.'
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
+ more versions
Cite
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
Explore at:
Available download formats: .csv, .xls
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Uzbekistan, Saint Vincent and the Grenadines, Monaco, Latvia, Jordan, Belarus, Jamaica, Liechtenstein, Russian Federation, India
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat...
verifiedmarketresearch.com
Updated Nov 15, 2024
Cite
VERIFIED MARKET RESEARCH (2024). Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat Map Analytics), By Application (Social Media Management, Display Advertising Optimization), By Vertical (Banking, Financial Services And Insurance (BFSI), Retail), And Region for 2026-2032 [Dataset]. https://www.verifiedmarketresearch.com/product/web-analytics-market/
Explore at:
Dataset updated
Nov 15, 2024
Dataset provided by
Verified Market Research https://www.verifiedmarketresearch.com/
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2026 - 2032
Area covered
Global
Description
Web Analytics Market was valued at USD 6.16 Billion in 2024 and is projected to reach USD 13.6 Billion by 2032, growing at a CAGR of 18.58% from 2026 to 2032.
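Growth figures of this kind follow the usual compound annual growth rate (CAGR) relation, end = start × (1 + rate)^years. The sketch below shows that arithmetic with hypothetical function names. Note that the description gives a 2024 value and a 2032 value but no 2026 base value, so the quoted 2026-2032 rate cannot be re-derived from those two numbers alone; the example call is only an illustration of the formula.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Illustration only: rate implied by the two stated values over 2024-2032.
    implied = cagr(6.16, 13.6, 2032 - 2024)
    print(f"implied rate over 2024-2032: {implied:.2%}")
```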

Web Analytics Market Drivers

Data-Driven Decision Making: Businesses increasingly rely on data-driven insights to optimize their online strategies. Web analytics provides valuable data on website traffic, user behavior, and conversion rates, enabling data-driven decision-making.

E-commerce Growth: The rapid growth of e-commerce has fueled the demand for web analytics tools to track online sales, customer behavior, and marketing campaign effectiveness.

Mobile Dominance: The increasing use of mobile devices for internet browsing has made mobile analytics a crucial aspect of web analytics. Businesses need to understand how users interact with their websites and apps on mobile devices.

analytics tools can be complex to implement and use, requiring technical expertise.
World Traffic Web Map
walmart-event-collaboration-portal-walmarttech.hub.arcgis.com
Updated Jun 18, 2021
+ more versions
Cite
Walmart Emergency Management (2021). World Traffic Web Map [Dataset]. https://walmart-event-collaboration-portal-walmarttech.hub.arcgis.com/maps/world-traffic-web-map
Explore at:
Dataset updated
Jun 18, 2021
Dataset provided by
Walmart http://walmart.com/
Authors
Walmart Emergency Management
Area covered

Description
This is a dynamic traffic map service with capabilities for visualizing traffic speeds relative to free-flow speeds as well as traffic incidents which can be visualized and identified. The traffic data is updated every five minutes. Traffic speeds are displayed as a percentage of free-flow speeds, which is frequently the speed limit or how fast cars tend to travel when unencumbered by other vehicles. The streets are color coded as follows:

Green (fast): 85 - 100% of free flow speeds
Yellow (moderate): 65 - 85%
Orange (slow): 45 - 65%
Red (stop and go): 0 - 45%

Esri's historical, live, and predictive traffic feeds come directly from HERE (www.HERE.com). HERE collects billions of GPS and cell phone probe records per month and, where available, uses sensor and toll-tag data to augment the probe data collected. An advanced algorithm compiles the data and computes accurate speeds. Historical traffic is based on the average of observed speeds over the past three years. The live and predictive traffic data is updated every five minutes through traffic feeds.

The color coded traffic map layer can be used to represent relative traffic speeds; this is a common type of a map for online services and is used to provide context for routing, navigation and field operations. The traffic map layer contains two sublayers: Traffic and Live Traffic. The Traffic sublayer (shown by default) leverages historical, live and predictive traffic data, while the Live Traffic sublayer is calculated from the live and predictive traffic data only. A color coded traffic map image can be requested for the current time and any time in the future. A map image for a future request might be used for planning purposes.

The map layer also includes dynamic traffic incidents showing the location of accidents, construction, closures and other issues that could potentially impact the flow of traffic. Traffic incidents are commonly used to provide context for routing, navigation and field operations.
Incidents are not features; they cannot be exported and stored for later use or additional analysis. The service works globally and can be used to visualize traffic speeds and incidents in many countries. Check the service coverage web map to determine availability in your area of interest. In the coverage map, the countries color coded in dark green support visualizing live traffic. The support for traffic incidents can be determined by identifying a country. For detailed information on this service, including a data coverage map, visit the directions and routing documentation and ArcGIS Help.
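The color bands quoted in this description translate directly into a small lookup. The sketch below is not part of the Esri service; the function name is hypothetical, and because adjacent bands share their boundary values (85, 65, 45), the choice to assign a boundary value to the faster band is an assumption.

```python
def traffic_color(speed: float, free_flow_speed: float) -> str:
    """Map an observed speed to the color band of its share of free-flow speed."""
    percent = 100.0 * speed / free_flow_speed
    # Assumption: a value exactly on a shared boundary goes to the faster band.
    if percent >= 85:
        return "green"    # fast
    if percent >= 65:
        return "yellow"   # moderate
    if percent >= 45:
        return "orange"   # slow
    return "red"          # stop and go


if __name__ == "__main__":
    for observed in (50, 35, 25, 10):
        print(observed, traffic_color(observed, free_flow_speed=50))
```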
Total global visitor traffic to Google.com 2024
statista.com
ai-chatbox.pro
Updated Jan 22, 2025
Cite
Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/
Explore at:
Dataset updated
Jan 22, 2025
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Oct 2023 - Mar 2024
Area covered
Worldwide
Description
In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.
Leading websites worldwide 2024, by monthly visits
statista.com
ai-chatbox.pro
Updated Mar 24, 2025
+ more versions
Cite
Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
Explore at:
Dataset updated
Mar 24, 2025
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Nov 2024
Area covered
Worldwide
Description
In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

The internet leaders: search, social, and e-commerce

Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

What is next for online content?

Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
USA Traffic Counts for Site Selection
hub.arcgis.com
Updated Jun 21, 2016
Cite
Esri (2016). USA Traffic Counts for Site Selection [Dataset]. https://hub.arcgis.com/datasets/07bf63e8238b44e7ba44cdcadcc5a8c2
Explore at:
Dataset updated
Jun 21, 2016
Dataset authored and provided by
Esri http://esri.com/
Area covered
United States
Description
To check traffic counts around a potential business location simply enter an address on the top bar. The application will draw a one mile circle around the location and provide a list of traffic count points. You may also click anywhere on the map to drop a point. Then click the point or the graphic on the right to reveal a pop up with:

The most recent traffic count
The count type (see the methodology document for definitions)
A graph showing up to the last five available traffic counts at that location

The large circled number in the side panel displays the number of points within the one mile radius. Under the circle there is a slide bar that can enlarge the selection area up to 10 miles. Under the slide bar, each point is displayed and clicking here will also reveal the pop up.

Additional Esri Resources:

U.S. Traffic Count and Methodology
2016 Traffic Counts in the United States web map
Business Data Summary and Methodology
Updated Demographics and Methodology
Esri's arcgis.com demographic map layers
Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...
explore.openaire.eu
data.niaid.nih.gov
+1more
Updated Dec 7, 2022
Cite
Jan Luxemburk; Karel Hynek; Tomáš Čejka; Andrej Lukačovič; Pavel Šiška (2022). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. http://doi.org/10.5281/zenodo.7409923
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7409923
Dataset updated
Dec 7, 2022
Authors
Jan Luxemburk; Karel Hynek; Tomáš Čejka; Andrej Lukačovič; Pavel Šiška
Description
Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Data capture

The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022.
The following list provides per-week flow count, capture period, and uncompressed size:

W-2022-44: Uncompressed Size: 19 GB; Capture Period: 31.10.2022 - 6.11.2022; Number of flows: 32.6M
W-2022-45: Uncompressed Size: 25 GB; Capture Period: 7.11.2022 - 13.11.2022; Number of flows: 42.6M
W-2022-46: Uncompressed Size: 20 GB; Capture Period: 14.11.2022 - 20.11.2022; Number of flows: 33.7M
W-2022-47: Uncompressed Size: 25 GB; Capture Period: 21.11.2022 - 27.11.2022; Number of flows: 44.1M
CESNET-QUIC22: Uncompressed Size: 89 GB; Capture Period: 31.10.2022 - 27.11.2022; Number of flows: 153M

Data description

The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

Packet Sequences

Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter.
We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.

Flow statistics

Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason: either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

Dataset structure

The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list belo...
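The eight histogram bins and the direction-change count described in this entry can be sketched in a few lines of Python. This is an illustration written from the description only, not code from ipfixprobe or CESNET DataZoo; the function names are hypothetical, and the direction-change count follows the literal wording "number of changes in the communication direction", which may differ from how the exporter counts request/response pairs.

```python
from bisect import bisect_left
from typing import List, Sequence

# Upper edges of the first seven bins (0-15, 16-31, ..., 512-1024); anything
# larger falls into the eighth bin (>1024). Units: bytes or milliseconds.
BIN_UPPER_EDGES = [15, 31, 63, 127, 255, 511, 1024]


def histogram_bin(value: float) -> int:
    """Index (0-7) of the bin that a packet size or inter-packet time falls into."""
    return bisect_left(BIN_UPPER_EDGES, value)


def packet_histogram(values: Sequence[float]) -> List[int]:
    """Count how many values land in each of the eight bins."""
    counts = [0] * (len(BIN_UPPER_EDGES) + 1)
    for value in values:
        counts[histogram_bin(value)] += 1
    return counts


def direction_changes(directions: Sequence[int]) -> int:
    """Number of times consecutive packets switch between +1 and -1."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)


if __name__ == "__main__":
    print(packet_histogram([10, 20, 1500, 1200, 600]))
    print(direction_changes([1, -1, -1, 1, -1]))
```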
Jefferson County KY Traffic Web Cameras
catalog.data.gov
data.lojic.org
+5more
Updated Apr 13, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Jefferson County KY Traffic Web Cameras [Dataset]. https://catalog.data.gov/dataset/jefferson-county-ky-traffic-web-cameras-2b335
Explore at:
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Jefferson County, Kentucky
Description
TRIMARC (Traffic Response and Incident Management Assisting the River City) camera locations in Louisville Metro Kentucky. This feature layer was created from TRIMARC JSON files of camera locations. This item includes description, direction, and video links and is used in the Louisville Metro Snow Map. The cameras are used to monitor the roadways and verify incidents to assist in freeway and incident management. This feature is a static extract and will be reviewed before each snow season for updates. For more information on this feature layer and its use please contact Louisville Metro GIS or LOJIC. To learn more about TRIMARC please visit the following website http://www.trimarc.org.
🕵️ Phishing Websites Data
kaggle.com
Updated Feb 24, 2025
Cite
Sairaj Adhav (2025). 🕵️ Phishing Websites Data [Dataset]. https://www.kaggle.com/datasets/sai10py/phishing-websites-data
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 24, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Sairaj Adhav
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Phishing Websites Dataset

Overview

This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.

Dataset Information

Total Columns: 31 (30 Features + 1 Target)

Target Variable: Result (Indicates whether a website is phishing or legitimate)

Features Description

URL-Based Features

Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.

double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.

having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.

Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).

URL_Length – Measures the length of the URL; phishing URLs tend to be longer.

having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.
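Several of the URL-based checks above can be approximated with the Python standard library. The sketch below is illustrative only and is not how the dataset's values were produced: the function name is hypothetical, the checks return raw booleans and a length rather than the dataset's encoded values, and restricting the hyphen check to the host part is an assumption. The shortening-service check is omitted because it would need a list of known services.

```python
import ipaddress
from typing import Dict, Union
from urllib.parse import urlsplit


def url_features(url: str) -> Dict[str, Union[bool, int]]:
    """Compute rough, unencoded versions of a few URL-based features."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)       # succeeds only for literal IP hosts
        has_ip = True
    except ValueError:
        has_ip = False
    after_scheme = url.split("://", 1)[-1]
    return {
        # Assumption: only the host part is checked for a hyphen.
        "Prefix_Suffix": "-" in host,
        # A second "//" after the scheme separator suggests a redirect.
        "double_slash_redirecting": "//" in after_scheme,
        "having_At_Symbol": "@" in url,
        "URL_Length": len(url),
        "having_IP_Address": has_ip,
    }


if __name__ == "__main__":
    print(url_features("http://192.168.0.1//login@evil"))
    print(url_features("https://secure-pay.example.com/a"))
```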

Domain-Based Features

having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.

SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).

Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.

age_of_domain – The age of the domain in days; older domains are usually more trustworthy.

DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.

Webpage-Based Features

Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).

port – Identifies if the site is using suspicious or non-standard ports.

HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.

Request_URL – Measures the percentage of external resources loaded from different domains.

URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.

Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.

SFH (Server Form Handler) – Determines if form actions are handled suspiciously.

Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.

Abnormal_URL – Identifies if the website’s URL structure is inconsistent with common patterns.

Redirect – Counts the number of redirects; phishing websites may have excessive redirects.

Behavior-Based Features

on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).

RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).

popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.

Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.

Traffic & Search Engine Features

web_traffic – Measures the website’s Alexa ranking; phishing sites tend to have low traffic.

Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.

Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).

Links_pointing_to_page – Counts the number of backlinks pointing to the website.

Statistical_report – Uses external sources to verify if the website has been reported for phishing.

Target Variable

Result – The classification label (1: Legitimate, -1: Phishing)

Usage

This dataset is valuable for:
✅ Machine Learning Models – Developing classifiers for phishing detection.
✅ Cybersecurity Research – Understanding patterns in phishing attacks.
✅ Browser Security Extensions – Enhancing anti-phishing tools.
Indian Traffic Sign Dataset
universe.roboflow.com
zip
Updated Sep 11, 2023
+ more versions
Cite
DataCluster Labs (2023). Indian Traffic Sign Dataset [Dataset]. https://universe.roboflow.com/datacluster-labs-agryi/indian-traffic-sign-vvx9y
Explore at:
Available download formats: zip
Dataset updated
Sep 11, 2023
Dataset authored and provided by
DataCluster Labs
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Traffic Signals Bounding Boxes
Description
Indian Traffic Sign Image Dataset

Datasets for Indian traffic signs

About Dataset

This dataset was collected by Datacluster Labs. To download the full dataset or to submit a request for new data collection, email sales@datacluster.ai.

This dataset is an extremely challenging set of more than 2,000 original Indian traffic sign images captured and crowdsourced from more than 400 urban and rural areas. Each image is manually reviewed and verified by computer vision professionals at DC Labs.

Dataset Features
1. Dataset size: 2000+
2. Captured by: 400+ crowdsource contributors
3. Resolution: 100% of images HD and above (1920x1080 and above)
4. Location: Captured in 400+ cities across India
5. Diversity: Various lighting conditions (day, night), varied distances, viewpoints, etc.
6. Device used: Captured using mobile phones in 2020-2021
7. Usage: Traffic sign detection, self-driving systems, traffic detection, sign detection, etc.

Available annotation formats: COCO, YOLO, PASCAL-VOC, TFRecord
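Because the bounding boxes are offered in several annotation formats, it can help to see how two of them relate. The sketch below converts a YOLO-style box (center and size normalized to the range 0 to 1) into PASCAL-VOC-style pixel corners. The function name and the example box are hypothetical; the 1920x1080 image size is the minimum resolution stated above.

```python
def yolo_to_voc(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box to (xmin, ymin, xmax, ymax) in pixels."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    # Round to whole pixels, as VOC annotations store integer coordinates.
    return tuple(round(v) for v in (xmin, ymin, xmax, ymax))

# Hypothetical box centered in a 1920x1080 image.
box = yolo_to_voc(0.5, 0.5, 0.1, 0.2, 1920, 1080)
print(box)  # (864, 432, 1056, 648)
```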

The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai. Visit www.datacluster.ai to learn more.
Secure Web Gateway Market Analysis North America, Europe, APAC, Middle East...
technavio.com
Technavio, Secure Web Gateway Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/secure-web-gateway-market-industry-analysis
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Germany, United Kingdom, United States, Global
Description
Secure Web Gateway Market Size 2024-2028

The secure web gateway market size is forecast to increase by USD 19.45 billion at a CAGR of 26.79% between 2023 and 2028. The market is experiencing significant growth due to the increasing number of online security threats targeting web products and websites. Traditional firewalls, once considered a cyber barrier, are no longer sufficient to protect against unauthorized traffic, malicious websites, and data leakage.

To address these challenges, secure web gateways employ a 7-layered traffic inspection approach, providing application-level control and data leakage prevention. As more companies adopt cloud-based solutions and allow remote workers, the need for advanced web security solutions becomes increasingly important. The rising adoption of cloud-based security technologies is a major market growth factor, despite the high implementation costs.

What will be the size of the Secure Web Gateway Market During the Forecast Period?

The cyber threat environment continues to evolve, with system viruses and unknown spyware posing significant risks to both individual and organizational data. Unsecured communication channels and malicious web traffic are common avenues for cyber-attacks, making it crucial for businesses to implement strong security solutions. A secure web gateway acts as a cyber barrier, safeguarding against online security threats and protecting against unauthorized traffic, malware attacks, and data breaches. Malicious web traffic, including viruses, malware, and harmful websites, can infiltrate an internal network through unsecured endpoints.

Moreover, trojan horses, adware, and spyware are common threats that can compromise individual data and organizational data. Remote employees working from home present an additional challenge, as they may access the company network through unsecured Wi-Fi or unprotected devices. A secure web gateway acts as a centralized filtering system, controlling web requests and blocking access to malicious websites. It provides an essential layer of security, preventing unauthorized access to sensitive data and protecting against known and unknown threats. By implementing a secure web gateway, companies can enforce company policy, ensuring that all web traffic is secure and compliant. Online security threats are a constant concern for businesses, with data breaches and malware attacks becoming increasingly common.

Furthermore, a secure web gateway helps mitigate these risks by providing a comprehensive solution for securing end-user data and web security. It blocks malicious web traffic, preventing the spread of viruses, malware, and other harmful software. By implementing a secure web gateway, businesses can protect their valuable data and maintain their online reputation. In conclusion, a secure web gateway is an essential component of any organization's cybersecurity strategy. It provides a critical layer of protection against online security threats, including malicious web traffic, viruses, malware, and harmful websites. By implementing a secure web gateway, businesses can safeguard their data, protect against unauthorized traffic, and maintain compliance with industry regulations.

Market Segmentation

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Deployment: Cloud, On-premises
End-user: BFSI, IT and telecom, Government and defense, Others
Geography: North America (US), Europe (Germany, UK), APAC (China, Japan), Middle East and Africa, South America

By Deployment Insights

The cloud segment is estimated to witness significant growth during the forecast period. The market is expected to experience substantial expansion in the coming years due to the escalating requirement for comprehensive data and identity security in corporations. With the rise in cybercrime and the increasing number of threats from hackers, the demand for sophisticated security solutions is surging. The adoption of cloud security services is gaining traction among large enterprises and small businesses as they transfer an increasing volume of sensitive and confidential data. The proliferation of mobile devices for both professional and personal use is further fueling the demand for cloud security solutions, as these devices are highly susceptible to attacks.

Furthermore, secure Web Gateways provide advanced functionalities such as antivirus, application control, data loss prevention, and HTTPS inspection to protect enterprises from harmful code and potential leaks. The financial sector, including banks, is a significant end user of Secure Web Gateways due to the sensitive nature of the data they handle. The
Coresignal | Web Data | Company Data | Global / 71M+ Records / Largest...
datarade.ai
.json, .csv
Updated Mar 1, 2024
Coresignal (2024). Coresignal | Web Data | Company Data | Global / 71M+ Records / Largest Professional Network / Updated Daily [Dataset]. https://datarade.ai/data-products/coresignal-web-data-company-data-global-69m-records-coresignal
Available download formats: .json, .csv
Dataset updated
Mar 1, 2024
Dataset authored and provided by
Coresignal
Area covered
Sweden, Nauru, United Kingdom, State of, Finland, Trinidad and Tobago, Libya, Hong Kong, New Zealand, Yemen
Description
Our Web Data dataset includes such data points as company name, location, headcount, industry, and size, among others. It offers extensive fresh and historical data, including even companies that operate in stealth mode.

For lead generation

With millions of companies worldwide, Web Company Database helps you filter potential clients based on custom criteria and speed up the conversion process.

Use cases

Filter potential clients according to location, size, and other criteria

Enrich your existing database

Improve conversion rates

Use predictive models to identify potential leads

Group your leads in segments for more accurate targeting

For market and business analysis

Our Web Company Data provides information about millions of companies, allowing you to find your competitors and see their weaknesses and strengths.

Use cases

Pinpoint your competitors

Learn about your competitors' size, headcount, and revenue

Prepare a data-driven plan for the next quarter

For Investors

We recommend B2B Web Data for investors to discover and evaluate businesses with the highest potential.

Gain strategic business insights, enhance decision-making, and maintain algorithms that signal investment opportunities with Coresignal’s global B2B Web Dataset.

Use cases

Screen startups and industries showing early signs of growth

Identify companies hungry for the next investment

Check if a startup is about to reach the next maturity phase

Identify and predict a startup's potential at the founding moment

Choose companies that fit you in terms of size and headcount

For sales prospecting

B2B Web Database saves time your employees would otherwise use to search for potential clients manually.

Use cases

Make a short list of the top prospects

Define which companies are large or small enough to buy your product

Based on the revenue, determine which companies are ready to convert

Sort the companies by their distance from your warehouse to draw a line where selling won't result in satisfactory profit
Traffic
site-collab-cgvar.hub.arcgis.com
esrifrance.hub.arcgis.com
Updated Mar 11, 2014
Conseil Départemental du Var (2014). Traffic [Dataset]. https://site-collab-cgvar.hub.arcgis.com/datasets/traffic
Dataset updated
Mar 11, 2014
Dataset authored and provided by
Conseil Départemental du Var
License
http://opendata.regionpaca.fr/fileadmin//user_upload/tx_ausyopendata/licences/Licence-Ouverte-Open-Licence-ETALAB.pdf
Description
The map layers in this service provide color-coded maps of the traffic conditions you can expect for the present time (the default). The map shows present traffic as a blend of live and typical information. Live speeds are used wherever available and are established from real-time sensor readings. Typical speeds come from a record of average speeds, which are collected over several weeks within the last year or so. Layers also show current incident locations where available. By changing the map time, the service can also provide past and future conditions. Live readings from sensors are saved for 12 hours, so setting the map time back within 12 hours allows you to see actual recorded traffic speeds, supplemented with typical averages by default. You can choose to turn off the average speeds and see only the recorded live traffic speeds for any time within the 12-hour window. Predictive traffic conditions are shown for any time in the future.

The color-coded traffic map layer can be used to represent relative traffic speeds; this is a common type of map for online services and is used to provide context for routing, navigation, and field operations. A color-coded traffic map can be requested for the current time and any time in the future. A map for a future request might be used for planning purposes.

The map also includes dynamic traffic incidents showing the location of accidents, construction, closures, and other issues that could potentially impact the flow of traffic. Traffic incidents are commonly used to provide context for routing, navigation, and field operations. Incidents are not features; they cannot be exported and stored for later use or additional analysis.

Data source

Esri’s typical speed records and live and predictive traffic feeds come directly from HERE (www.HERE.com). HERE collects billions of GPS and cell phone probe records per month and, where available, uses sensor and toll-tag data to augment the probe data collected. An advanced algorithm compiles the data and computes accurate speeds. The real-time and predictive traffic data is updated every five minutes through traffic feeds.

Data coverage

The service works globally and can be used to visualize traffic speeds and incidents in many countries. Check the service coverage web map to determine availability in your area of interest. Look at the coverage map to learn whether a country currently supports traffic. The support for traffic incidents can be determined by identifying a country. For detailed information on this service, visit the directions and routing documentation and the ArcGIS Help.

Symbology

Traffic speeds are displayed as a percentage of free-flow speeds, which is frequently the speed limit or how fast cars tend to travel when unencumbered by other vehicles. The streets are color coded as follows:

Green (fast): 85 - 100% of free flow speeds
Yellow (moderate): 65 - 85%
Orange (slow): 45 - 65%
Red (stop and go): 0 - 45%

To view live traffic only (that is, excluding typical traffic conditions), enable the Live Traffic layer and disable the Traffic layer. (You can find these layers under World/Traffic > [region] > [region] Traffic). To view more comprehensive traffic information that includes live and typical conditions, disable the Live Traffic layer and enable the Traffic layer.

ArcGIS Online organization subscription

Important Note: The World Traffic map service is available for users with an ArcGIS Online organizational subscription. To access this map service, you'll need to sign in with an account that is a member of an organizational subscription. If you don't have an organizational subscription, you can create a new account and then sign up for a 30-day trial of ArcGIS Online.
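The symbology rule above can be sketched as a small lookup. This is an illustration only, not the service's implementation: the function name is hypothetical, and because the published ranges share their endpoints (85%, 65%, 45%), the sketch assumes each boundary value belongs to the faster band.

```python
def traffic_color(speed, free_flow_speed):
    """Map a measured speed to the color band described in the symbology."""
    if free_flow_speed <= 0:
        raise ValueError("free_flow_speed must be positive")
    pct = 100.0 * speed / free_flow_speed
    # Assumption: boundary values (85, 65, 45) fall in the faster band.
    if pct >= 85:
        return "green"   # fast
    if pct >= 65:
        return "yellow"  # moderate
    if pct >= 45:
        return "orange"  # slow
    return "red"         # stop and go

print(traffic_color(30, 60))  # orange (50% of free-flow speed)
```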
pNEUMA Vision Dataset
zenodo.org
zip
Updated Jan 1, 2023
Sohyeong Kim; Georg Anagnostopoulos; Emmanouil Barmpounakis; Nikolas Geroliminis (2023). pNEUMA Vision Dataset [Dataset]. http://doi.org/10.5281/zenodo.7426506
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7426506
Dataset updated
Jan 1, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sohyeong Kim; Georg Anagnostopoulos; Emmanouil Barmpounakis; Nikolas Geroliminis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The pNEUMA Vision dataset is a drone traffic imagery dataset that contains frame images and vehicle annotations given as positions. It is an expansion of pNEUMA, the urban trajectory dataset collected by swarms of drones in Athens.

For more details about pNEUMA and pNEUMA Vision, please check our website at https://open-traffic.epfl.ch and GitHub.
Attitudes towards the internet in Japan 2025
statista.com
Updated Apr 11, 2025
Umair Bashir (2025). Attitudes towards the internet in Japan 2025 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
Dataset updated
Apr 11, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Umair Bashir
Description
When asked about "Attitudes towards the internet", most Japanese respondents pick "I'm concerned that my data is being misused on the internet" as an answer. 35 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.
Data from: Traffic Volumes
data.sandiego.gov
Updated Jul 29, 2016
(2016). Traffic Volumes [Dataset]. https://data.sandiego.gov/datasets/traffic-volumes/
Available download formats: csv
Dataset updated
Jul 29, 2016
Description
The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate of the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency of each count varies depending on City staff’s needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.
Anonymized web browsing sessions found in the Roma Capitale WiFi system:...
gimi9.com
Updated Nov 25, 2024
(2024). Anonymized web browsing sessions found in the Roma Capitale WiFi system: Traffic and language options. Year 2024 | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_c_h501-wifi2024/
Dataset updated
Nov 25, 2024
Description
The dataset describes the anonymized distribution of DigitRoma Wi-Fi users at the offices enabled for the institutional WiFi service. It covers: 1. the language option set in the API configuration of the thin client used for web browsing after authenticating to the DigitRoma Wi-Fi network; 2. upload and download data traffic, together with the duration of the session, the date, and the place of operation. The information collected refers to the number of anonymized user sessions recorded on a daily basis.

Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410

Network Traffic Analysis: Data and Code

Dataset updated

Jun 12, 2024

Dataset provided by

Ferrell, Nathan
Chan-Tin, Eric
Moran, Madeline
Homan, Sophia
Soni, Shreena
Honig, Joshua

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help  show this help message and exit
-i TXTFILE  input text file
-x X  Add first X number of total packets as features.
-y Y  Add first Y number of negative packets as features.
-z Z  Add first Z number of positive packets as features.
-ml  Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S  Generate samples using size s.
-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.
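To make the -x, -y, and -z options concrete, here is a minimal sketch of how "first N packets" features could be assembled from one capture. It is not the authors' Features.py: the function name, the zero padding for short captures, and the order in which the three groups are concatenated are all assumptions made for illustration.

```python
def first_n_features(packets, x=0, y=0, z=0, pad=0):
    """Build a feature list from the first x total, y negative, z positive packets."""
    def take(seq, n):
        head = list(seq[:n])
        # Assumption: captures shorter than n are padded with `pad`.
        return head + [pad] * (n - len(head))

    negative = [p for p in packets if p < 0]  # incoming packets
    positive = [p for p in packets if p > 0]  # outgoing packets
    return take(packets, x) + take(negative, y) + take(positive, z)

# Hypothetical capture of signed packet sizes.
packets = [-1500, 60, -1500, 52, -400]
print(first_n_features(packets, x=3, y=2, z=3))
# [-1500, 60, -1500, -1500, -1500, 60, 52, 0]
```

Padding keeps every feature vector the same length, which most classifiers require; the actual scripts may handle short captures differently.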

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5-fold cross-validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

The first number on each line is a classification number that denotes which website, query, or VR action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets
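The line format described above can be read with a short parser. This is a sketch under one loudly stated assumption: the numbers on a line are separated by whitespace, which the description does not specify. The function name and the sample line are hypothetical.

```python
def parse_capture_line(line):
    """Split one capture line into (label, incoming_sizes, outgoing_sizes)."""
    # Assumption: fields are whitespace-separated integers.
    fields = line.split()
    label = int(fields[0])                    # website, query, or VR action
    sizes = [int(tok) for tok in fields[1:]]  # signed packet sizes
    incoming = [-s for s in sizes if s < 0]   # negative numbers: incoming
    outgoing = [s for s in sizes if s > 0]    # positive numbers: outgoing
    return label, incoming, outgoing

label, incoming, outgoing = parse_capture_line("3 -1500 60 -1500 52")
print(label, incoming, outgoing)  # 3 [1500, 1500] [60, 52]
```

If the files turn out to be comma-separated, replacing line.split() with line.split(",") is the only change needed.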

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
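The three steps listed above (sort the packet sizes, compute the mean and standard deviation, then build the CDF) can be sketched as follows. The sample sizes are hypothetical, and the description does not say whether the spreadsheet uses the sample or population standard deviation; the sketch assumes the sample form.

```python
import statistics

def empirical_cdf(sizes):
    """Return (size, cumulative fraction) pairs for the sorted packet sizes."""
    ordered = sorted(sizes)  # smallest to largest, as in the spreadsheet
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]

# Hypothetical packet sizes from one capture.
sizes = [60, 1500, 52, 400]
mean = statistics.mean(sizes)
stdev = statistics.stdev(sizes)  # assumption: sample standard deviation
cdf = empirical_cdf(sizes)
print(mean)  # 503
print(cdf)   # [(52, 0.25), (60, 0.5), (400, 0.75), (1500, 1.0)]
```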
