55 datasets found
  1. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 12, 2024
    Cite
    Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Ferrell, Nathan
    Chan-Tin, Eric
    Moran, Madeline
    Homan, Sophia
    Soni, Shreena
    Honig, Joshua
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.
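
    As a rough illustration of what the -x, -y and -z options describe, the sketch below builds a feature vector from the first X packets overall, the first Y negative packets and the first Z positive packets. It is not taken from Features.py: the function names and the zero padding for short captures are assumptions made for this example.

```python
from typing import List


def first_n(values: List[int], n: int) -> List[int]:
    """Return exactly n values: the first n, zero-padded if fewer exist."""
    head = values[:n]
    # Zero padding is an assumption of this sketch, not a documented behaviour.
    return head + [0] * (n - len(head))


def packet_features(packets: List[int], x: int, y: int, z: int) -> List[int]:
    """Concatenate the first x packets, first y negative and first z positive ones."""
    negatives = [p for p in packets if p < 0]
    positives = [p for p in packets if p > 0]
    return first_n(packets, x) + first_n(negatives, y) + first_n(positives, z)


# Made-up signed packet sizes, for illustration only.
features = packet_features([565, -1500, -1500, 80, -40], x=3, y=2, z=3)
print(features)
```

    Fixed-length vectors of this kind are convenient because every capture then yields the same number of columns for a classifier.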

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

    The first number is a classification number that denotes which website, query, or VR action is taking place.

    The remaining numbers in each line denote:

    The size of a packet,

    and the direction it is traveling.

    negative numbers denote incoming packets

    positive numbers denote outgoing packets
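
    The line format described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: the function name is hypothetical, and the field separator (whitespace and/or commas) is an assumption, since the description does not state it.

```python
from typing import List, Tuple


def parse_capture_line(line: str) -> Tuple[int, List[int], List[int]]:
    """Split one line into (class label, incoming sizes, outgoing sizes)."""
    # Assumption: fields are separated by whitespace and/or commas.
    fields = line.replace(",", " ").split()
    label = int(fields[0])
    sizes = [int(f) for f in fields[1:]]
    incoming = [-s for s in sizes if s < 0]  # negative numbers are incoming packets
    outgoing = [s for s in sizes if s > 0]   # positive numbers are outgoing packets
    return label, incoming, outgoing


# Made-up example line: class 7 followed by five signed packet sizes.
label, incoming, outgoing = parse_capture_line("7 565 -1500 -1500 80 -40")
print(label, incoming, outgoing)
```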

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
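
    The sort, mean, standard deviation and CDF steps listed above can be sketched as follows. The packet sizes are made up, the function name is hypothetical, and the use of the sample standard deviation is an assumption; the spreadsheet may use a different convention.

```python
import statistics
from typing import List, Tuple


def empirical_cdf(sizes: List[int]) -> List[Tuple[int, float]]:
    """Pair each sorted packet size with the fraction of packets at or below it."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


# Made-up packet sizes for one capture, for illustration only.
sizes = [1500, 40, 565, 80]
mean = statistics.mean(sizes)
stdev = statistics.stdev(sizes)  # sample standard deviation (an assumption)
cdf = empirical_cdf(sizes)

print(mean, round(stdev, 2))
for value, fraction in cdf:
    print(value, fraction)
```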

  2. Share of global mobile website traffic 2015-2024

    • statista.com
    • ai-chatbox.pro
    Updated Jan 28, 2025
    Cite
    Statista (2025). Share of global mobile website traffic 2015-2024 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones consistently hovered around the 50 percent mark since the beginning of 2017, before surpassing it in 2020.

    Mobile traffic

    Due to low infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.

    Mobile usage

    The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.

  3. Traffic Site

    • hub.arcgis.com
    • data-waikatolass.opendata.arcgis.com
    • +1 more
    Updated Sep 9, 2021
    Cite
    Hamilton City Council (2021). Traffic Site [Dataset]. https://hub.arcgis.com/maps/hcc::traffic-site
    Dataset updated
    Sep 9, 2021
    Dataset authored and provided by
    Hamilton City Council
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Attributes of sites in Hamilton City which collect anonymised data from a sample of vehicles. Note: A Link is the section of the road between two sites

    Column_Info

    Site_Id, int : Unique identifier
    Number, int : Asset number. Note: If the site is at a signalised intersection, Number will match 'Site_Number' in the table 'Traffic Signal Site Location'
    Is_Enabled, varchar : Site is currently enabled
    Disabled_Date, datetime : If currently disabled, the date at which the site was disabled
    Site_Name, varchar : Description of the site location
    Latitude, numeric : North-south geographic coordinates
    Longitude, numeric : East-west geographic coordinates

    Disclaimer
    
    Hamilton City Council does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data released for public download. Levels, locations and dimensions of works depicted in the data may not be accurate due to circumstances not notified to Council. A physical check should be made on all levels, locations and dimensions before starting design or works.
    
    Hamilton City Council shall not be liable for any loss, damage, cost or expense (whether direct or indirect) arising from reliance upon or use of any data provided, or Council's failure to provide this data.
    
    While you are free to crop, export and re-purpose the data, we ask that you attribute the Hamilton City Council and clearly state that your work is a derivative and not the authoritative data source. Please include the following statement when distributing any work derived from this data:
    
    ‘This work is derived entirely or in part from Hamilton City Council data; the provided information may be updated at any time, and may at times be out of date, inaccurate, and/or incomplete.’
    
  4. Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant

    • datarade.ai
    .csv, .xls
    Updated Jun 27, 2023
    Cite
    Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
    Available download formats: .csv, .xls
    Dataset updated
    Jun 27, 2023
    Dataset authored and provided by
    Swash
    Area covered
    Uzbekistan, Saint Vincent and the Grenadines, Monaco, Latvia, Jordan, Belarus, Jamaica, Liechtenstein, Russian Federation, India
    Description

    Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

    Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

    User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

    Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

    GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

    Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

    High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

    Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

    Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.

  5. Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat...

    • verifiedmarketresearch.com
    Updated Nov 15, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat Map Analytics), By Application (Social Media Management, Display Advertising Optimization), By Vertical (Banking, Financial Services And Insurance (BFSI), Retail), And Region for 2026-2032 [Dataset]. https://www.verifiedmarketresearch.com/product/web-analytics-market/
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Web Analytics Market was valued at USD 6.16 Billion in 2024 and is projected to reach USD 13.6 Billion by 2032, growing at a CAGR of 18.58% from 2026 to 2032.
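
    For readers who want to work with growth figures like these, the compound annual growth rate between two values follows the standard formula (end / start) ** (1 / years) - 1. The sketch below applies it to the two market sizes quoted above over 2024 to 2032; the function name is hypothetical. The description states its CAGR for 2026 to 2032 and does not give the 2026 base value, so the rate computed here is not directly comparable with the quoted figure.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes quoted in the description (USD billion), 2024 and 2032.
implied = cagr(6.16, 13.6, 2032 - 2024)
print(f"Implied annual growth 2024-2032: {implied:.2%}")
```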

    Web Analytics Market Drivers

    Data-Driven Decision Making: Businesses increasingly rely on data-driven insights to optimize their online strategies. Web analytics provides valuable data on website traffic, user behavior, and conversion rates, enabling data-driven decision-making.

    E-commerce Growth: The rapid growth of e-commerce has fueled the demand for web analytics tools to track online sales, customer behavior, and marketing campaign effectiveness.

    Mobile Dominance: The increasing use of mobile devices for internet browsing has made mobile analytics a crucial aspect of web analytics. Businesses need to understand how users interact with their websites and apps on mobile devices.

    analytics tools can be complex to implement and use, requiring technical expertise.

  6. World Traffic Web Map

    • walmart-event-collaboration-portal-walmarttech.hub.arcgis.com
    Updated Jun 18, 2021
    Cite
    Walmart Emergency Management (2021). World Traffic Web Map [Dataset]. https://walmart-event-collaboration-portal-walmarttech.hub.arcgis.com/maps/world-traffic-web-map
    Dataset updated
    Jun 18, 2021
    Dataset provided by
    Walmart (http://walmart.com/)
    Authors
    Walmart Emergency Management
    Area covered
    Description

    This is a dynamic traffic map service with capabilities for visualizing traffic speeds relative to free-flow speeds as well as traffic incidents which can be visualized and identified. The traffic data is updated every five minutes. Traffic speeds are displayed as a percentage of free-flow speeds, which is frequently the speed limit or how fast cars tend to travel when unencumbered by other vehicles. The streets are color coded as follows:

    • Green (fast): 85 - 100% of free flow speeds
    • Yellow (moderate): 65 - 85%
    • Orange (slow): 45 - 65%
    • Red (stop and go): 0 - 45%

    Esri's historical, live, and predictive traffic feeds come directly from HERE (www.HERE.com). HERE collects billions of GPS and cell phone probe records per month and, where available, uses sensor and toll-tag data to augment the probe data collected. An advanced algorithm compiles the data and computes accurate speeds. Historical traffic is based on the average of observed speeds over the past three years. The live and predictive traffic data is updated every five minutes through traffic feeds.

    The color coded traffic map layer can be used to represent relative traffic speeds; this is a common type of a map for online services and is used to provide context for routing, navigation and field operations. The traffic map layer contains two sublayers: Traffic and Live Traffic. The Traffic sublayer (shown by default) leverages historical, live and predictive traffic data; while the Live Traffic sublayer is calculated from just the live and predictive traffic data only. A color coded traffic map image can be requested for the current time and any time in the future. A map image for a future request might be used for planning purposes.

    The map layer also includes dynamic traffic incidents showing the location of accidents, construction, closures and other issues that could potentially impact the flow of traffic. Traffic incidents are commonly used to provide context for routing, navigation and field operations. Incidents are not features; they cannot be exported and stored for later use or additional analysis.

    The service works globally and can be used to visualize traffic speeds and incidents in many countries. Check the service coverage web map to determine availability in your area of interest. In the coverage map, the countries color coded in dark green support visualizing live traffic. The support for traffic incidents can be determined by identifying a country. For detailed information on this service, including a data coverage map, visit the directions and routing documentation and ArcGIS Help.
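
    The color coding described in this entry can be expressed as a small lookup. This is only a sketch of the stated thresholds, not the service's own logic: the function name is hypothetical, and because the published ranges share their end points, treating each lower bound as inclusive is an assumption.

```python
def traffic_color(speed: float, free_flow_speed: float) -> str:
    """Map a speed to the color band for its percentage of free-flow speed."""
    if free_flow_speed <= 0:
        raise ValueError("free_flow_speed must be positive")
    percent = 100.0 * speed / free_flow_speed
    # Assumption: a value exactly on a boundary belongs to the faster band.
    if percent >= 85:
        return "green"   # fast
    if percent >= 65:
        return "yellow"  # moderate
    if percent >= 45:
        return "orange"  # slow
    return "red"         # stop and go


# Made-up speeds against a 50 km/h free-flow speed.
for speed in (50, 35, 25, 10):
    print(speed, traffic_color(speed, 50))
```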

  7. Total global visitor traffic to Google.com 2024

    • statista.com
    • ai-chatbox.pro
    Updated Jan 22, 2025
    Cite
    Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.

  8. Leading websites worldwide 2024, by monthly visits

    • statista.com
    • ai-chatbox.pro
    Updated Mar 24, 2025
    Cite
    Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
    Dataset updated
    Mar 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2024
    Area covered
    Worldwide
    Description

    In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

    The internet leaders: search, social, and e-commerce

    Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

    What is next for online content?

    Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.

  9. USA Traffic Counts for Site Selection

    • hub.arcgis.com
    Updated Jun 21, 2016
    Cite
    Esri (2016). USA Traffic Counts for Site Selection [Dataset]. https://hub.arcgis.com/datasets/07bf63e8238b44e7ba44cdcadcc5a8c2
    Dataset updated
    Jun 21, 2016
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    United States
    Description

    To check traffic counts around a potential business location simply enter an address on the top bar. The application will draw a one mile circle around the location and provide a list of traffic count points. You may also click anywhere on the map to drop a point. Then click the point or the graphic on the right to reveal a pop up with:

    • The most recent traffic count
    • The count type (see the methodology document for definitions)
    • A graph showing up to the last five available traffic counts at that location

    The large circled number in the side panel displays the number of points within the one mile radius. Under the circle there is a slide bar that can enlarge the selection area up to 10 miles. Under the slide bar, each point is displayed and clicking here will also reveal the pop up.

    Additional Esri Resources:

    • U.S. Traffic Count and Methodology
    • 2016 Traffic Counts in the United States web map
    • Business Data Summary and Methodology
    • Updated Demographics and Methodology
    • Esri's arcgis.com demographic map layers

  10. Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1 more
    Updated Dec 7, 2022
    Cite
    Jan Luxemburk; Karel Hynek; Tomáš Čejka; Andrej Lukačovič; Pavel Šiška (2022). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. http://doi.org/10.5281/zenodo.7409923
    Dataset updated
    Dec 7, 2022
    Authors
    Jan Luxemburk; Karel Hynek; Tomáš Čejka; Andrej Lukačovič; Pavel Šiška
    Description

    Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

    We recommend using the CESNET DataZoo Python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

    Data capture

    The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done for four weeks between 31.10.2022 and 27.11.2022.
    The following list provides per-week flow count, capture period, and uncompressed size:

    • W-2022-44: Uncompressed Size: 19 GB; Capture Period: 31.10.2022 - 6.11.2022; Number of flows: 32.6M
    • W-2022-45: Uncompressed Size: 25 GB; Capture Period: 7.11.2022 - 13.11.2022; Number of flows: 42.6M
    • W-2022-46: Uncompressed Size: 20 GB; Capture Period: 14.11.2022 - 20.11.2022; Number of flows: 33.7M
    • W-2022-47: Uncompressed Size: 25 GB; Capture Period: 21.11.2022 - 27.11.2022; Number of flows: 44.1M
    • CESNET-QUIC22: Uncompressed Size: 89 GB; Capture Period: 31.10.2022 - 27.11.2022; Number of flows: 153M

    Data description

    The dataset consists of network flows describing encrypted QUIC communications. Flows were created using ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and with fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

    Packet Sequences

    Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1, +1 meaning a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter.
    We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.

    Flow statistics

    Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason - either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

    Dataset structure

    The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list belo...
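
    The eight logarithmic histogram bins listed in this description can be reproduced with a sorted list of upper bounds and a binary search. This is a sketch of the binning rule as stated in the text, not the flow exporter's implementation; the names are hypothetical and the sample values are made up.

```python
from bisect import bisect_left
from typing import Iterable, List

# Inclusive upper bounds of the first seven bins; the eighth bin is ">1024".
UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]


def histogram(values: Iterable[float]) -> List[int]:
    """Count values (bytes or milliseconds) in the eight bins from the description."""
    counts = [0] * (len(UPPER_BOUNDS) + 1)
    for value in values:
        # bisect_left returns the index of the first bound >= value.
        counts[bisect_left(UPPER_BOUNDS, value)] += 1
    return counts


# Made-up packet sizes in bytes, chosen to hit the bin edges.
counts = histogram([0, 15, 16, 100, 1024, 1025, 1350])
print(counts)
```

    Keeping the bounds in one list means the same function serves both packet sizes and inter-packet times, since the description uses identical intervals for both.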

  11. Jefferson County KY Traffic Web Cameras

    • catalog.data.gov
    • data.lojic.org
    • +5 more
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Jefferson County KY Traffic Web Cameras [Dataset]. https://catalog.data.gov/dataset/jefferson-county-ky-traffic-web-cameras-2b335
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Jefferson County, Kentucky
    Description

    TRIMARC (Traffic Response and Incident Management Assisting the River City) camera locations in Louisville Metro Kentucky. This feature layer was created from a TRIMARC JSON file of camera locations. This item includes description, direction, and video links and is used in the Louisville Metro Snow Map. The cameras are used to monitor the roadways and verify incidents to assist in freeway and incident management. This feature is a static extract and will be reviewed before each snow season for updates. For more information on this feature layer and its use please contact Louisville Metro GIS or LOJIC. To learn more about TRIMARC please visit the following website http://www.trimarc.org.

  12. 🕵️ Phishing Websites Data

    • kaggle.com
    Updated Feb 24, 2025
    Cite
    Sairaj Adhav (2025). 🕵️ Phishing Websites Data [Dataset]. https://www.kaggle.com/datasets/sai10py/phishing-websites-data
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sairaj Adhav
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Phishing Websites Dataset

    Overview

    This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.

    Dataset Information

    • Total Columns: 31 (30 Features + 1 Target)
    • Target Variable: Result (Indicates whether a website is phishing or legitimate)

    Features Description

    URL-Based Features

    • Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.
    • double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.
    • having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.
    • Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).
    • URL_Length – Measures the length of the URL; phishing URLs tend to be longer.
    • having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.

    Domain-Based Features

    • having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.
    • SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).
    • Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.
    • age_of_domain – The age of the domain in days; older domains are usually more trustworthy.
    • DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.

    Webpage-Based Features

    • Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).
    • port – Identifies if the site is using suspicious or non-standard ports.
    • HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.
    • Request_URL – Measures the percentage of external resources loaded from different domains.
    • URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.
    • Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.
    • SFH (Server Form Handler) – Determines if form actions are handled suspiciously.
    • Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.
    • Abnormal_URL – Identifies if the website’s URL structure is inconsistent with common patterns.
    • Redirect – Counts the number of redirects; phishing websites may have excessive redirects.

    Behavior-Based Features

    • on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).
    • RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).
    • popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.
    • Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.

    Traffic & Search Engine Features

    • web_traffic – Measures the website’s Alexa ranking; phishing sites tend to have low traffic.
    • Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.
    • Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).
    • Links_pointing_to_page – Counts the number of backlinks pointing to the website.
    • Statistical_report – Uses external sources to verify if the website has been reported for phishing.

    Target Variable

    • Result – The classification label (1: Legitimate, -1: Phishing)

    Usage

    This dataset is valuable for:
    • Machine Learning Models – Developing classifiers for phishing detection.
    • Cybersecurity Research – Understanding patterns in phishing attacks.
    • Browser Security Extensions – Enhancing anti-phishing tools.

  13. Indian Traffic Sign Dataset

    • universe.roboflow.com
    zip
    Updated Sep 11, 2023
    + more versions
    Cite
    DataCluster Labs (2023). Indian Traffic Sign Dataset [Dataset]. https://universe.roboflow.com/datacluster-labs-agryi/indian-traffic-sign-vvx9y
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 11, 2023
    Dataset authored and provided by
    DataCluster Labs
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Signals Bounding Boxes
    Description

    Indian Traffic Sign Image Dataset

    Datasets for Indian traffic signs

    About Dataset

    This dataset was collected by Datacluster Labs. To download the full dataset or to submit a request for new data collection, email sales@datacluster.ai.

    This dataset is an extremely challenging set of more than 2,000 original Indian traffic sign images captured and crowdsourced from more than 400 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at DC Labs.

    Dataset Features
    1. Dataset size: 2000+
    2. Captured by: 400+ crowdsource contributors
    3. Resolution: 100% of images HD and above (1920x1080 and above)
    4. Location: Captured in 400+ cities across India
    5. Diversity: Various lighting conditions (day, night), varied distances, viewpoints, etc.
    6. Device used: Captured using mobile phones in 2020-2021
    7. Usage: Traffic sign detection, self-driving systems, traffic detection, sign detection, etc.

    Available annotation formats: COCO, YOLO, PASCAL-VOC, TFRecord

    The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai. Visit www.datacluster.ai to know more.

  14. Secure Web Gateway Market Analysis North America, Europe, APAC, Middle East...

    • technavio.com
    Cite
    Technavio, Secure Web Gateway Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/secure-web-gateway-market-industry-analysis
    Explore at:
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Germany, United Kingdom, United States, Global
    Description

    Secure Web Gateway Market Size 2024-2028

    The secure web gateway market size is forecast to increase by USD 19.45 billion at a CAGR of 26.79% between 2023 and 2028. The market is experiencing significant growth due to the increasing number of online security threats targeting web products and websites. Traditional firewalls, once considered a cyber barrier, are no longer sufficient to protect against unauthorized traffic, malicious websites, and data leakage.

    To address these challenges, secure web gateways employ a 7-layered traffic inspection approach, providing application-level control and data leakage prevention. As more companies adopt cloud-based solutions and allow remote workers, the need for advanced web security solutions becomes increasingly important. The rising adoption of cloud-based security technologies is a major market growth factor, despite the high implementation costs.

    What will be the size of the Secure Web Gateway Market During the Forecast Period?

    The cyber threat environment continues to evolve, with system viruses and unknown spyware posing significant risks to both individual and organizational data. Unsecured communication channels and malicious web traffic are common avenues for cyber-attacks, making it crucial for businesses to implement strong security solutions. A secure web gateway acts as a cyber barrier, safeguarding against online security threats and protecting against unauthorized traffic, malware attacks, and data breaches. Malicious web traffic, including viruses, malware, and harmful websites, can infiltrate an internal network through unsecured endpoints.

    Moreover, trojan horses, adware, and spyware are common threats that can compromise individual data and organizational data. Remote employees working from home present an additional challenge, as they may access the company network through unsecured Wi-Fi or unprotected devices. A secure web gateway acts as a centralized filtering system, controlling web requests and blocking access to malicious websites. It provides an essential layer of security, preventing unauthorized access to sensitive data and protecting against known and unknown threats. By implementing a secure web gateway, companies can enforce company policy, ensuring that all web traffic is secure and compliant. Online security threats are a constant concern for businesses, with data breaches and malware attacks becoming increasingly common.

    Furthermore, a secure web gateway helps mitigate these risks by providing a comprehensive solution for securing end-user data and web security. It blocks malicious web traffic, preventing the spread of viruses, malware, and other harmful software. By implementing a secure web gateway, businesses can protect their valuable data and maintain their online reputation. In conclusion, a secure web gateway is an essential component of any organization's cybersecurity strategy. It provides a critical layer of protection against online security threats, including malicious web traffic, viruses, malware, and harmful websites. By implementing a secure web gateway, businesses can safeguard their data, protect against unauthorized traffic, and maintain compliance with industry regulations.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Deployment
      Cloud
      On-premises
    End-user
      BFSI
      IT and telecom
      Government and defense
      Others
    Geography
      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        Japan
      Middle East and Africa
      South America

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period. The market is expected to experience substantial expansion in the coming years due to the escalating requirement for comprehensive data and identity security in corporations. With the rise in cybercrime and the increasing number of threats from hackers, the demand for sophisticated security solutions is surging. The adoption of cloud security services is gaining traction among large enterprises and small businesses as they transfer an increasing volume of sensitive and confidential data. The proliferation of mobile devices for both professional and personal use is further fueling the demand for cloud security solutions, as these devices are highly susceptible to attacks.

    Furthermore, secure Web Gateways provide advanced functionalities such as antivirus, application control, data loss prevention, and HTTPS inspection to protect enterprises from harmful code and potential leaks. The financial sector, including banks, is a significant end user of Secure Web Gateways due to the sensitive nature of the data they handle. The

  15. Coresignal | Web Data | Company Data | Global / 71M+ Records / Largest...

    • datarade.ai
    .json, .csv
    Updated Mar 1, 2024
    Cite
    Coresignal (2024). Coresignal | Web Data | Company Data | Global / 71M+ Records / Largest Professional Network / Updated Daily [Dataset]. https://datarade.ai/data-products/coresignal-web-data-company-data-global-69m-records-coresignal
    Explore at:
    .json, .csv (available download formats)
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    Coresignal
    Area covered
    Sweden, Nauru, United Kingdom, State of, Finland, Trinidad and Tobago, Libya, Hong Kong, New Zealand, Yemen
    Description

    Our Web Data dataset includes such data points as company name, location, headcount, industry, and size, among others. It offers extensive fresh and historical data, including even companies that operate in stealth mode.

    For lead generation

    With millions of companies worldwide, Web Company Database helps you filter potential clients based on custom criteria and speed up the conversion process.

    Use cases

    1. Filter potential clients according to location, size, and other criteria
    2. Enrich your existing database
    3. Improve conversion rates
    4. Use predictive models to identify potential leads
    5. Group your leads in segments for more accurate targeting

    For market and business analysis

    Our Web Company Data provides information about millions of companies, allowing you to find your competitors and see their weaknesses and strengths.

    Use cases

    1. Pinpoint your competitors
    2. Learn about your competitors' size, headcount, and revenue
    3. Prepare a data-driven plan for the next quarter

    For Investors

    We recommend B2B Web Data for investors to discover and evaluate businesses with the highest potential.

    Gain strategic business insights, enhance decision-making, and maintain algorithms that signal investment opportunities with Coresignal’s global B2B Web Dataset.

    Use cases

    1. Screen startups and industries showing early signs of growth
    2. Identify companies hungry for the next investment
    3. Check if a startup is about to reach the next maturity phase
    4. Identify and predict a startup's potential at the founding moment
    5. Choose companies that fit you in terms of size and headcount

    For sales prospecting

    B2B Web Database saves time your employees would otherwise use to search for potential clients manually.

    Use cases

    1. Make a short list of the top prospects
    2. Define which companies are large or small enough to buy your product
    3. Based on the revenue, determine which companies are ready to convert
    4. Sort the companies by their distance from your warehouse to draw a line where selling won't result in satisfactory profit

  16. Traffic

    • site-collab-cgvar.hub.arcgis.com
    • esrifrance.hub.arcgis.com
    • +1more
    Updated Mar 11, 2014
    + more versions
    Cite
    Conseil Départemental du Var (2014). Traffic [Dataset]. https://site-collab-cgvar.hub.arcgis.com/datasets/traffic
    Explore at:
    Dataset updated
    Mar 11, 2014
    Dataset authored and provided by
    Conseil Départemental du Var
    License

    http://opendata.regionpaca.fr/fileadmin//user_upload/tx_ausyopendata/licences/Licence-Ouverte-Open-Licence-ETALAB.pdf

    Area covered
    Description

    The map layers in this service provide color-coded maps of the traffic conditions you can expect for the present time (the default). The map shows present traffic as a blend of live and typical information. Live speeds are used wherever available and are established from real-time sensor readings. Typical speeds come from a record of average speeds, which are collected over several weeks within the last year or so. Layers also show current incident locations where available. By changing the map time, the service can also provide past and future conditions. Live readings from sensors are saved for 12 hours, so setting the map time back within 12 hours allows you to see the actual recorded traffic speeds, supplemented with typical averages by default. You can choose to turn off the average speeds and see only the recorded live traffic speeds for any time within the 12-hour window. Predictive traffic conditions are shown for any time in the future.

    The color-coded traffic map layer can be used to represent relative traffic speeds; this is a common type of map for online services and is used to provide context for routing, navigation, and field operations. A color-coded traffic map can be requested for the current time and any time in the future. A map for a future request might be used for planning purposes.

    The map also includes dynamic traffic incidents showing the location of accidents, construction, closures, and other issues that could potentially impact the flow of traffic. Traffic incidents are commonly used to provide context for routing, navigation, and field operations. Incidents are not features; they cannot be exported and stored for later use or additional analysis.

    Data source

    Esri’s typical speed records and live and predictive traffic feeds come directly from HERE (www.HERE.com). HERE collects billions of GPS and cell phone probe records per month and, where available, uses sensor and toll-tag data to augment the probe data collected. An advanced algorithm compiles the data and computes accurate speeds. The real-time and predictive traffic data is updated every five minutes through traffic feeds.

    Data coverage

    The service works globally and can be used to visualize traffic speeds and incidents in many countries. Check the service coverage web map to determine availability in your area of interest. Look at the coverage map to learn whether a country currently supports traffic. The support for traffic incidents can be determined by identifying a country. For detailed information on this service, visit the directions and routing documentation and the ArcGIS Help.

    Symbology

    Traffic speeds are displayed as a percentage of free-flow speeds, which is frequently the speed limit or how fast cars tend to travel when unencumbered by other vehicles. The streets are color coded as follows:

    • Green (fast): 85 - 100% of free flow speeds
    • Yellow (moderate): 65 - 85%
    • Orange (slow): 45 - 65%
    • Red (stop and go): 0 - 45%

    To view live traffic only (that is, excluding typical traffic conditions), enable the Live Traffic layer and disable the Traffic layer. (You can find these layers under World/Traffic > [region] > [region] Traffic). To view more comprehensive traffic information that includes live and typical conditions, disable the Live Traffic layer and enable the Traffic layer.

    ArcGIS Online organization subscription

    Important Note: The World Traffic map service is available for users with an ArcGIS Online organizational subscription. To access this map service, you'll need to sign in with an account that is a member of an organizational subscription. If you don't have an organizational subscription, you can create a new account and then sign up for a 30-day trial of ArcGIS Online.
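
The color coding described in the symbology for this service (green, yellow, orange, red by percentage of free-flow speed) can be sketched as a small Python function. This is only an illustration, not Esri's implementation: the function name `traffic_color` is hypothetical, and because the published ranges share their endpoints, the sketch assumes a boundary value falls into the faster class and that anything above 100% is also green.

```python
def traffic_color(speed: float, free_flow_speed: float) -> str:
    """Map a measured speed to a color class by percentage of free-flow speed."""
    if free_flow_speed <= 0:
        raise ValueError("free_flow_speed must be positive")

    pct = 100.0 * speed / free_flow_speed

    # Assumption: shared endpoints (85, 65, 45) belong to the faster class.
    if pct >= 85:
        return "green"   # fast
    if pct >= 65:
        return "yellow"  # moderate
    if pct >= 45:
        return "orange"  # slow
    return "red"         # stop and go
```

A road with a 50 km/h free-flow speed and a measured 35 km/h is at 70% of free flow, so it would be drawn yellow under these assumptions.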

  17. pNEUMA Vision Dataset

    • zenodo.org
    zip
    Updated Jan 1, 2023
    Cite
    Sohyeong Kim; Georg Anagnostopoulos; Emmanouil Barmpounakis; Nikolas Geroliminis (2023). pNEUMA Vision Dataset [Dataset]. http://doi.org/10.5281/zenodo.7426506
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sohyeong Kim; Georg Anagnostopoulos; Emmanouil Barmpounakis; Nikolas Geroliminis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The pNEUMA Vision dataset is a drone traffic imagery dataset that contains frame images and vehicle annotations given as positions. It is an expansion of pNEUMA, the urban trajectory dataset collected by swarms of drones in Athens.

    For more details about pNEUMA and pNEUMA Vision, please check our website at https://open-traffic.epfl.ch and GitHub.

  18. Attitudes towards the internet in Japan 2025

    • statista.com
    Updated Apr 11, 2025
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Japan 2025 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Japanese respondents pick "I'm concerned that my data is being misused on the internet" as an answer. 35 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.

  19. Data from: Traffic Volumes

    • data.sandiego.gov
    Updated Jul 29, 2016
    Cite
    (2016). Traffic Volumes [Dataset]. https://data.sandiego.gov/datasets/traffic-volumes/
    Explore at:
    csv (available download formats)
    Dataset updated
    Jul 29, 2016
    Description

    The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate of the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency of each count varies depending on City staff’s needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.

  20. Anonymized web browsing sessions found in the Roma Capitale WiFi system:...

    • gimi9.com
    Updated Nov 25, 2024
    + more versions
    Cite
    (2024). Anonymized web browsing sessions found in the Roma Capitale WiFi system: Traffic and language options. Year 2024 | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_c_h501-wifi2024/
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    The dataset describes the anonymized distribution of DigitRoma Wi-Fi users at the offices enabled for the institutional WiFi service, covering: 1. the language option set in the configuration of the API of the thin client used for web browsing after authentication to the DigitRoma Wi-Fi network; 2. upload and download data traffic, with the duration of the session, the date, and the place of operation. The information collected refers to the number of anonymized user sessions recorded on a daily basis.

Cite
Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410

Network Traffic Analysis: Data and Code

Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Ferrell, Nathan
Chan-Tin, Eric
Moran, Madeline
Homan, Sophia
Soni, Shreena
Honig, Joshua
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help  show this help message and exit
-i TXTFILE  input text file
-x X        Add first X number of total packets as features.
-y Y        Add first Y number of negative packets as features.
-z Z        Add first Z number of positive packets as features.
-ml         Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S        Generate samples using size s.
-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5-fold cross-validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for each experiment was stored and analyzed as a txt file. In each line:

The first number is a classification number denoting which website, query, or VR action is taking place.

The remaining numbers denote the size of a packet and the direction it is traveling:

negative numbers denote incoming packets

positive numbers denote outgoing packets
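
To make the line format concrete, here is a minimal Python sketch of a parser for one line of such a txt file. It is not part of the published code: the names `Capture` and `parse_capture` are hypothetical, and the delimiter is an assumption (the sketch accepts whitespace or commas, since the description does not say which is used).

```python
from typing import NamedTuple


class Capture(NamedTuple):
    label: int            # classification number (website, query, or VR action)
    incoming: list[int]   # sizes of incoming packets (stored as positive values)
    outgoing: list[int]   # sizes of outgoing packets


def parse_capture(line: str) -> Capture:
    """Split one line into its class label and signed packet sizes."""
    # Assumption: values are separated by whitespace and/or commas.
    tokens = line.replace(",", " ").split()
    if not tokens:
        raise ValueError("empty line")

    label = int(tokens[0])
    sizes = [int(token) for token in tokens[1:]]

    # Negative numbers are incoming packets, positive numbers are outgoing.
    incoming = [-size for size in sizes if size < 0]
    outgoing = [size for size in sizes if size > 0]
    return Capture(label, incoming, outgoing)
```

For a line such as `3 -1500 60 -1500 52`, the sketch yields label 3, two incoming packets of 1500 bytes, and outgoing packets of 60 and 52 bytes.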

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
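
The steps described for the Figure 4 files (sort each capture by packet size, compute the mean and standard deviation, then compute the CDF) can be sketched in Python as follows. This is an illustrative reconstruction rather than the authors' spreadsheet formulas: the function name `packet_size_summary` is hypothetical, and the use of absolute packet sizes and of the sample standard deviation are assumptions.

```python
import statistics


def packet_size_summary(sizes):
    """Return mean, standard deviation, and empirical CDF of packet sizes."""
    if len(sizes) < 2:
        raise ValueError("need at least two packets")

    # Assumption: direction (sign) is dropped before sorting by size.
    ordered = sorted(abs(size) for size in sizes)
    n = len(ordered)

    # Empirical CDF: the k-th smallest value has cumulative probability k / n.
    cdf = [(size, (rank + 1) / n) for rank, size in enumerate(ordered)]

    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),  # assumption: sample standard deviation
        "cdf": cdf,
    }
```

Plotting the `cdf` pairs with packet size on the x-axis and cumulative probability on the y-axis gives a curve of the kind the description attributes to Figure 4.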
