100+ datasets found

Total global visitor traffic to Google.com 2024
statista.com
Updated Aug 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/
Explore at:
Dataset updated
Aug 20, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Oct 2023 - Mar 2024
Area covered
Worldwide
Description
In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.
Total global visitor traffic to YouTube 2017-2021
statista.com
Updated Jul 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Total global visitor traffic to YouTube 2017-2021 [Dataset]. https://www.statista.com/statistics/1256702/youtubecom-monthly-visits/
Explore at:
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 2017 - Apr 2021
Area covered
YouTube, Worldwide
Description
In April 2021, worldwide visits to YouTube.com amounted to roughly ************. Between January 2017 and April 2021, visitor traffic to YouTube.com has increased by more than *** percent. In 2020, visits to the platform's website experienced an upward trend between the months of April and June.
ebay.com total website traffic in 2025, by device
statista.com
Updated Sep 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). ebay.com total website traffic in 2025, by device [Dataset]. https://www.statista.com/statistics/1333492/ebay-website-traffic-total-device/
Explore at:
Dataset updated
Sep 12, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Mar 2025 - Aug 2025
Area covered
Worldwide
Description
From March to August 2025, March was the month that had the most website traffic to ebay.com. The consumer-to-consumer (C2C) e-commerce website reached a total of over ****million visits in that month, with the majority being from mobile devices. Popularity on multiple fronts Although eBay is popular on mobile devices, monthly downloads of its mobile app have been trending in the wrong direction since peaking in June 2021. Still, in April 2024, ebay.com was the second most popular e-commerce and shopping website worldwide, accounting for more than ***** percent of visits to sites in this category. Slow and steady In the second quarter of 2023, eBay’s gross merchandise volume (GMV) amounted to nearly **** billion U.S. dollars. That is no small number, but is only a small increase compared to the lowest GMV recorded by the company since the first quarter of 2020 - **** billion U.S. dollars in the third quarter of 2022. Since then, the company's GMV has been on a slow increase. However, while GMV figures begin to achieve steady growth once again, the e-commerce platform's once *** million active buyers have plateaued at *** million.
d
LAcity.org Website Traffic
catalog.data.gov
data.lacity.org
+1more
Updated Jun 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.lacity.org (2025). LAcity.org Website Traffic [Dataset]. https://catalog.data.gov/dataset/lacity-org-website-traffic
Explore at:
Dataset updated
Jun 21, 2025
Dataset provided by
data.lacity.org
Area covered
Los Angeles
Description
Unique visitors, total sessions, and bounce rate for lacity.org, the main website for the City of Los Angeles.
ulta.com total website traffic 2023-2024, by device
statista.com
Updated Jul 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). ulta.com total website traffic 2023-2024, by device [Dataset]. https://www.statista.com/statistics/1384111/ulta-website-traffic-total-device/
Explore at:
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Oct 2023 - Mar 2024
Area covered
Worldwide
Description
From November 2023 to April 2024, the total traffic to ulta.com decreased from roughly ** to ** million website visitors. Most users accessed ulta.com via mobile devices in April 2024, making up about ** million website visits. That month, desktops accounted for around ** million website visits.
w
Websites using Visitors Traffic Real Time Statistics
webtechsurvey.com
csv
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WebTechSurvey (2025). Websites using Visitors Traffic Real Time Statistics [Dataset]. https://webtechsurvey.com/technology/visitors-traffic-real-time-statistics
Explore at:
csvAvailable download formats
Dataset updated
Jan 15, 2025
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/termshttps://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites using the Visitors Traffic Real Time Statistics technology, compiled through global website indexing conducted by WebTechSurvey.
Z
Network Traffic Analysis: Data and Code
data.niaid.nih.gov
zenodo.org
Updated Jun 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Ferrell, Nathan
Moran, Madeline
Chan-Tin, Eric
Soni, Shreena
Honig, Joshua
Homan, Sophia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help show this help message and exit -i TXTFILE input text file -x X Add first X number of total packets as features. -y Y Add first Y number of negative packets as features. -z Z Add first Z number of positive packets as features. -ml Output to text file all websites in the format of websiteNumber1,feature1,feature2,... -s S Generate samples using size s. -j

Purpose:

Turns a text file containing lists of incomeing and outgoing network packet sizes into separate website objects with associative features.

Uses Features.py to calcualte the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the nessisary file paths and flags

Options (to be edited within this file):

--evaluate-only to test 5 fold cross validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normailzation options for each data set as well as the best grid search hyperparameters based on the provided ranges.

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queres (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality head set.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

First number is a classification number to denote what website, query, or vr action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv file are identical

Each file includes (from right to left):

The origional packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distrubution Function (CDF) caluclation that generated the Figure 4 Graph.
Zalando: website traffic on desktop and mobile across all domains 2023
statista.com
Updated Jun 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Zalando: website traffic on desktop and mobile across all domains 2023 [Dataset]. https://www.statista.com/statistics/1175856/zalando-website-traffic-all-domains/
Explore at:
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jul 2023 - Dec 2023
Area covered
Europe
Description
Desktop and mobile website traffic data showed that Germany domain of Zalando had by far the highest number of visitors compared to all other European countries. Between July 2023 and December 2023, zalando.de recorded more nearly *** million visits. The Polish web domain followed in the ranking, as the total visits amounted to **** million.
Data from: Analysis of the Quantitative Impact of Social Networks General...
figshare.com
produccioncientifica.ucm.es
doc
Updated Oct 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1
Explore at:
docAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21329421.v1
Dataset updated
Oct 14, 2022
Dataset provided by
Figsharehttp://figshare.com/
Authors
David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General data recollected for the studio " Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: what percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is said percentage higher or lower than that provided through direct traffic and through the use of search engines via SEO positioning? Which social networks have a greater impact? And is there any degree of relationship between the specific weight of social networks in the web traffic of a cybermedia and circumstances such as the average duration of the user's visit, the number of page views or the bounce rate understood in its formal aspect of not performing any kind of interaction on the visited page beyond reading its content? To answer these questions, we have first proceeded to a selection of the cybermedia with the highest web traffic of the 27 countries that are currently part of the European Union after the United Kingdom left on December 31, 2020. In each nation we have selected five media using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https:// www.similarweb.com/). We have not used local metrics by country since the results obtained with these first two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, cybermedia whose property corresponds to a journalistic company have been selected, ruling out those belonging to telecommunications portals or service providers; in some cases they correspond to classic information companies (both newspapers and televisions) while in others they refer to digital natives, without this circumstance affecting the nature of the research proposed.
Below we have proceeded to examine the web traffic data of said cybermedia. The period corresponding to the months of October, November and December 2021 and January, February and March 2022 has been selected. We believe that this six-month stretch allows possible one-time variations to be overcome for a month, reinforcing the precision of the data obtained. To secure this data, we have used the SimilarWeb tool, currently the most precise tool that exists when examining the web traffic of a portal, although it is limited to that coming from desktops and laptops, without taking into account those that come from mobile devices, currently impossible to determine with existing measurement tools on the market. It includes:

Web traffic general data: average visit duration, pages per visit and bounce rate Web traffic origin by country Percentage of traffic generated from social media over total web traffic Distribution of web traffic generated from social networks Comparison of web traffic generated from social netwoks with direct and search procedures
a
Web Traffic Data | 500M+ US Web Traffic Data Resolution | B2B and B2C...
data.allforce.io
Updated Feb 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Allforce (2024). Web Traffic Data | 500M+ US Web Traffic Data Resolution | B2B and B2C Website Visitor Identity Resolution [Dataset]. https://data.allforce.io/products/web-traffic-data-500m-us-web-traffic-data-resolution-b2b-allforce
Explore at:
Dataset updated
Feb 22, 2024
Dataset authored and provided by
Allforce
Area covered
United States
Description
Capture the Identity of Your Anonymous Web Visitors.

500M B2B & B2C Hash Email Resolutions

That Day's Contacts Deposited to an FTP Every Night

Full Contact Details for US Web Traffic Data Resolution
Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 31, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was obtained in a Kali Machine from University of Cincinnati,Cincinnati,OHIO by carrying out packet captures for 1 hour during the evening on Oct 9th,2023 using Wireshark.This dataset consists of 394137 instances were obtained and stored in a CSV (Comma Separated Values) file.This large dataset could be used utilised for different machine learning applications for instance classification of Network traffic,Network performance monitoring,Network Security Management , Network Traffic Management ,network intrusion detection and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features.Each instance contains the information of source and destination IP addresses, The majority of the properties are numeric in nature, however there are also nominal and date kinds due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No : Number of Instance. Timestamp : Timestamp of instance of network traffic Source IP: IP address of Source Destination IP: IP address of Destination Portocol: Protocol used by the instance Length: Length of Instance Info: Information of Traffic Instance

Acknowledgements :

I would like thank University of Cincinnati for giving the infrastructure for generation of network traffic data set.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration : This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP,ARP,RARP) that an IP flow contains. Instead, it generates machine learning models that can identify specific applications (like Tiktok,Wikipedia,Instagram,Youtube,Websites,Blogs etc.) from IP flow statistics (there are currently 25 applications in total).

**Dataset License: ** CC0: Public Domain

Dataset Usages : This dataset can be used for different machine learning applications in the field of cybersecurity such as classification of Network traffic,Network performance monitoring,Network Security Management , Network Traffic Management ,network intrusion detection and anomaly detection.

ML techniques benefits from this Dataset :

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on a public,private and Enterprise networks.Also,the dataset consists of very important features that can be used for most of the applications of Machine learning in cybersecurity.Here are few of the potential machine learning applications that could be benefited from this dataset are :

Network Performance Monitoring : This large network traffic data set can be utilised for analysing the network traffic to identifying the network patterns in the network .This help in designing the network security algorithms for minimise the network probelms.

Anamoly Detection : Large network traffic dataset can be utilised training the machine learning models for finding the irregularitues in the traffic which could help identify the cyber attacks.

3.Network Intrusion Detection : This large dataset could be utilised for machine algorithms training and designing the models for detection of the traffic issues,Malicious traffic network attacks and DOS attacks as well.
Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
csv, pdfAvailable download formats
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
g
Food Hygiene Ratings website traffic
fsadata.github.io
data.europa.eu
+1more
csv
Updated Jan 16, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). Food Hygiene Ratings website traffic [Dataset]. https://fsadata.github.io/food-hygiene-ratings-website-traffic/
Explore at:
csvAvailable download formats
Dataset updated
Jan 16, 2020
Description
A breakdown of the Food Hygiene Ratings website traffic data showing total number of visits, unique visitors and page views.
g
Website Metrics
gimi9.com
datasets.ai
+3more
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Website Metrics [Dataset]. https://gimi9.com/dataset/data-gov_website-metrics/
Explore at:
Dataset updated
Apr 1, 2025
Description
Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.rnrnInformation in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce ratevisits by source, visits by Social Media Platform, and metrics on new vs returning visitors.rnrnExternal Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.
f
Study on the impact of Google AI Overview by sector – 2025
falia.co
html
Updated Aug 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Falia (2025). Study on the impact of Google AI Overview by sector – 2025 [Dataset]. https://falia.co/
Explore at:
htmlAvailable download formats
Dataset updated
Aug 18, 2025
Dataset authored and provided by
Falia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study compiles data from multiple sources (Ahrefs, Amsive, Pew Research, Semrush) to evaluate the impact of Google AI Overview on website traffic across different sectors.
d
The total number of visitors to the micro-enterprise website.
data.gov.tw
csv, json +2
Updated Jun 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Workforce Development Agency, MOL (2025). The total number of visitors to the micro-enterprise website. [Dataset]. https://data.gov.tw/en/datasets/36178
Explore at:
csv, xml, json, webservicesAvailable download formats
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Workforce Development Agency, MOL
License
https://data.gov.tw/licensehttps://data.gov.tw/license
Description
The cumulative number of visitors to the micro-enterprise phoenix website.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
Total web visits to D2C brands' websites 2021
statista.com
Updated Jul 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Total web visits to D2C brands' websites 2021 [Dataset]. https://www.statista.com/statistics/1372926/d2c-brands-by-web-traffic-worlddwide/
Explore at:
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2021
Area covered
Worldwide
Description
In 2021, Chewy was the direct-to-consumer (D2C) brand with most online traffic, hitting *** million visits worldwide. Opensea ranked second with *** million online visits, followed by Fitbit at *** million visits.
Global Network Traffic Analytics Market 2018-2022
technavio.com
pdf
Updated Jun 21, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Technavio (2018). Global Network Traffic Analytics Market 2018-2022 [Dataset]. https://www.technavio.com/report/global-network-traffic-analytics-market-analysis-share-2018
Explore at:
pdfAvailable download formats
Dataset updated
Jun 21, 2018
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice
Description
Snapshot img

Global network traffic analytics Industry Overview

Technavio’s analysts have identified the increasing use of network traffic analytics solutions to be one of major factors driving market growth. With the rapidly changing IT infrastructure, security hackers can steal valuable information through various modes. With the increasing dependence on web applications and websites for day-to-day activities and financial transactions, the instances of theft have increased globally. Also, the emergence of social networking websites has aided the malicious attackers to extract valuable information from vulnerable users. The increasing consumer dependence on web applications and websites for day-to-day activities and financial transactions are further increasing the risks of theft. This encourages the organizations to adopt network traffic analytics solutions.

Want a bigger picture? Try a FREE sample of this report now!

See the complete table of contents and list of exhibits, as well as selected illustrations and example pages from this report.

Companies covered

The network traffic analytics market is fairly concentrated due to the presence of few established companies offering innovative and differentiated software and services. By offering a complete analysis of the competitiveness of the players in the network monitoring tools market offering varied software and services, this network traffic analytics industry analysis report will aid clients identify new growth opportunities and design new growth strategies.

The report offers a complete analysis of a number of companies including:

Allot Cisco Systems IBM Juniper Networks Microsoft Symantec

Network traffic analytics market growth based on geographic regions

Americas APAC EMEA

With a complete study of the growth opportunities for the companies across regions such as the Americas, APAC, and EMEA, our industry research analysts have estimated that countries in the Americas will contribute significantly to the growth of the network monitoring tools market throughout the predicted period.

Network traffic analytics market growth based on end-user

Telecom BFSI Healthcare Media and entertainment

According to our market research experts, the telecom end-user industry will be the major end-user of the network monitoring tools market throughout the forecast period. Factors such as increasing use of network traffic analytics solutions and increasing use of mobile devices at workplaces will contribute to the growth of the market shares of the telecom industry in the network traffic analytics market.

Key highlights of the global network traffic analytics market for the forecast years 2018-2022:

CAGR of the market during the forecast period 2018-2022 Detailed information on factors that will accelerate the growth of the network traffic analytics market during the next five years Precise estimation of the global network traffic analytics market size and its contribution to the parent market Accurate predictions on upcoming trends and changes in consumer behavior Growth of the network traffic analytics industry across various geographies such as the Americas, APAC, and EMEA A thorough analysis of the market’s competitive landscape and detailed information on several vendors Comprehensive information about factors that will challenge the growth of network traffic analytics companies

Get more value with Technavio’s INSIGHTS subscription platform! Gain easy access to all of Technavio’s reports, along with on-demand services. Try the demo

This market research report analyzes the market outlook and provides a list of key trends, drivers, and challenges that are anticipated to impact the global network traffic analytics market and its stakeholders over the forecast years.

The global network traffic analytics market analysts at Technavio have also considered how the performance of other related markets in the vertical will impact the size of this market till 2022. Some of the markets most likely to influence the growth of the network traffic analytics market over the coming years are the Global Network as a Service Market and the Global Data Analytics Outsourcing Market.

Technavio’s collection of market research reports offer insights into the growth of markets across various industries. Additionally, we also provide customized reports based on the specific requirement of our clients.
Multilingual Scraper of Privacy Policies and Terms of Service
zenodo.org
bin, zip
Updated Apr 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Bernhard; David Bernhard; Luka Nenadic; Luka Nenadic; Stefan Bechtold; Karel Kubicek; Karel Kubicek; Stefan Bechtold (2025). Multilingual Scraper of Privacy Policies and Terms of Service [Dataset]. http://doi.org/10.5281/zenodo.14562039
Explore at:
zip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14562039
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
David Bernhard; David Bernhard; Luka Nenadic; Luka Nenadic; Stefan Bechtold; Karel Kubicek; Karel Kubicek; Stefan Bechtold
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Multilingual Scraper of Privacy Policies and Terms of Service: Scraped Documents of 2024

This dataset supplements publication "Multilingual Scraper of Privacy Policies and Terms of Service" at ACM CSLAW’25, March 25–27, 2025, München, Germany. It includes the first 12 months of scraped policies and terms from about 800k websites, see concrete numbers below.

The following table lists the amount of websites visited per month:

Month Number of websites
2024-01 551'148
2024-02 792'921
2024-03 844'537
2024-04 802'169
2024-05 805'878
2024-06 809'518
2024-07 811'418
2024-08 813'534
2024-09 814'321
2024-10 817'586
2024-11 828'662
2024-12 827'101

The amount of websites visited should always be higher than the number of jobs (Table 1 of the paper) as a website may redirect, resulting in two websites scraped or it has to be retried.

To simplify the access, we release the data in large CSVs. Namely, there is one file for policies and another for terms per month. All of these files contain all metadata that are usable for the analysis. If your favourite CSV parser reports the same numbers as above then our dataset is correctly parsed. We use ‘,’ as a separator, the first row is the heading and strings are in quotes.

Since our scraper sometimes collects other documents than policies and terms (for how often this happens, see the evaluation in Sec. 4 of the publication) that might contain personal data such as addresses of authors of websites that they maintain only for a selected audience. We therefore decided to reduce the risks for websites by anonymizing the data using Presidio. Presidio substitutes personal data with tokens. If your personal data has not been effectively anonymized from the database and you wish for it to be deleted, please contact us.

Preliminaries

The uncompressed dataset is about 125 GB in size, so you will need sufficient storage. This also means that you likely cannot process all the data at once in your memory, so we split the data in months and in files for policies and terms.

Files and structure

The files have the following names:

2024_policy.csv for policies

2024_terms.csv for terms

Shared metadata

Both files contain the following metadata columns:

website_month_id - identification of crawled website

job_id - one website can have multiple jobs in case of redirects (but most commonly has only one)

website_index_status - network state of loading the index page. This is resolved by the Chromed DevTools Protocol.

DNS_ERROR - domain cannot be resolved

OK - all fine

REDIRECT - domain redirect to somewhere else

TIMEOUT - the request timed out

BAD_CONTENT_TYPE - 415 Unsupported Media Type

HTTP_ERROR - 404 error

TCP_ERROR - error in the network connection

UNKNOWN_ERROR - unknown error

website_lang - language of index page detected based on langdetect library

website_url - the URL of the website sampled from the CrUX list (may contain subdomains, etc). Use this as a unique identifier for connecting data between months.

job_domain_status - indicates the status of loading the index page. Can be:

OK - all works well (at the moment, should be all entries)

BLACKLISTED - URL is on our list of blocked URLs

UNSAFE - website is not safe according to save browsing API by Google

LOCATION_BLOCKED - country is in the list of blocked countries

job_started_at - when the visit of the website was started

job_ended_at - when the visit of the website was ended

job_crux_popularity - JSON with all popularity ranks of the website this month

job_index_redirect - when we detect that the domain redirects us, we stop the crawl and create a new job with the target URL. This saves time if many websites redirect to one target, as it will be crawled only once. The index_redirect is then the job.id corresponding to the redirect target.

job_num_starts - amount of crawlers that started this job (counts restarts in case of unsuccessful crawl, max is 3)

job_from_static - whether this job was included in the static selection (see Sec. 3.3 of the paper)

job_from_dynamic - whether this job was included in the dynamic selection (see Sec. 3.3 of the paper) - this is not exclusive with from_static - both can be true when the lists overlap.

job_crawl_name - our name of the crawl, contains year and month (e.g., 'regular-2024-12' for regular crawls, in Dec 2024)

Policy data

policy_url_id - ID of the URL this policy has

policy_keyword_score - score (higher is better) according to the crawler's keywords list that given document is a policy

policy_ml_probability - probability assigned by the BERT model that given document is a policy

policy_consideration_basis - on which basis we decided that this url is policy. The following three options are executed by the crawler in this order:

'keyword matching' - this policy was found using the crawler navigation (which is based on keywords)

'search' - this policy was found using search engine

'path guessing' - this policy was found by using well-known URLs like example.com/policy

policy_url - full URL to the policy

policy_content_hash - used as identifier - if the document remained the same between crawls, it won't create a new entry

policy_content - contains the text of policies and terms extracted to Markdown using Mozilla's readability library

policy_lang - Language detected by fasttext of the content

Terms data

Analogous to policy data, just substitute policy to terms.

Updates

Check this Google Docs for an updated version of this README.md.

Month	Number of websites
2024-01	551'148
2024-02	792'921
2024-03	844'537
2024-04	802'169
2024-05	805'878
2024-06	809'518
2024-07	811'418
2024-08	813'534
2024-09	814'321
2024-10	817'586
2024-11	828'662
2024-12	827'101

Facebook

Twitter

Click to copy link

Link copied

Cite

Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/

Total global visitor traffic to Google.com 2024

Explore at:

6 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Aug 20, 2025

Dataset authored and provided by

Statistahttp://statista.com/

Time period covered

Oct 2023 - Mar 2024

Area covered

Worldwide

Description

In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.

Clear search

Close search

Google apps

Main menu

Total global visitor traffic to Google.com 2024

Total global visitor traffic to YouTube 2017-2021

ebay.com total website traffic in 2025, by device

LAcity.org Website Traffic

ulta.com total website traffic 2023-2024, by device

Websites using Visitors Traffic Real Time Statistics

Network Traffic Analysis: Data and Code

Zalando: website traffic on desktop and mobile across all domains 2023

Data from: Analysis of the Quantitative Impact of Social Networks General...

Web Traffic Data | 500M+ US Web Traffic Data Resolution | B2B and B2C...

Network Traffic Dataset

Website Statistics

Food Hygiene Ratings website traffic

Website Metrics

Study on the impact of Google AI Overview by sector – 2025

The total number of visitors to the micro-enterprise website.

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Total web visits to D2C brands' websites 2021

Global Network Traffic Analytics Market 2018-2022

Snapshot img

Multilingual Scraper of Privacy Policies and Terms of Service

Multilingual Scraper of Privacy Policies and Terms of Service: Scraped Documents of 2024

Preliminaries

Files and structure

Shared metadata

Policy data

Terms data

Updates

Total global visitor traffic to Google.com 2024