License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information about web requests to a single website. It is a time series dataset: it tracks requests over time, which makes it well suited to time series analysis and machine learning.

In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.

This file contains nearly 6 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., the next 7 days) for unique visitors.
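A common baseline for these three exercises is a seasonal naive forecast, which predicts each future day with the most recent value observed on the same weekday. The sketch below is a swapped-in benchmark technique, not a model prescribed by the dataset's author; `seasonal_naive_forecast`, `mean_absolute_error`, and the two weeks of visitor counts are hypothetical and do not come from the file.

```python
from typing import Sequence


def seasonal_naive_forecast(history: Sequence[float], horizon: int, season: int = 7) -> list[float]:
    """Forecast `horizon` future values by reusing the value from one season earlier.

    With daily data and season=7, the forecast for each future day equals the
    most recent observation that fell on the same weekday.
    """
    if len(history) < season:
        raise ValueError("history must contain at least one full season")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    last_season = list(history[-season:])
    # Day h ahead (h = 1, 2, ...) maps to position (h - 1) % season in the last observed week.
    return [last_season[(h - 1) % season] for h in range(1, horizon + 1)]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical unique-visitor counts for two weeks (not taken from the dataset).
visitors = [120, 340, 360, 350, 300, 150, 90, 130, 345, 370, 355, 310, 160, 95]
week_ahead = seasonal_naive_forecast(visitors, horizon=7)  # entire-next-week forecast
one_day_ahead = week_ahead[0]    # 1-day-ahead forecast
seven_day_ahead = week_ahead[6]  # 7-day-ahead forecast
```

Scoring this forecast with `mean_absolute_error` on held-out weeks gives a benchmark that a model capturing academic-calendar effects should beat.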
The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
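The visit rules above can be illustrated with a short sketch. It assumes the 6-hour window is measured from the same IP address's most recent hit, which is one reading of the description; it is not StatCounter's implementation, and `count_unique_visits` and `identity_holds` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def count_unique_visits(hits: list[tuple[str, datetime]],
                        window: timedelta = timedelta(hours=6)) -> int:
    """Count hits that start a 'unique' visit.

    Assumption: a hit is 'unique' when the same IP address made no hit within
    the preceding `window`, measured from that IP's most recent hit. All users
    behind one shared IP are treated as a single user.
    """
    last_seen: dict[str, datetime] = {}
    unique = 0
    for ip, timestamp in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous > window:
            unique += 1
        last_seen[ip] = timestamp
    return unique


def identity_holds(unique: int, first_time: int, returning: int) -> bool:
    """Unique visitors equal first-time plus returning visitors by definition."""
    return unique == first_time + returning


t0 = datetime(2020, 8, 19, 9, 0)
hits = [
    ("10.0.0.1", t0),
    ("10.0.0.1", t0 + timedelta(hours=1)),  # within 6 hours: same visit
    ("10.0.0.2", t0 + timedelta(hours=2)),  # different IP: new unique visit
    ("10.0.0.1", t0 + timedelta(hours=8)),  # 7 hours after previous hit: new unique visit
]
print(count_unique_visits(hits))  # 3
```

Checking `identity_holds` on every row is a quick sanity test that the three visitor columns were loaded correctly.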
This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

In April 2021, worldwide visits to YouTube.com amounted to roughly ************. Between January 2017 and April 2021, visitor traffic to YouTube.com increased by more than *** percent. In 2020, visits to the platform's website trended upward between April and June.

From November 2023 to April 2024, the total traffic to ulta.com decreased from roughly ** to ** million website visitors. Most users accessed ulta.com via mobile devices in April 2024, making up about ** million website visits. That month, desktops accounted for around ** million website visits.

License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.
This dataset can be used for various analyses such as:
This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.

Between March and August 2025, March was the month with the most website traffic to ebay.com. The consumer-to-consumer (C2C) e-commerce website reached a total of over **** million visits that month, the majority of them from mobile devices.
Popularity on multiple fronts
Although eBay is popular on mobile devices, monthly downloads of its mobile app have been trending downward since peaking in June 2021. Still, in April 2024, ebay.com was the second most popular e-commerce and shopping website worldwide, accounting for more than ***** percent of visits to sites in this category.
Slow and steady
In the second quarter of 2023, eBay’s gross merchandise volume (GMV) amounted to nearly **** billion U.S. dollars. That is no small number, but it is only a small increase over the lowest GMV the company has recorded since the first quarter of 2020: **** billion U.S. dollars in the third quarter of 2022. Since then, the company's GMV has been increasing slowly. However, while GMV is beginning to grow steadily again, the platform's active buyers, who once numbered *** million, have plateaued at *** million.

Terms: https://webtechsurvey.com/terms
A complete list of live websites using the Visitors Traffic Real Time Statistics technology, compiled through global website indexing conducted by WebTechSurvey.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
  -h, --help   show this help message and exit
  -i TXTFILE   input text file
  -x X         Add first X number of total packets as features.
  -y Y         Add first Y number of negative packets as features.
  -z Z         Add first Z number of positive packets as features.
  -ml          Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
  -s S         Generate samples using size s.
  -j
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
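The -x, -y, and -z options describe a feature vector built from the first X packets overall, the first Y incoming (negative) packets, and the first Z outgoing (positive) packets, written in the -ml format websiteNumber,feature1,feature2,... The sketch below shows one way to lay that out. Zero-padding short captures and keeping the signed sizes are assumptions, and `first_n_padded`, `build_feature_row`, and `to_ml_line` are hypothetical names, not functions from Features.py.

```python
def first_n_padded(values: list[int], n: int, pad: int = 0) -> list[int]:
    """Return the first n values, padding with `pad` if fewer exist (assumption)."""
    head = values[:n]
    return head + [pad] * (n - len(head))


def build_feature_row(sizes: list[int], x: int, y: int, z: int) -> list[int]:
    """Concatenate the first x packets, the first y incoming and the first z outgoing packets.

    Sizes keep their sign as in the capture files: negative = incoming,
    positive = outgoing. Zero-size packets fall in neither direction group.
    """
    incoming = [s for s in sizes if s < 0]
    outgoing = [s for s in sizes if s > 0]
    return first_n_padded(sizes, x) + first_n_padded(incoming, y) + first_n_padded(outgoing, z)


def to_ml_line(website_number: int, features: list[int]) -> str:
    """Format one row as websiteNumber,feature1,feature2,..."""
    return ",".join(str(v) for v in [website_number, *features])


row = build_feature_row([120, -1500, -1500, 80, -600], x=3, y=2, z=3)
print(to_ml_line(7, row))  # 7,120,-1500,-1500,-1500,-1500,120,80,0
```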
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5-fold cross-validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use.
--grid-search to find the best grid search hyperparameters. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'; once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'.
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a random forest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set, as well as the best grid search hyperparameters within the provided ranges.
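The internals of machineLearning.py are not shown here. As a standard-library sketch under that caveat, the .ml output can be read back into feature and label lists, and a shuffled 5-fold split can be built by hand; `load_ml` and `k_fold_indices` are hypothetical helpers, not functions from the script.

```python
import random


def load_ml(text: str) -> tuple[list[list[float]], list[int]]:
    """Parse .ml lines of the form websiteNumber,feature1,feature2,... into (X, y)."""
    features: list[list[float]] = []
    labels: list[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, *values = line.split(",")
        labels.append(int(label))
        features.append([float(v) for v in values])
    return features, labels


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Shuffle sample indices once and split them into k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding assigns every index to exactly one fold; fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

With scikit-learn installed, the equivalent built-in evaluation would be `cross_val_score(RandomForestClassifier(), X, y, cv=5)`; for classifiers an integer `cv` uses stratified folds, unlike the plain shuffled split above.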
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results pages), and different actions taken on a virtual reality headset.
Data for this experiment was stored and analyzed as a .txt file for each experiment, in which each line contains:
The first number is a classification number denoting which website, query, or VR action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
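A parser for this line format might look like the sketch below. It assumes tokens are separated by whitespace and/or commas, since the separator is not stated; `Capture` and `parse_capture_line` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Capture:
    label: int           # classification number: website, query, or VR action
    incoming: list[int]  # incoming packet sizes (negative in the file, stored as magnitudes)
    outgoing: list[int]  # outgoing packet sizes (positive in the file)


def parse_capture_line(line: str) -> Capture:
    """Parse one experiment line: a class label followed by signed packet sizes.

    Assumption: tokens are separated by whitespace and/or commas.
    """
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    if not tokens:
        raise ValueError("empty line")
    label, *sizes = (int(t) for t in tokens)
    return Capture(
        label=label,
        incoming=[abs(s) for s in sizes if s < 0],
        outgoing=[s for s in sizes if s > 0],
    )
```

For example, `parse_capture_line("3 -1500 120 -600 80")` yields label 3, incoming sizes [1500, 600], and outgoing sizes [120, 80].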
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data sorted from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final cumulative distribution function (CDF) calculation that generated the Figure 4 graph.
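The sorting, summary statistics, and CDF steps can be sketched as follows. Whether the spreadsheet uses the sample or population standard deviation, and whether it sorts signed or absolute sizes, is not stated; this sketch assumes the sample standard deviation and sorts the values as given. `empirical_cdf` and `summarize` are hypothetical names.

```python
import statistics


def empirical_cdf(sizes: list[float]) -> list[tuple[float, float]]:
    """Return (size, fraction of packets <= size) for each distinct size, ascending."""
    if not sizes:
        raise ValueError("need at least one packet size")
    ordered = sorted(sizes)
    n = len(ordered)
    points: list[tuple[float, float]] = []
    for rank, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            points[-1] = (value, rank / n)  # tie: keep the highest cumulative fraction
        else:
            points.append((value, rank / n))
    return points


def summarize(sizes: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one packet capture (needs >= 2 values)."""
    return statistics.mean(sizes), statistics.stdev(sizes)
```

For example, `empirical_cdf([3, 1, 2, 2])` returns `[(1, 0.25), (2, 0.75), (3, 1.0)]`, the step points a CDF plot connects.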

License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
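Two of these questions reduce to simple grouped ratios. The sketch below works on hypothetical flattened records rather than the actual export; in the BigQuery export the inputs would come from fields such as `trafficSource.source`, `totals.pageviews`, and `totals.transactions`, but the dict keys used here are illustrative stand-ins.

```python
from collections import defaultdict


def real_bounce_rate_by_source(sessions: list[dict]) -> dict[str, float]:
    """Percentage of visits with exactly one pageview, grouped by traffic source.

    Each session is a hypothetical dict with 'source' and 'pageviews' keys.
    """
    visits: dict[str, int] = defaultdict(int)
    bounces: dict[str, int] = defaultdict(int)
    for session in sessions:
        source = session["source"]
        visits[source] += 1
        if session["pageviews"] == 1:
            bounces[source] += 1
    return {source: 100 * bounces[source] / visits[source] for source in visits}


def average_pageviews_by_purchase(users: list[dict]) -> dict[bool, float]:
    """Average product pageviews for purchasers (True) and non-purchasers (False).

    Each user is a hypothetical dict with 'product_pageviews' and 'transactions' keys.
    """
    totals = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for user in users:
        purchased = user["transactions"] > 0
        totals[purchased] += user["product_pageviews"]
        counts[purchased] += 1
    return {k: totals[k] / counts[k] for k in totals if counts[k]}
```

Restricting the input records to July 2017 before calling these functions answers the month-specific variants of the questions.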

License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains cleaned and structured web traffic data collected across multiple websites, including Amazon platforms and services such as Amazon Prime, AWS, and AWS Support. It spans various traffic sources, user devices, key actions, and engagement metrics, which makes it useful for digital marketing analysis, customer behavior modeling, and time series forecasting.
Ideal for:
Web traffic analysis
Conversion rate optimization
Bounce rate analysis
User segmentation
Predictive modeling and machine learning
📌 Dataset Features:
Rows: 2006
Columns: 18
Date Range: Starts from January 1st, 2019 (Exact end date can be inferred from the dataset)
🔍 Columns Overview:
Country: Country of user origin
Timestamp: Full timestamp of the visit
Device Category: Type of device (Desktop, Mobile, Tablet)
Key Actions: User actions like Purchase, Sign Up, Subscribe
Page Path: Visited page (e.g., /home, /contact)
Source: Traffic source (e.g., organic search, social media)
Avg Session Duration: Duration of session in seconds
Bounce Rate: % of single-page sessions
Conversions: Number of conversions
New Users: Number of new users in session
Page Views: Total page views
Returning Users: Count of returning users
Unique Page Views: Unique page views
Average time on home page (min): Self-explanatory
Website: Name of the specific Amazon service or domain
Date, Time, Day: Parsed date and time information
📊 Potential Use Cases:
Machine Learning: Predicting bounce rate, conversion likelihood, or segmenting user behavior.
Business Intelligence: Dashboards for performance analysis by device, source, or day.
Time Series Forecasting: Analyze traffic patterns over time.
A/B Testing: Benchmarking traffic changes across page paths or traffic sources.

Unique visitors, total sessions, and bounce rate for lacity.org, the main website for the City of Los Angeles.

In March 2024, X's web page Twitter.com had *** billion website visits worldwide, up from *** billion site visits the previous month. Formerly known as Twitter, X is a microblogging and social networking service that allows most of its users to write short posts with a maximum of 280 characters.

Terms: https://webtechsurvey.com/terms
A complete list of live websites using the Visitors technology, compiled through global website indexing conducted by WebTechSurvey.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union".
Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is that percentage higher or lower than the traffic that arrives directly or through search engines via SEO positioning? Which social networks have the greatest impact? And is there any relationship between the weight of social networks in a cybermedia outlet's web traffic and factors such as the average duration of the user's visit, the number of page views, or the bounce rate, understood formally as a visit in which the user performs no interaction on the page beyond reading its content?
To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that currently make up the European Union following the departure of the United Kingdom, whose transition period ended on December 31, 2020. In each country we selected five media outlets using a combination of the global web traffic metrics provided by Alexa (https://www.alexa.com/), which ceased operating on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use country-level metrics, since the results obtained with these two tools were sufficiently significant and our objective is not to rank cybermedia by country but to examine the relevance of social networks in their web traffic.
In all cases, we selected cybermedia owned by journalistic companies, ruling out those belonging to telecommunications portals or service providers. Some are traditional news companies (both newspapers and television channels), while others are digital natives; this distinction does not affect the nature of the research.
We then examined the web traffic data of these cybermedia for the six months from October 2021 through March 2022. We believe that this six-month span smooths out one-off variations in any single month, improving the precision of the data obtained.
To obtain this data, we used SimilarWeb, currently the most precise tool available for examining a portal's web traffic, although it covers only traffic from desktops and laptops and not traffic from mobile devices, which cannot currently be measured with the tools available on the market.
It includes:
Web traffic general data: average visit duration, pages per visit, and bounce rate
Web traffic origin by country
Percentage of traffic generated from social media over total web traffic
Distribution of web traffic generated from social networks
Comparison of web traffic generated from social networks with direct and search traffic
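The first two research questions come down to each channel's share of total visits, and the fourth asks about a degree of relationship between variables. Pearson correlation is one common measure of such a relationship; it is swapped in here as an assumption, since this description does not state which method the study used. The function names and the numbers in the example are hypothetical.

```python
import math


def channel_shares(visits_by_channel: dict[str, float]) -> dict[str, float]:
    """Percentage of total visits contributed by each traffic channel."""
    total = sum(visits_by_channel.values())
    if total <= 0:
        raise ValueError("total visits must be positive")
    return {channel: 100 * visits / total for channel, visits in visits_by_channel.items()}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical visit counts for one outlet (not from the dataset).
shares = channel_shares({"social": 12, "direct": 48, "search": 40})
```

Applied across outlets, `pearson_r(social_share_per_outlet, bounce_rate_per_outlet)` would give one number per metric for the fourth question.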

A breakdown of the Food Hygiene Ratings website traffic data showing total number of visits, unique visitors and page views.

Data dictionary:
Page_Title: Title of webpage used for pages of the website www.cityofrochester.gov
Pageviews: Total number of pages viewed over the course of the calendar year listed in the year column. Repeated views of a single page are counted.
Unique_Pageviews: Unique Pageviews - The number of sessions during which a specified page was viewed at least once. A unique pageview is counted for each URL and page title combination.
Avg_Time: Average amount of time users spent looking at a specified page or screen.
Entrances: The number of times visitors entered the website through a specified page.
Bounce_Rate: "A bounce is a single-page session on your site. In Google Analytics, a bounce is calculated specifically as a session that triggers only a single request to the Google Analytics server, such as when a user opens a single page on your site and then exits without triggering any other requests to the Google Analytics server during that session. Bounce rate is single-page sessions on a page divided by all sessions that started with that page, or the percentage of all sessions on your site in which users viewed only a single page and triggered only a single request to the Google Analytics server. These single-page sessions have a session duration of 0 seconds since there are no subsequent hits after the first one that would let Google Analytics calculate the length of the session."
Exit_Rate: The number of exits from a page divided by the number of pageviews for the page. This is inclusive of sessions that started on different pages, as well as “bounce” sessions that start and end on the same page. For all pageviews to the page, Exit Rate is the percentage that were the last in the session.
Year: Calendar year over which the data was collected. Data reflects the counts for each metric from January 1st through December 31st.
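The page-level Bounce_Rate and Exit_Rate definitions can be computed from ordered page paths per session, as in the sketch below. It simplifies a bounce to a session with exactly one pageview rather than exactly one Google Analytics hit, and `page_rates` is a hypothetical helper; pages no session started on get no bounce rate.

```python
from collections import Counter
from typing import Optional


def page_rates(sessions: list[list[str]]) -> dict[str, dict[str, Optional[float]]]:
    """Per-page bounce rate and exit rate (percent) from ordered page paths per session."""
    entrances: Counter = Counter()
    bounces: Counter = Counter()
    exits: Counter = Counter()
    pageviews: Counter = Counter()
    for pages in sessions:
        if not pages:
            continue
        entrances[pages[0]] += 1  # session started on this page
        exits[pages[-1]] += 1     # session ended on this page
        pageviews.update(pages)   # repeated views of a page are counted
        if len(pages) == 1:
            bounces[pages[0]] += 1
    rates = {}
    for page in pageviews:
        rates[page] = {
            # single-page sessions starting on this page / all sessions starting on it
            "bounce_rate": 100 * bounces[page] / entrances[page] if entrances[page] else None,
            # exits from this page / all pageviews of this page
            "exit_rate": 100 * exits[page] / pageviews[page],
        }
    return rates
```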

In March 2024, the video platform YouTube reported around 32.5 billion visits from global users. Meta-owned Facebook.com reported around 16.1 billion visits from global users, followed by Instagram.com and Twitter.com with 7 billion and 6.1 billion visits, respectively, from users worldwide during the examined month. Wikipedia.org, which hosts user-generated encyclopedic entries, recorded around 4.4 billion visits, while news aggregator and community platform Reddit.com saw approximately 2.2 billion visits during the examined period.

Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regard to FEMA.gov.

Information in this dataset includes total visits, average visit duration, pageviews, unique visitors, average pages per visit, average time per page, bounce rate, visits by source, visits by social media platform, and metrics on new vs. returning visitors.

External Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.