https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information about web requests to a single website. It is a time series dataset: requests are tracked over time, which makes it suitable for time-series machine learning analysis.
A dataset comparing features, pricing, and ratings of the top sites to buy website traffic in 2025: Google Ads, Facebook Ads, PropellerAds, and SparkTraffic.
The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate of the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency of each count varies depending on City staff's needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help   show this help message and exit
-i TXTFILE   input text file
-x X         Add first X number of total packets as features.
-y Y         Add first Y number of negative packets as features.
-z Z         Add first Z number of positive packets as features.
-ml          Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S         Generate samples using size s.
-j
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
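The -x, -y, and -z options above can be illustrated with a minimal sketch. It assumes packets are signed integers (negative for incoming, positive for outgoing, as described under Data) and that short captures are zero-padded so every sample has the same length. The function name `first_n_features` and the padding rule are hypothetical and not taken from Features.py.

```python
def first_n_features(packets, x=0, y=0, z=0):
    """Build a feature list from signed packet sizes.

    x: first x packets overall, y: first y negative packets,
    z: first z positive packets. Zero-padding is an assumption.
    """
    def take(seq, n):
        head = list(seq)[:n]
        return head + [0] * (n - len(head))  # pad short captures with zeros

    negatives = [p for p in packets if p < 0]
    positives = [p for p in packets if p > 0]
    return take(packets, x) + take(negatives, y) + take(positives, z)


features = first_n_features([-1500, 60, -1500, 52, 310], x=3, y=2, z=3)
print(features)
```

Concatenating the three slices in a fixed order keeps the feature positions comparable across samples, which a classifier needs.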
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5-fold cross-validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to find the best hyperparameters by grid search. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'. Once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'.
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
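As a rough illustration of the 5-fold cross-validation step only, the sketch below splits samples into five folds and averages accuracy over them. It does not reproduce machineLearning.py: the RandomForest classifier is replaced by a placeholder majority-class model, and all function names here are hypothetical. It also assumes there are at least k samples.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Mean accuracy over k folds for any fit/predict pair."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Placeholder model (stands in for the real classifier): predicts the majority label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = [[i] for i in range(10)]
y = [1] * 10
accuracy = cross_val_accuracy(X, y, fit_majority, predict_majority)
print(accuracy)
```

Any model exposing a fit and a predict step could be passed in place of the majority baseline.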
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.
Data for each experiment was stored and analyzed in the form of a txt file, in which each line contains:
The first number, a classification number denoting which website, query, or VR action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
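The line format described above could be parsed roughly as follows. This is a sketch under assumptions: the delimiter is not stated in the description, so both whitespace and commas are accepted, and the function name `parse_capture_line` is hypothetical.

```python
def parse_capture_line(line):
    """Split one line into (label, incoming_sizes, outgoing_sizes)."""
    # Assumed delimiters: commas and/or whitespace.
    values = [int(tok) for tok in line.replace(",", " ").split()]
    label, packets = values[0], values[1:]
    incoming = [-p for p in packets if p < 0]  # negative numbers are incoming
    outgoing = [p for p in packets if p > 0]   # positive numbers are outgoing
    return label, incoming, outgoing


label, incoming, outgoing = parse_capture_line("3 -1500 60 -40")
print(label, incoming, outgoing)
```

Sizes are returned as positive magnitudes once the direction has been separated out.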
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
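The three steps above (sort, summarize, compute the CDF) can be sketched as below. This assumes an empirical CDF, where the i-th smallest of n values maps to i/n, and uses the sample standard deviation; the spreadsheet may have used a different CDF form or the population standard deviation. The packet sizes are made-up illustration values.

```python
import statistics


def empirical_cdf(sizes):
    """Return (value, cumulative fraction) pairs, smallest value first."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


sizes = [310, 52, 1500, 60]      # illustrative packet sizes, not real data
mean = statistics.mean(sizes)
std = statistics.stdev(sizes)    # sample standard deviation (assumption)
cdf = empirical_cdf(sizes)
print(mean, std, cdf)
```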
Google.com, youtube.com, and facebook.com were the most visited websites in Ukraine in December 2021. Furthermore, Google's website on the Ukrainian domain, google.com.ua, ranked in the top 10 during that time.
In March 2024, the video platform YouTube reported around 32.5 billion visits from global users. Meta-owned Facebook.com reported around 16.1 billion visits from global users, while Instagram.com and Twitter.com followed with 7 billion and 6.1 billion visits, respectively, from users worldwide during the examined month. Wikipedia.org, which hosts user-generated encyclopedic entries, recorded around 4.4 billion visits, while news aggregator and community platform Reddit.com saw approximately 2.2 billion visits during the examined period.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traffic volumes data across Dublin City from the SCATS traffic management system. The Sydney Coordinated Adaptive Traffic System (SCATS) is an intelligent transportation system used to manage timing of signal phases at traffic signals. SCATS uses sensors at each traffic signal to detect vehicle presence in each lane and pedestrians waiting to cross at the local site. The vehicle sensors are generally inductive loops installed within the road.
3 resources are provided:
SCATS Traffic Volumes Data (Monthly)
Contained in this report are traffic counts taken from the SCATS traffic detectors located at junctions. The primary function of these traffic detectors is traffic signal control. Such devices can also count general traffic volumes at defined locations on approach to a junction. These devices are set at specific locations on approaches to the junction but may not be on all approaches to a junction. As there are multiple junctions on any one route, a vehicle can be expected to be counted multiple times as it progresses along the route. Thus the traffic volume counts here are best used to represent trends in vehicle movement by selecting a specific junction on the route which best represents the overall traffic flows.
Information provided:
End Time: time that the one-hour count period finishes.
Region: location of the detector site (e.g. North City, West City, etc.).
Site: this can be matched with the SCATS Sites file to show location.
Detector: the detectors/sensors at each site are numbered.
Sum volume: total traffic volumes in the preceding hour.
Avg volume: average traffic volumes per 5-minute interval in the preceding hour.
All Dates Traffic Volumes Data
This file contains daily totals of traffic flow at each site location.
SCATS Site Location Data
Contained in this report, the location data for the SCATS sites is provided. The metadata provided includes the following:
Site id: a unique identifier for each junction on SCATS.
Site description (CAP): descriptive location of the junction containing street name(s) of intersecting streets.
Site description (lower): descriptive location of the junction containing street name(s) of intersecting streets.
Region: the area of the city, adjoining local authority, or region in which the site is located.
LAT/LONG: coordinates.
Disclaimer: the location files are regularly updated to represent the locations of SCATS sites under the control of Dublin City Council. However, site accuracy is not absolute. Information for LAT/LONG and region may not be available for all sites contained. It is at the discretion of the user to link the files for analysis and to create further data. Furthermore, detector communication issues or faulty detectors could also result in an inaccurate result for a given period, so values should not be taken as absolute but can be used to indicate trends.
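The relationship between the hourly fields and the daily totals file can be sketched as follows. This is an illustration under assumptions: the row layout, the timestamp format `YYYY-MM-DD HH:MM`, and the function names are hypothetical and may not match the published files. An hour contains twelve 5-minute intervals, so the average per interval is the hourly sum divided by 12 when every interval reported.

```python
from collections import defaultdict


def avg_per_5_min(sum_volume):
    """Average volume per 5-minute interval, assuming 12 intervals per hour."""
    return sum_volume / 12


def daily_totals(rows):
    """Sum hourly volumes per (day, site).

    rows: iterable of (end_time, site, detector, sum_volume).
    Each hour is assigned to the date of its End Time (a simplification:
    a count ending at 00:00 actually covers the previous day's last hour).
    """
    totals = defaultdict(int)
    for end_time, site, _detector, sum_volume in rows:
        day = end_time.split(" ")[0]
        totals[(day, site)] += sum_volume
    return dict(totals)


rows = [
    ("2023-01-01 08:00", 183, 1, 120),
    ("2023-01-01 08:00", 183, 2, 60),
    ("2023-01-01 09:00", 183, 1, 240),
    ("2023-01-02 08:00", 183, 1, 36),
]
totals = daily_totals(rows)
print(totals, avg_per_5_min(120))
```

Because a vehicle may be counted at several junctions along a route, totals like these are better compared per site than summed across sites.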
Traffic analytics, rankings, and competitive metrics for my-best.com as of September 2025
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A team from the Czech Republic has published a dataset for HTTPS traffic classification.
Since the data were captured mainly in a real backbone network, they omitted IP addresses and ports. The datasets consist of data calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.
During research, they divided HTTPS into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, W -- Website, and other traffic.
They chose the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. They also used several popular websites that primarily focus on the audience in the Czech Republic. The identified traffic classes and their representatives are provided below:
Live Video Stream: Twitch, Czech TV, YouTube Live
Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
Music Player: AppleMusic, Spotify, SoundCloud
File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
Website and Other Traffic: Websites from Alexa Top 1M list
Traffic analytics, rankings, and competitive metrics for similarweb.com as of October 2025
Australia's top fashion e-commerce websites saw a ** percent increase in site traffic in the first quarter of 2021 in comparison to the same quarter in 2020. Additionally, the top fashion e-commerce websites in France had an increase of ** percent of site traffic in Q1 of 2021 vs. Q1 2020.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.
Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself.
Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation
From: Head of Data Science
Received: Today
Subject: New project from the product team
Hey!
I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.
I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!
They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.
You can find more details about what I expect you to do here. And information on the data here.
I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.
Good Luck!
From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?
Hi,
We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?
At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.
Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?
We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?
Look forward to seeing your presentation.
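One possible reading of "correctly predict high traffic recipes 80% of the time" is a precision target: of the recipes predicted to be high traffic, at least 80% should actually be high traffic. The sketch below computes precision under that reading. The label value "High", the function name, and the sample labels are illustrative assumptions, not part of the brief.

```python
def precision(y_true, y_pred, positive="High"):
    """Fraction of predicted-positive items that are truly positive."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return 0.0  # no positive predictions, so nothing to be right about
    return sum(t == positive for t in predicted) / len(predicted)


y_true = ["High", "Low", "High", "High", "Low"]   # made-up labels
y_pred = ["High", "High", "High", "Low", "Low"]   # made-up predictions
score = precision(y_true, y_pred)
print(score)
```

Precision penalizes showing an unpopular recipe, which matches the request to minimize that chance; other readings, such as recall on high traffic recipes, would need a different metric.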
Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.
Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.
This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.
Tomato Soup
Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $
Nutritional Information (per serving)
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g
Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock
Method:
1. Cut the tomatoes into quarters….
The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.
As you will see, they haven't given us all of the information they have about each recipe.
You can find the data here.
I will let you decide how to process it, just make sure you include all your decisions in your report.
Don't forget to double check the data really does match what they say - it might not.
| Column Name | Details |
|---|---|
| recipe | Numeric, unique identifier of recipe |
| calories | Numeric, number of calories |
| carbohydrate | Numeric, amount of carbohydrates in grams |
| sugar | Numeric, amount of sugar in grams |
| protein | Numeric, amount of prote... |
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Traffic volumes data across Dublin City from the SCATS traffic management system. The Sydney Coordinated Adaptive Traffic System (SCATS) is an intelligent transportation system used to manage timing of signal phases at traffic signals. SCATS uses sensors at each traffic signal to detect vehicle presence in each lane and pedestrians waiting to cross at the local site. The vehicle sensors are generally inductive loops installed within the road. For SCATS junction locations see: https://data.smartdublin.ie/dataset/traffic-signals-and-scats-sites-locations-dcc
These are large data files. There would be too many rows for downloading with certain programmes such as Excel. Please choose a software package which can manage such large data files.
3 resources are provided:
SCATS Traffic Volumes Data (Monthly)
Contained in this report are traffic counts taken from the SCATS traffic detectors located at junctions. The primary function of these traffic detectors is traffic signal control. Such devices can also count general traffic volumes at defined locations on approach to a junction. These devices are set at specific locations on approaches to the junction but may not be on all approaches to a junction. As there are multiple junctions on any one route, a vehicle can be expected to be counted multiple times as it progresses along the route. Thus the traffic volume counts here are best used to represent trends in vehicle movement by selecting a specific junction on the route which best represents the overall traffic flows.
Information provided:
End Time: time that the one-hour count period finishes.
Region: location of the detector site (e.g. North City, West City, etc.).
Site: this can be matched with the SCATS Sites file to show location.
Detector: the detectors/sensors at each site are numbered.
Sum volume: total traffic volumes in the preceding hour.
Avg volume: average traffic volumes per 5-minute interval in the preceding hour.
All Dates Traffic Volumes Data
This file contains daily totals of traffic flow at each site location.
SCATS Site Location Data
Contained in this report, the location data for the SCATS sites is provided. The metadata provided includes the following:
Site id: a unique identifier for each junction on SCATS.
Site description (CAP): descriptive location of the junction containing street name(s) of intersecting streets.
Site description (lower): descriptive location of the junction containing street name(s) of intersecting streets.
Region: the area of the city, adjoining local authority, or region in which the site is located.
LAT/LONG: coordinates.
Disclaimer: the location files are regularly updated to represent the locations of SCATS sites under the control of Dublin City Council. However, site accuracy is not absolute. Information for LAT/LONG and region may not be available for all sites contained. It is at the discretion of the user to link the files for analysis and to create further data. Furthermore, detector communication issues or faulty detectors could also result in an inaccurate result for a given period, so values should not be taken as absolute but can be used to indicate trends.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We are publishing a dataset we created for HTTPS traffic classification.
Since the data were captured mainly in a real backbone network, we omitted IP addresses and ports. The datasets consist of data calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.
During our research, we divided HTTPS into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, W -- Website, and other traffic.
We have chosen the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. We also used several popular websites that primarily focus on the audience in our country. The identified traffic classes and their representatives are provided below:
Live Video Stream: Twitch, Czech TV, YouTube Live
Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
Music Player: AppleMusic, Spotify, SoundCloud
File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
Website and Other Traffic: Websites from Alexa Top 1M list
https://semrush.ebundletools.com/company/legal/terms-of-service/
cdntraffic.top is ranked #8920 in RU with 614.31K Traffic.
https://semrush.ebundletools.com/company/legal/terms-of-service/
amex-travel.top is ranked #5941 in ES with 205.88K Traffic.
https://semrush.ebundletools.com/company/legal/terms-of-service/
thecams.top is ranked #3788 in PE with 359.17K Traffic.
Traffic analytics, rankings, and competitive metrics for best-hashtags.com as of October 2025