CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information about web requests to a single website. It is a time series dataset (it tracks requests over time), which makes it well suited to machine learning analysis.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset section so that readers of my notebook, or any other notebook, can fully understand how the data was intended to be used and how the problem was framed.
Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself.
Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation
From: Head of Data Science
Received: Today
Subject: New project from the product team
Hey!
I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.
I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!
They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.
You can find more details about what I expect you to do here, and information on the data here.
I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.
Good luck!
From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?
Hi,
We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?
At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.
Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?
We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?
Look forward to seeing your presentation.
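The request to "correctly predict high traffic recipes 80% of the time" is most naturally read as a precision target: of the recipes the model flags as high traffic, at least 80% should really turn out to be high traffic. That reading is an assumption (it could also be meant as recall or overall accuracy). A minimal sketch, using hypothetical "High"/"Low" labels, of how precision and recall differ:

```python
def precision(y_true, y_pred, positive="High"):
    """Fraction of recipes predicted as high traffic that really were high traffic."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # nothing was flagged, so there are no correct positive predictions
    return sum(t == positive for t in flagged) / len(flagged)


def recall(y_true, y_pred, positive="High"):
    """Fraction of truly high-traffic recipes that the model flagged."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not actual:
        return 0.0
    return sum(p == positive for p in actual) / len(actual)


# Hypothetical outcomes for six featured recipes (labels are illustrative).
y_true = ["High", "High", "Low", "High", "Low", "High"]
y_pred = ["High", "Low", "Low", "High", "Low", "High"]

p = precision(y_true, y_pred)
r = recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} meets_80pct_target={p >= 0.8}")
```

In this toy example the model never features an unpopular recipe (precision 1.00) but misses one popular recipe (recall 0.75), which matches the brief's emphasis on minimizing the chance of showing unpopular recipes.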
Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.
Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.
This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.
Tomato Soup
Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $
Nutritional Information (per serving):
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g
Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock
Method:
1. Cut the tomatoes into quarters…
The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.
As you will see, they haven't given us all of the information they have about each recipe.
You can find the data here.
I will let you decide how to process it, just make sure you include all your decisions in your report.
Don't forget to double check the data really does match what they say - it might not.
| Column Name | Details |
|---|---|
| recipe | Numeric, unique identifier of recipe |
| calories | Numeric, number of calories |
| carbohydrate | Numeric, amount of carbohydrates in grams |
| sugar | Numeric, amount of sugar in grams |
| protein | Numeric, amount of prote... |
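Because the brief warns that the data may not match its description, a quick check against the data dictionary is worthwhile. A minimal standard-library sketch, assuming the data arrives as a CSV with the column names shown above. The table is cut off, so only the visible columns (recipe, calories, carbohydrate, sugar, protein) are checked; the function name and sample rows are hypothetical.

```python
import csv
import io

# Columns documented as numeric in the visible part of the data dictionary.
NUMERIC_COLUMNS = ["recipe", "calories", "carbohydrate", "sugar", "protein"]


def check_recipes(csv_text):
    """Return human-readable problems: missing or non-numeric values, duplicate recipe IDs."""
    problems = []
    seen = set()
    rows = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for col in NUMERIC_COLUMNS:
            value = (row.get(col) or "").strip()
            if value == "":
                problems.append(f"line {line_no}: {col} is missing")
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {col}={value!r} is not numeric")
        recipe_id = (row.get("recipe") or "").strip()
        if recipe_id in seen:
            problems.append(f"line {line_no}: recipe {recipe_id} is not unique")
        seen.add(recipe_id)
    return problems


# Hypothetical sample with one missing value, one non-numeric value and a duplicate ID.
sample = (
    "recipe,calories,carbohydrate,sugar,protein\n"
    "1,123,13,1,4\n"
    "2,,35.5,0.4,0.9\n"
    "2,310,abc,2,11\n"
)
for problem in check_recipes(sample):
    print(problem)
```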
A dataset explaining organic traffic, its importance for SEO, and methods to track it in Google Analytics 4.
In the second quarter of 2025, mobile devices (excluding tablets) accounted for 62.54 percent of global website traffic. After holding a share of around 50 percent from 2017 onward, mobile usage surpassed this threshold in 2020 and has grown steadily in its dominance of global web access.
Mobile traffic
Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphones and tablets. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana, and Kenya. In most African markets, mobile accounts for more than half of web traffic. By contrast, mobile makes up only around 45.49 percent of online traffic in the United States.
Mobile usage
The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage, and accessing social media. Apps are a very popular way to watch video on the go, and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video, and Amazon Prime Video.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code:
Packet_Features_Generator.py & Features.py
To run this code:
pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j
-h, --help: show this help message and exit
-i TXTFILE: input text file
-x X: add the first X packets overall as features
-y Y: add the first Y negative (incoming) packets as features
-z Z: add the first Z positive (outgoing) packets as features
-ml: output all websites to a text file in the format websiteNumber1,feature1,feature2,...
-s S: generate samples using size S
-j
Purpose:
Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.
Uses Features.py to calculate the features.
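Features.py is not reproduced here, so the following is a hypothetical sketch of what the -x, -y and -z options describe: take the first X packets overall, the first Y negative (incoming) packets and the first Z positive (outgoing) packets as features, then write one line in the -ml format websiteNumber,feature1,feature2,... The function name, the feature order and the zero padding for short captures are assumptions.

```python
def build_features(label, packets, x=0, y=0, z=0):
    """Hypothetical sketch of the -x/-y/-z feature options.

    packets: signed packet sizes in capture order
             (negative = incoming, positive = outgoing).
    Returns one -ml style line: label,feature1,feature2,...
    """
    def first_n(values, n):
        # Take the first n values, padding with 0 if the capture is shorter (assumption).
        head = list(values[:n])
        return head + [0] * (n - len(head))

    incoming = [p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    features = first_n(packets, x) + first_n(incoming, y) + first_n(outgoing, z)
    return ",".join(str(v) for v in [label] + features)


print(build_features(3, [120, -1500, -1500, 80, -600], x=2, y=2, z=3))
```

Padding keeps every output line the same length, which a downstream classifier expecting a fixed number of columns would need.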
startMachineLearning.sh & machineLearning.py
To run this code:
bash startMachineLearning.sh
This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.
Options (to be edited within this file):
--evaluate-only to test 5-fold cross-validation accuracy
--test-scaling-normalization to test 6 different combinations of scalers and normalizers
Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use
--grid-search to find the best hyperparameters via grid search. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'; once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'
Purpose:
Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set, as well as the best grid search hyperparameters within the provided ranges.
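machineLearning.py itself is not shown, so here is a dependency-free sketch of the 5-fold cross-validation loop described above, reading data in the .ml format. To keep it runnable without scikit-learn, a simple nearest-centroid classifier stands in for the RandomForest; all function names and the sample data are hypothetical, and the scaling and grid-search steps are omitted.

```python
import random


def load_ml_lines(lines):
    """Parse .ml lines (websiteNumber,feature1,feature2,...) into features X and labels y."""
    X, y = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(",")
        y.append(int(parts[0]))
        X.append([float(v) for v in parts[1:]])
    return X, y


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in for RandomForestClassifier: store the mean feature vector of each label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))


def cross_val_accuracy(X, y, k=5):
    """Accuracy on each held-out fold, training on the remaining k-1 folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        if not test_idx:
            continue  # fewer samples than folds
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical .ml content: two well-separated "websites".
lines = [f"1,{i},{i}" for i in range(10)] + [f"2,{100 + i},{100 + i}" for i in range(10)]
X, y = load_ml_lines(lines)
print(cross_val_accuracy(X, y))
```

In a full pipeline, any scaler or normalizer should be fitted inside each training fold only, so statistics from the held-out fold do not leak into training.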
Data
Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results pages), and different actions taken on a Virtual Reality headset.
Data for this experiment was stored and analyzed as a .txt file for each experiment, in which each line contains:
The first number: a classification number denoting which website, query, or VR action is taking place.
The remaining numbers in each line denote:
The size of a packet,
and the direction it is traveling.
negative numbers denote incoming packets
positive numbers denote outgoing packets
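A minimal sketch of reading one line in this format. The separator is not stated, so commas or whitespace are both accepted (an assumption); the helper name is hypothetical. It keeps the signed sizes and also tallies packet counts and summed sizes per direction.

```python
def parse_capture_line(line):
    """Split one experiment line into its label and signed packet sizes.

    The first number is the classification label; the rest are packet sizes
    whose sign gives the direction (negative = incoming, positive = outgoing).
    """
    values = [int(tok) for tok in line.replace(",", " ").split()]
    label, packets = values[0], values[1:]
    incoming = [-p for p in packets if p < 0]  # drop the sign, keep the size
    outgoing = [p for p in packets if p > 0]
    return {
        "label": label,
        "packets": packets,
        "incoming_count": len(incoming),
        "outgoing_count": len(outgoing),
        "incoming_size_total": sum(incoming),
        "outgoing_size_total": sum(outgoing),
    }


info = parse_capture_line("7 583 -1500 -1500 60 -120")
print(info["label"], info["incoming_size_total"], info["outgoing_size_total"])
```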
Figure 4 Data
This data uses specific lines from the Virtual Reality.txt file.
The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.
The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.
The .xlsx and .csv files are identical.
Each file includes (from right to left):
The original packet data,
each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,
and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
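The spreadsheet steps (sort each capture from smallest to largest, compute its mean and standard deviation, then build the CDF) can be sketched in Python. Whether the original used the population or the sample standard deviation, and whether sizes were taken as absolute values, is not stated; this sketch assumes the population standard deviation and uses the sizes exactly as passed in. The function name and packet sizes are hypothetical.

```python
import statistics


def packet_cdf(sizes):
    """Return (mean, stdev, cdf_points) for one packet capture.

    cdf_points is a list of (size, fraction of packets <= size) pairs,
    i.e. the empirical cumulative distribution function.
    """
    ordered = sorted(sizes)  # smallest to largest, as in the spreadsheet
    n = len(ordered)
    mean = statistics.mean(ordered)
    stdev = statistics.pstdev(ordered)  # population std dev (assumption)
    cdf_points = []
    for i, size in enumerate(ordered):
        # With ties, only the last occurrence carries the final cumulative fraction.
        if i + 1 < n and ordered[i + 1] == size:
            continue
        cdf_points.append((size, (i + 1) / n))
    return mean, stdev, cdf_points


mean, stdev, cdf = packet_cdf([60, 1500, 60, 583, 1500])
print(mean, stdev, cdf)
```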
Reddit is a web traffic powerhouse: in March 2024, approximately 2.2 billion visits to the online forum were measured, making it one of the most-visited websites online.
The front page of the internet
Formerly known as “the front page of the internet”, Reddit is an online forum platform with over 130,000 sub-forums and communities. The platform allows registered users, called Redditors, to post content. Each post is open to the entire Reddit community to vote upon, either by downvotes or upvotes. The most popular posts are featured directly on the front page. Subreddits are available by category, and Redditors can follow selected subreddits relevant to their interests and control what content they see on their custom front page. Some of the most popular subreddits are r/AskReddit and r/AMA, the “Ask Me Anything” format. According to the company, Reddit hosted 1,800 AMAs in 2018, with a wide range of topics and hosts. One of the most popular Reddit AMAs of 2022 by number of upvotes was by actor Nicolas Cage, with more than 238.5 thousand upvotes.
Reddit usage
The United States accounts for the biggest share of Reddit's desktop traffic, followed by the UK and Canada. As of March 2023, Reddit ranked among the most popular social media websites in the United States.
https://www.technavio.com/content/privacy-notice
Global network traffic analytics Industry Overview
Technavio’s analysts have identified the increasing use of network traffic analytics solutions as one of the major factors driving market growth. With IT infrastructure changing rapidly, hackers can steal valuable information through various means. As consumers rely more on web applications and websites for day-to-day activities and financial transactions, instances of theft have increased globally, and the emergence of social networking websites has helped malicious attackers extract valuable information from vulnerable users. These growing risks encourage organizations to adopt network traffic analytics solutions.
Companies covered
The network traffic analytics market is fairly concentrated, with a few established companies offering innovative and differentiated software and services. By analyzing the competitiveness of the players in the network monitoring tools market, this network traffic analytics industry analysis report will help clients identify new growth opportunities and design new growth strategies.
The report offers a complete analysis of a number of companies including:
Allot
Cisco Systems
IBM
Juniper Networks
Microsoft
Symantec
Network traffic analytics market growth based on geographic regions
Americas
APAC
EMEA
Based on a complete study of the growth opportunities for companies across regions such as the Americas, APAC, and EMEA, our industry research analysts estimate that countries in the Americas will contribute significantly to the growth of the network monitoring tools market throughout the forecast period.
Network traffic analytics market growth based on end-user
Telecom
BFSI
Healthcare
Media and entertainment
According to our market research experts, the telecom end-user industry will be the major end-user of the network monitoring tools market throughout the forecast period. Factors such as increasing use of network traffic analytics solutions and increasing use of mobile devices at workplaces will contribute to the growth of the market shares of the telecom industry in the network traffic analytics market.
Key highlights of the global network traffic analytics market for the forecast years 2018-2022:
CAGR of the market during the forecast period 2018-2022
Detailed information on factors that will accelerate the growth of the network traffic analytics market during the next five years
Precise estimation of the global network traffic analytics market size and its contribution to the parent market
Accurate predictions on upcoming trends and changes in consumer behavior
Growth of the network traffic analytics industry across various geographies such as the Americas, APAC, and EMEA
A thorough analysis of the market’s competitive landscape and detailed information on several vendors
Comprehensive information about factors that will challenge the growth of network traffic analytics companies
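CAGR (compound annual growth rate) is the constant yearly growth rate that takes a market from its starting size to its ending size. A minimal sketch of the standard formula, (end / start) ** (1 / years) - 1; how many compounding years the report counts for 2018-2022 is not stated, and the figures below are hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two market sizes."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical: a market that doubles over four compounding years.
rate = cagr(100.0, 200.0, 4)
print(f"{rate:.2%}")
```

Compounding the start value by (1 + rate) once per year for the stated number of years recovers the end value, which is a quick sanity check on any reported CAGR.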
This market research report analyzes the market outlook and provides a list of key trends, drivers, and challenges that are anticipated to impact the global network traffic analytics market and its stakeholders over the forecast years.
The global network traffic analytics market analysts at Technavio have also considered how the performance of other related markets in the vertical will affect the size of this market through 2022. Some of the markets most likely to influence the growth of the network traffic analytics market over the coming years are the Global Network as a Service Market and the Global Data Analytics Outsourcing Market.
Technavio’s collection of market research reports offers insights into the growth of markets across various industries. We also provide customized reports based on the specific requirements of our clients.
https://semrush.ebundletools.com/company/legal/terms-of-service/
see-tube.com is ranked #34029 in BR (Brazil) with 278.85K traffic.
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.
Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.
User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.
Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.
GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.
Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.
High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.
Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.
Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
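The "real bounce rate" question above is a simple ratio per group: visits with exactly one pageview divided by all visits from that traffic source, times 100. A minimal sketch over hypothetical session records; the field names "source" and "pageviews" are assumptions (in the BigQuery export the equivalents are believed to be trafficSource.source and totals.pageviews).

```python
from collections import defaultdict


def real_bounce_rate_by_source(sessions):
    """Percentage of visits with exactly one pageview, grouped by traffic source."""
    visits = defaultdict(int)
    single_page = defaultdict(int)
    for session in sessions:
        source = session["source"]
        visits[source] += 1
        if session["pageviews"] == 1:
            single_page[source] += 1
    return {source: 100.0 * single_page[source] / visits[source] for source in visits}


# Hypothetical sessions.
sessions = [
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 5},
    {"source": "google", "pageviews": 1},
    {"source": "(direct)", "pageviews": 3},
]
print(real_bounce_rate_by_source(sessions))
```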
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the aggregated version of the daily dataset used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 time series representing the number of hits (web traffic) for a set of Wikipedia pages from 2015-07-01 to 2017-09-05, aggregated to a weekly frequency.
The original dataset contains missing values. They have been simply replaced by zeros before aggregation.
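A minimal sketch of the preprocessing described above: missing daily values become zero, then each daily series is summed into weekly totals. Summing, and the use of consecutive 7-day blocks starting on the first date (rather than calendar weeks), are assumptions; the function name and sample values are hypothetical.

```python
from datetime import date, timedelta


def to_weekly(daily, start):
    """Sum a daily hit series into consecutive 7-day blocks.

    daily: daily hit counts in date order; None or NaN marks a missing day.
    start: the date of daily[0].
    Returns (week_start, total) pairs; a trailing partial week is kept as-is.
    """
    # Replace missing values with 0 before aggregating (v != v is only true for NaN).
    filled = [0 if v is None or v != v else v for v in daily]
    return [
        (start + timedelta(days=i), sum(filled[i:i + 7]))
        for i in range(0, len(filled), 7)
    ]


weekly = to_weekly([10, None, 5, 5, 0, 3, 2, 7, float("nan")], date(2015, 7, 1))
print(weekly)
```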
The dataset consists of multiple time series of Wikipedia page traffic, with each time series representing a different TV show. The dataset is used to evaluate the proposed framework for forecasting seasonal profiles in time series.
https://semrush.ebundletools.com/company/legal/terms-of-service/
pixel-see.com is ranked #569768 in IN (India) with 16.79K traffic.
Our Data Center Traffic web traffic dataset adds a critical layer of protection to your marketing stack by identifying and filtering web traffic generated from the IP addresses of suspicious data center sources. These signals often come from bots, scrapers, or emulators that disguise themselves as real users but deliver no value to your campaigns. Left unchecked, they can distort performance metrics, inflate engagement numbers, and drain your ad budget.
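The filtering idea above can be sketched as a lookup of each hit's source IP against a list of data center network ranges. This is a minimal illustration, not the vendor's method: the ranges below are the RFC 5737 documentation blocks used as placeholders for a real, regularly updated feed, and the function names and hit records are hypothetical.

```python
import ipaddress

# Placeholder data center ranges (RFC 5737 documentation blocks).
DATA_CENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_data_center_ip(ip):
    """True if the address falls inside any listed data center range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATA_CENTER_RANGES)


def filter_hits(hits):
    """Split hits into (kept, flagged) by the source IP of each hit."""
    kept, flagged = [], []
    for hit in hits:
        (flagged if is_data_center_ip(hit["ip"]) else kept).append(hit)
    return kept, flagged


hits = [{"ip": "203.0.113.7"}, {"ip": "192.0.2.10"}, {"ip": "198.51.100.200"}]
kept, flagged = filter_hits(hits)
print(len(kept), len(flagged))
```

A linear scan is fine for a handful of ranges; a production list with many thousands of CIDR blocks would usually be indexed (for example, sorted by start address) instead.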
Leverage our web traffic data solutions for the following use cases:
- Invalid Web Traffic Prevention
- Data Hygiene & Model Building
- Audience Quality Assurance
- Trial & Partnership Transparency
With AdPreference, expect the following key benefits through our partnership:
- Protect Your Ad Spend
- Enhance Cybersecurity
- Improve Campaign Performance
- Strengthen Brand Integrity
- Reduce Ad Fraud
By continuously monitoring and updating our web traffic intelligence, we empower marketers, agencies, and platforms to distinguish legitimate human activity from fraudulent traffic at scale. The result is cleaner datasets, more accurate audience models, and campaigns that perform against true user engagement. With our web traffic dataset, you can protect ad spend, maintain data integrity, and reinforce trust across your digital ecosystem.
For more information, please visit https://www.adpreference.co/
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Below you’ll find a month-by-month breakdown of traffic on the australia.gov.au website along the following lines: pageviews, visits, pages per visit, average time on page, and devices. This data is generated using Google Analytics. Please note: this is an initial version of the data only. We’re looking forward to hearing your feedback on what other metrics are of interest to you. Please let us know by sending an email to data@digital.gov.au.
Traffic analytics, rankings, and competitive metrics for al-monitor.com as of September 2025.
Retail platforms have undergone an unprecedented global traffic increase between January 2019 and June 2020, surpassing even holiday season traffic peaks. Overall, retail websites generated almost ** billion visits in June 2020, up from ***** billion global visits in January 2020. This is of course due to the global coronavirus pandemic, which has forced millions of people to stay at home in order to stop the spread of the virus. Due to many shelter-at-home orders and a desire to avoid crowded stores in places where it is possible to shop, consumers have turned to the internet to procure everyday items such as groceries or toilet paper.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains network traffic logs captured by Burp Suite, aimed at classifying web requests as either good or bad based on their characteristics. The dataset is designed for the task of predicting whether incoming requests are legitimate (good) or malicious (bad), aiding in the detection and prevention of web-based attacks.
badwords = ['sleep', 'uid', 'select', 'waitfor', 'delay', 'system', 'union', 'order by', 'group by', 'admin', 'drop', 'script']
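One simple way a keyword list like this could be used is as a rule-based baseline: flag a request as bad if its URL-decoded, lower-cased text contains any listed keyword. This is a hypothetical sketch, not the dataset author's method; the function name and example requests are illustrative.

```python
from urllib.parse import unquote_plus

# Keyword list as given in the dataset description.
badwords = ['sleep', 'uid', 'select', 'waitfor', 'delay', 'system', 'union',
            'order by', 'group by', 'admin', 'drop', 'script']


def looks_bad(request_text):
    """Label a request 'bad' if any keyword appears in its decoded, lower-cased text."""
    text = unquote_plus(request_text).lower()  # decode %20, + etc. before matching
    return "bad" if any(word in text for word in badwords) else "good"


print(looks_bad("GET /search?q=red+shoes"))
print(looks_bad("GET /item?id=1%20UNION%20SELECT%20password"))
```

Plain substring matching is crude: a harmless path such as /guide contains "uid" and is flagged, which is one reason to treat keyword hits as features for a learned classifier rather than as the final label.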
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset records the historical daily traffic of a set of websites. Your task is to predict the future daily traffic of each website based on this historical data.
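Before fitting any model, a seasonal-naive forecast is a common baseline for daily web traffic: each future day is predicted to equal the same weekday one week earlier. The weekly season is an assumption about this data, and the function name and history below are hypothetical.

```python
def seasonal_naive_forecast(history, horizon, season=7):
    """Repeat the last observed season forward (default: the last week of daily values)."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


# Hypothetical two weeks of daily visits for one website.
history = list(range(1, 15))
print(seasonal_naive_forecast(history, 9))
```

Any more sophisticated model should at least beat this baseline on held-out days to justify its complexity.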
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.
The data is typical of what an ecommerce website would see and includes the following information:
Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.
Transactional data: information about the transactions on the Google Merchandise Store website.
Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo, and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.