73 datasets found

Daily website visitors (time series regression)
kaggle.com
Updated Aug 20, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 20, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bob Nau
Description
Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
A
‘Popular Website Traffic Over Time ’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Have you every been in a conversation and the question comes up, who uses Bing? This question comes up occasionally because people wonder if these sites have any views. For this research study, we are going to be exploring popular website traffic for many popular websites.

Methodology

The data collected originates from SimilarWeb.com.

Source

For the analysis and study, go to The Concept Center

This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

How to use this dataset

Analyze 11/1/2016 in relation to 2/1/2017

Study the influence of 4/1/2017 on 1/1/2017

More datasets

Acknowledgements

If you use this dataset in your research, please credit Chase Willden

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
d
Website Analytics
catalog.data.gov
data.nola.gov
+4more
Updated Jun 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
data.nola.gov
Description
This data about nola.gov provides a window into how people are interacting with the the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
E-commerce - Users of a French C2C fashion store
kaggle.com
Updated Feb 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 24, 2024
Dataset provided by
Kaggle
Authors
Jeffrey Mvutu Mabilama
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
Foreword

This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).

My Telegram bot will answer your queries and allow you to contact me.

Context

There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other user's articles).

This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try and understand what you can expect of your users and determine in advance how your grows may be.

For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to suits more your needs.

This dataset is part of a preview of a much larger dataset. Please contact me for more.

Content

The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Questions you might want to answer using this dataset:

Are e-commerce users interested in social network feature ?

Are my users active enough (compared to those of this dataset) ?

How likely are people from other countries to sign up in a C2C website ?

How many users are likely to drop off after years of using my service ?

Example works:

Report(s) made using SQL queries can be found on the data.world page of the dataset.

Notebooks may be found on the Kaggle page of the dataset.

License

CC-BY-NC-SA 4.0

For other licensing options, contact me.
Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
csv, pdfAvailable download formats
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
g
Website Metrics
gimi9.com
datasets.ai
+1more
Updated Apr 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Website Metrics [Dataset]. https://gimi9.com/dataset/data-gov_website-metrics/
Explore at:
Dataset updated
Apr 1, 2025
Description
Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.rnrnInformation in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce ratevisits by source, visits by Social Media Platform, and metrics on new vs returning visitors.rnrnExternal Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=de&inv=1&invt=Ab2fng (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=de
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Googlehttp://google.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
P
Alexa Domains Dataset
paperswithcode.com
opendatalab.com
Updated Feb 1, 2001
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Isaac Corley; Jonathan Lwowski; Justin Hoffman (2001). Alexa Domains Dataset [Dataset]. https://paperswithcode.com/dataset/gagan-bhatia
Explore at:
Dataset updated
Feb 1, 2001
Authors
Isaac Corley; Jonathan Lwowski; Justin Hoffman
Description
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
d
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
Explore at:
.csv, .xlsAvailable download formats
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Monaco, Saint Vincent and the Grenadines, Latvia, Jordan, Belarus, Jamaica, Uzbekistan, Liechtenstein, Russian Federation, India
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviuor: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
e
OGD Portal: Daily usage by record (since January 2024)
data.europa.eu
csv, excel xls, json +5
Updated Apr 6, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
kanton-basel-landschaft (2025). OGD Portal: Daily usage by record (since January 2024) [Dataset]. https://data.europa.eu/data/datasets/12610-kanton-basel-landschaft?locale=en
Explore at:
n3, rdf xml, csv, json-ld, json, rdf turtle, parquet, excel xlsAvailable download formats
Dataset updated
Apr 6, 2025
Dataset authored and provided by
kanton-basel-landschaft
License
http://dcat-ap.ch/vocabulary/licenses/terms_byhttp://dcat-ap.ch/vocabulary/licenses/terms_by
Description
The data on the use of the data sets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL. Contains the day the usage was measured.dataset_title: The title of the dataset_id record: The technical ID of the dataset.visitors: Specifies the number of daily visitors to the record. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.interactions: Includes all interactions with any record on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).RemarksOnly calls to publicly available datasets are shown.IP addresses and interactions of users with a login of the Canton of Basel-Landschaft - in particular of employees of the specialist and coordination office OGD - are removed from the dataset before publication and therefore not shown.Calls from actors that are clearly identifiable as bots by the user agent header are also not shown.Combinations of dataset and date for which no use occurred (Visitors == 0 & Interactions == 0) are not shown.Due to synchronization problems, data may be missing by the day.
u
Data from: Google Analytics & Twitter dataset from a movies, TV series and...
portalcientificovalencia.univeuropea.com
figshare.com
Updated 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yeste, Víctor; Yeste, Víctor (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed3aea56d4af0485dc8
Explore at:
Dataset updated
2024
Authors
Yeste, Víctor; Yeste, Víctor
Description
Author: Víctor Yeste. Universitat Politècnica de Valencia.The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.In this case, due to the need to integrate data from two separate areas, such as web publishing and the analysis of their shares and related topics on Twitter, has opted for programming as you access both the Google Analytics v4 reporting API and Twitter Standard API, always respecting the limits of these.The website analyzed is hellofriki.com. It is an online media whose primary intention is to solve the need for information on some topics that provide daily a vast number of news in the form of news, as well as the possibility of analysis, reports, interviews, and many other information formats. All these contents are under the scope of the sections of cinema, series, video games, literature, and comics.This dataset has contributed to the elaboration of the PhD Thesis:Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009Data have been obtained from each last-minute news article published online according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:tesis_followers: User ID list of media account followers.tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.status_id: Tweet IDcreated_at: date of publicationtext: content of the tweetpath: URL extracted after processing the shortened URL in textpost_shared: Article ID in WordPress that is being sharedretweet_count: number of retweetsfavorite_count: number of favoritestesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web. Other typologies, automatic Facebook shares, custom tweets without link to an article, etc. With the same fields as tesis_hometimeline.tesis_posts: data of articles published by the web and processed for some analysis.stats_id: Analysis IDpost_id: Article ID in WordPresspost_date: article publication date in WordPresspost_title: title of the articlepath: URL of the article in the middle webtags: Tags ID or WordPress tags related to the articleuniquepageviews: unique page viewsentrancerate: input ratioavgtimeonpage: average visit timeexitrate: output ratiopageviewspersession: page views per sessionadsense_adunitsviewed: number of ads viewed by usersadsense_viewableimpressionpercent: ad display ratioadsense_ctr: ad click ratioadsense_ecpm: estimated ad revenue per 1000 page viewstesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.id: ID of the analysisphase: phase of the thesis in which analysis has been carried out (right now all are 1)time: "0" if at the time of publication, "1" if 14 days laterstart_date: date and time of measurement on the day of publicationend_date: date and time when the measurement is made 14 days latermain_post_id: ID of the published article to be analysedmain_post_theme: Main section of the published article to analyzesuperheroes_theme: "1" if about superheroes, "0" if nottrailer_theme: "1" if trailer, "0" if notname: empty field, possibility to add a custom name manuallynotes: empty field, possibility to add personalized notes manually, as if some tag has been removed manually for being considered too generic, despite the fact that the editor put itnum_articles: number of articles analysednum_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter accountnum_terms: number of terms analyzeduniquepageviews_total: total page viewsuniquepageviews_mean: average page viewsentrancerate_mean: average input ratioavgtimeonpage_mean: average duration of visitsexitrate_mean: average output ratiopageviewspersession_mean: average page views per sessiontotal: total of ads viewedadsense_adunitsviewed_mean: average of ads viewedadsense_viewableimpressionpercent_mean: average ad display ratioadsense_ctr_mean: average ad click ratioadsense_ecpm_mean: estimated ad revenue per 1000 page viewsTotal: total incomeretweet_count_mean: average incomefavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesterms_ini_num_tweets: total tweets on the terms on the day of publicationterms_ini_retweet_count_total: total retweets on the terms on the day of publicationterms_ini_retweet_count_mean: average retweets on the terms on the day of publicationterms_ini_favorite_count_total: total of favorites on the terms on the day of publicationterms_ini_favorite_count_mean: average of favorites on the terms on the day of publicationterms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publicationterms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publicationterms_end_num_tweets: total tweets on terms 14 days after publicationterms_ini_retweet_count_total: total retweets on terms 14 days after publicationterms_ini_retweet_count_mean: average retweets on terms 14 days after publicationterms_ini_favorite_count_total: total bookmarks on terms 14 days after publicationterms_ini_favorite_count_mean: average of favorites on terms 14 days after publicationterms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publicationterms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication.tesis_terms: data of the terms (tags) related to the processed articles.stats_id: Analysis IDtime: "0" if at the time of publication, "1" if 14 days laterterm_id: Term ID (tag) in WordPressname: Name of the termslug: URL of the termnum_tweets: number of tweetsretweet_count_total: total retweetsretweet_count_mean: average retweetsfavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesfollowers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the termuser_num_followers_mean: average followers of users who were talking about the termuser_num_tweets_mean: average number of tweets published by users who were talking about the termuser_age_mean: average age in days of users who were talking about the termurl_inclusion_rate: URL inclusion ratio
d
GreenThumb Site Visits
catalog.data.gov
data.cityofnewyork.us
Updated Jul 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofnewyork.us (2025). GreenThumb Site Visits [Dataset]. https://catalog.data.gov/dataset/greenthumb-site-visits
Explore at:
Dataset updated
Jul 12, 2025
Dataset provided by
data.cityofnewyork.us
Description
Data Dictionary: https://docs.google.com/spreadsheets/d/1ItvGzNG8O_Yj97Tf6am4T-QyhnxP-BeIRjm7ZaUeAxs/edit#gid=1499621902 GreenThumb provides programming and material support to over 550 community gardens in New York City. NYC Parks GreenThumb staff visit all active community gardens under the jurisdiction of NYC Parks once each calendar year, subject to staff capacity. These site visits typically occur during the summer months and representatives of licensed garden groups are invited to attend. During these site visits, NYC Parks GreenThumb staff observe and record quantitative and qualitative information related to the physical status of the garden, as well as its ongoing operation, maintenance, and programming. This information is used by NYC Parks GreenThumb to inform maintenance needs at the garden and to help NYC Parks GreenThumb understand the needs of garden groups so that we can plan accordingly. In addition, this information is necessary for NYC Parks GreenThumb to confirm that publicly accessible community gardens under its jurisdiction are being operated in safe manner and in accordance with the NYC Parks GreenThumb License Agreement and applicable NYS and NYC laws and regulations. NYC Parks GreenThumb may conduct additional site visits as deemed necessary.
Number of internet users worldwide 2014-2029
statista.com
Updated Apr 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Area covered
World
Description
The global number of internet users in was forecast to continuously increase between 2024 and 2029 by in total 1.3 billion users (+23.66 percent). After the fifteenth consecutive increasing year, the number of users is estimated to reach 7 billion users and therefore a new peak in 2029. Notably, the number of internet users of was continuously increasing over the past years.Depicted is the estimated number of individuals in the country or region at hand, that use the internet. As the datasource clarifies, connection quality and usage frequency are distinct aspects, not taken into account here.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of internet users in countries like the Americas and Asia.
Visitor analytics in city of Helsinki websites
kaggle.com
Updated Dec 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Olaf Laitinen (2024). Visitor analytics in city of Helsinki websites [Dataset]. http://doi.org/10.34740/kaggle/dsv/10342181
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10342181
Dataset updated
Dec 31, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Olaf Laitinen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Helsinki
Description
Administrator: Helsingin kaupunginkanslia / Digitalisaatioyksikkö

Administrator's webpage: https://www.hel.fi/fi

Published: 10.03.2022

Updated: 02.09.2022

Update frequency: day

Categories: Local government

Tags: visitor counts

Geographical coverage: Helsinki

Time series starts: 2022-01-01

Time series accuracy: month

License: Creative Commons Attribution 4.0

How to reference: Source: Visitor analytics in city of Helsinki websites. The maintainer of the dataset is Helsingin kaupunginkanslia / Digitalisaatioyksikkö. The dataset has been downloaded from Helsinki Region Infoshare service on 31.12.2024 under the license Creative Commons Attribution 4.0.
o
Universal Studios Guest Reviews Dataset
opendatabay.com
.undefined
Updated Jul 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Datasimple (2025). Universal Studios Guest Reviews Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/1a47238b-eadf-4daa-98fb-22b378757600
Explore at:
.undefinedAvailable download formats
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Reviews & Ratings
Description
This dataset provides a detailed collection of visitor reviews for Universal Studios branches, ideal for understanding customer sentiment and improving service delivery. It aims to assist organisations in managing a high volume of feedback by categorising reviews and determining overall sentiment from individual comments. This process enables companies to gain a clear insight into visitor feedback, which can lead to increased customer loyalty, business growth, enhanced brand value, and improved profitability. The dataset includes over 50,000 reviews collected from visitors across three Universal Studios locations (Florida, Singapore, and Japan), originally posted on the Trip Advisor website.

Columns

reviewer: The account name of the individual who submitted the review.

rating: The rating given by the reviewer, on a scale from 1 (indicating dissatisfaction) to 5 (indicating satisfaction).

written_date: The date when the review was posted.

title: The headline or title of the review.

review_text: The main body of the review content provided by the visitor.

branch: The specific Universal Studios location to which the review pertains.

Distribution

The dataset is typically provided in a CSV file format, structured as tabular data. It contains over 50,000 records. The ratings distribution shows a strong positive bias, with 28,202 reviews between 4.80 and 5.00, and 13,514 reviews between 4.00 and 4.20. Lower ratings include 1,973 reviews between 1.00 and 1.20, and 1,986 reviews between 2.00 and 2.20. The reviews span a wide period from 24 October 2002 to 30 May 2021. Geographically, Universal Studios Florida accounts for 60% of the reviews, Universal Studios Singapore for 31%, and other locations make up 9%.

Usage

This dataset is perfectly suited for developing and implementing a reviews management system. It can be used to determine overall sentiment from individual comments, helping businesses gain a clear understanding of visitor feedback. Applications include identifying specific areas for improvement, enhancing customer loyalty, boosting business performance, elevating brand value, and increasing profitability. It is also highly relevant for Natural Language Processing (NLP) and text analysis projects aimed at extracting insights from unstructured text data.

Coverage

The dataset covers visitor reviews for three major Universal Studios branches: Florida, Singapore, and Japan, reflecting a global scope. The time frame of the reviews extends from 24 October 2002 to 30 May 2021, offering a substantial historical perspective. The data originates from visitor postings on the Trip Advisor website, representing feedback from a diverse group of theme park guests.

License

CC0

Who Can Use It

This dataset is valuable for businesses in the hospitality and entertainment sectors, such as Universal Studios itself, for internal review management and strategic decision-making. Data analysts and scientists can use it for sentiment analysis, customer behaviour studies, and NLP model training. Marketing professionals can leverage the insights to understand customer preferences and refine branding strategies. Researchers focusing on tourism, consumer feedback, or text analytics will also find it a rich resource.

Dataset Name Suggestions

Universal Studios Guest Reviews

Trip Advisor Universal Studios Feedback

Global Theme Park Visitor Reviews

Universal Studios Customer Ratings

Entertainment Venue Reviews

Attributes

Original Data Source: Reviews of Universal Studios
m
Factori Audience | 1.2B unique mobile users in APAC, EU, North America and...
app.mobito.io
Updated Dec 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Factori Audience | 1.2B unique mobile users in APAC, EU, North America and MENA [Dataset]. https://app.mobito.io/data-product/audience-data
Explore at:
Dataset updated
Dec 24, 2022
Area covered
ASIA, EUROPE, AFRICA, OCEANIA, SOUTH_AMERICA, NORTH_AMERICA
Description
We collect, validate, model, and segment raw data signals from over 900+ sources globally to deliver thousands of mobile audience segments. We then combine that data with other public and private data sources to derive interests, intent, and behavioral attributes. Our proprietary algorithms then clean, enrich, unify and aggregate these data sets for use in our products. We have categorized our audience data into consumable categories such as interest, demographics, behavior, geography, etc. Audience Data Categories:Below mentioned data categories include consumer behavioral data and consumer profiles (available for the US and Australia) divided into various data categories. Brand Shoppers:Methodology: This category has been created based on the high intent of users in terms of their visits to Brand outlets in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. Place Category Visitors:Methodology: This category has been created based on the high intent of users visiting specific places of interest in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. Demographics:This category has been created based on deterministic data that we receive from apps based on the declared gender and age data. Marital Status, Education, Party affiliation, and State residency are available in the US. Geo-Behavioural:This category has been created based on the high intent of users in terms of the frequency of their visits to specific granular places of interest in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. Interests:This segment is created based on users' interest in a specific subject while browsing the internet when the visited website category is clearly focused on a specific subject such as cars, cooking, traveling, etc. We use a deterministic model to assign a proper profile and time that information is valid. The recency of data can range from 14 to 30 days, depending on the topic. Intent:Factori receives data from many partners to deliver high-quality pieces of information about users’ shopping intent. We collect data from sources connected to the eCommerce sector and we also receive data connected to online transactions from affiliate networks to deliver the most accurate segments with purchase intentions, such as laptops, mobile phones, or cars. The recency of data can range from 7 to 14 days depending on the product category. Events:This category was created based on the high interest of users in terms of content related to specific global events - sports, culture, and gaming. Among the event segments, we also distinguish categories related to the interest in certain lifestyle choices and behaviors. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. App Usage:Mobile category is a branch of the taxonomy that is dedicated only to the data that is based on mobile advertising IDs. It is based on the categorization of the mobile apps that the user has installed on the device. Auto Ownership:Consumer Profiles - Available for US and AustraliaThis audience has been created based on users declaring that they own a certain brand of automobile and other automotive attributes via a survey or registration. These audiences are currently available in the USA. Motorcycle Ownership:Consumer Profiles - Available for US and AustraliaThis audience has been created based on users declaring that they own a certain brand of motorcycle and other motorcycle-based attributes via a survey or registration. These audiences are currently available for the USA. Household:Consumer Profiles - Available for the US and AustraliaThis audience has been created based on users' declaring their marital status, parental status, and the overall number of children via a survey or registration. These audiences are currently available in the USA. Financial:Consumer Profiles - Available for the US and Australia this audience has been created based on their behavior in different financial services like property ownership, mortgage, investing behavior, and wealth and declaring their estimated net worth via a survey or registration. Purchase/ Spending Behavior:Consumer Profiles - Available for the US and AustraliaThis audience has been created based on their behavior in different spending behaviors in different business verticals available in the USA. Clusters:Consumer Profiles - Available for the US and AustraliaClusters are groups of consumers who exhibit similar demographic, lifestyle, and media consumption characteristics, empowering marketers to understand the unique attributes that comprise their most profitable consumer segments. Armed with this rich data, data scientists can drive analytics and modeling to power their brand’s unique marketing initiatives. B2B Audiences;Consumer Profiles - Available for US and AustraliaThis audience has been created based on users declaring their employee credentials, designations, and companies they work in, further specifying business verticals, revenue breakdowns, and headquarters locations. Customizable Audiences Data Segment:Brands can choose the appropriate pre-made audience segments or ask our data experts about creating a custom segment that is precisely tailored to your brief in order to reach their target customers and boost the campaign's effectiveness. Location Query Granularity:Minimum area: HEX 8Maximum area: QuadKey 17/City
Board Public Website Usability Surveys
catalog.data.gov
datasets.ai
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Board of Governors of the Federal Reserve System (2024). Board Public Website Usability Surveys [Dataset]. https://catalog.data.gov/dataset/board-public-website-usability-surveys
Explore at:
Dataset updated
Dec 18, 2024
Dataset provided by
Federal Reserve Board of Governors
Federal Reserve Systemhttp://www.federalreserve.gov/
Description
The Board would use the FR 3076 to seek input from users or potential users of the Board's public website, social media, outreach, and communication responsibilities. The survey would be conducted with a diverse audience of consumers, banks, media, government, educators, and others to gather information about their visit to the Board's public website. Responses to the survey would be used to help improve the usability and offerings on the Board's public website and other online public communications. The frequency of the survey and content of the questions would vary as needs arise for feedback on different resources and from different audiences. The Board anticipates the FR 3076 may be conducted up to 12 times per year, although the survey may not be conducted that frequently. In addition, the Board anticipates conducting up to four focus group sessions per year.
The Canada Trademarks Dataset
zenodo.org
pdf, zip
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeremy Sheff; Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
Explore at:
zip, pdfAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4999655
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jeremy Sheff; Jeremy Sheff
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
The Canada Trademarks Dataset

18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

Python and Stata Scripts (c) 2021 Jeremy Sheff

Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

Terms of Use:

As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

Details of Repository Contents:

This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

/csv: contains the .csv versions of the data files

/do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset

/dta: contains the .dta versions of the data files

/py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.

With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021)), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
t
Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian...
service.tib.eu
Updated Nov 28, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian (2023). Dataset: Dataset: a large-scale study of cookie banner interaction tools and their impact on users' privacy / part2. https://doi.org/10.35097/1717 [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1717
Explore at:
Dataset updated
Nov 28, 2024
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Abstract: Cookie notices (or cookie banners) are a popular mechanism for websites to provide (European) Internet users a tool to choose which cookies the site may set. Banner implementations range from merely providing information that a site uses cookies over offering the choice to accepting or denying all cookies to allowing fine-grained control of cookie usage. Users frequently get annoyed by the banner's pervasiveness as they interrupt natural'' browsing on the Web. As a remedy, different browser extensions have been developed to automate the interaction with cookie banners. In this work, we perform a large-scale measurement study comparing the effectiveness of extensions for cookie banner interaction.'' We configured the extensions to express different privacy choices (e.g., accepting all cookies, accepting functional cookies, or rejecting all cookies) to understand their capabilities to execute a user's preferences. The results show statistically significant differences in which cookies are set, how many of them are set, and which types are set---even for extensions that aim to implement the same cookie choice. Extensions forcookie banner interaction'' can effectively reduce the number of set cookies compared to no interaction with the banners. However, all extensions increase the tracking requests significantly except when rejecting all cookies. Abstract: Cookie notices (or cookie banners) are a popular mechanism for websites to provide (European) Internet users a tool to choose which cookies the site may set. Banner implementations range from merely providing information that a site uses cookies over offering the choice to accepting or denying all cookies to allowing fine-grained control of cookie usage. Users frequently get annoyed by the banner's pervasiveness as they interruptnatural'' browsing on the Web. As a remedy, different browser extensions have been developed to automate the interaction with cookie banners. In this work, we perform a large-scale measurement study comparing the effectiveness of extensions for cookie banner interaction.'' We configured the extensions to express different privacy choices (e.g., accepting all cookies, accepting functional cookies, or rejecting all cookies) to understand their capabilities to execute a user's preferences. The results show statistically significant differences in which cookies are set, how many of them are set, and which types are set---even for extensions that aim to implement the same cookie choice. Extensions forcookie banner interaction'' can effectively reduce the number of set cookies compared to no interaction with the banners. However, all extensions increase the tracking requests significantly except when rejecting all cookies. TechnicalRemarks: This repository hosts the dataset corresponding to the paper "A Large-Scale Study of Cookie Banner Interaction Tools and their Impact on Users’ Privacy", which was published at the Privacy Enhancing Technologies Symposium (PETS) in 2024.

Facebook

Twitter

Click to copy link

Link copied

Cite

Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code

Daily website visitors (time series regression)

Predict tomorrow's number of website visitors from 5 years of daily data

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Aug 20, 2020

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Bob Nau

Description

Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

Clear search

Close search

Google apps

Main menu

Daily website visitors (time series regression)

Context

Content

Inspiration

‘Popular Website Traffic Over Time ’ analyzed by Analyst-2

About this dataset

Background

Methodology

Source

How to use this dataset

Acknowledgements

Start A New Notebook!

Website Analytics

E-commerce - Users of a French C2C fashion store

Foreword

Context

Content

Acknowledgements

Inspiration

License

Website Statistics

Website Metrics

Google Analytics Sample

Alexa Domains Dataset

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant

OGD Portal: Daily usage by record (since January 2024)

Data from: Google Analytics & Twitter dataset from a movies, TV series and...

GreenThumb Site Visits

Number of internet users worldwide 2014-2029

Visitor analytics in city of Helsinki websites

Universal Studios Guest Reviews Dataset

Columns

Distribution

Usage

Coverage

License

Who Can Use It

Dataset Name Suggestions

Attributes

Factori Audience | 1.2B unique mobile users in APAC, EU, North America and...

Board Public Website Usability Surveys

The Canada Trademarks Dataset

Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian...

Daily website visitors (time series regression)

Predict tomorrow's number of website visitors from 5 years of daily data

Context

Content

Inspiration