73 datasets found
  1. Daily website visitors (time series regression)

    • kaggle.com
    Updated Aug 20, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 20, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Bob Nau
    Description

    Context

    This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.

    Content

    The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.

    Inspiration

    This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

  2. A

    ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2

    • analyst-2.ai
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Have you every been in a conversation and the question comes up, who uses Bing? This question comes up occasionally because people wonder if these sites have any views. For this research study, we are going to be exploring popular website traffic for many popular websites.

    Methodology

    The data collected originates from SimilarWeb.com.

    Source

    For the analysis and study, go to The Concept Center

    This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

    How to use this dataset

    • Analyze 11/1/2016 in relation to 2/1/2017
    • Study the influence of 4/1/2017 on 1/1/2017
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  3. d

    Website Analytics

    • catalog.data.gov
    • data.nola.gov
    • +4more
    Updated Jun 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
    Explore at:
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    data.nola.gov
    Description

    This data about nola.gov provides a window into how people are interacting with the the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.

  4. E-commerce - Users of a French C2C fashion store

    • kaggle.com
    Updated Feb 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeffrey Mvutu Mabilama
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Foreword

    This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).

    My Telegram bot will answer your queries and allow you to contact me.

    Context

    There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

    Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other user's articles).

    This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try and understand what you can expect of your users and determine in advance how your grows may be.

    • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

    If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to suits more your needs.

    This dataset is part of a preview of a much larger dataset. Please contact me for more.

    Content

    The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

    Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Questions you might want to answer using this dataset:

    • Are e-commerce users interested in social network feature ?
    • Are my users active enough (compared to those of this dataset) ?
    • How likely are people from other countries to sign up in a C2C website ?
    • How many users are likely to drop off after years of using my service ?

    Example works:

    • Report(s) made using SQL queries can be found on the data.world page of the dataset.
    • Notebooks may be found on the Kaggle page of the dataset.

    License

    CC-BY-NC-SA 4.0

    For other licensing options, contact me.

  5. Website Statistics

    • data.wu.ac.at
    • data.europa.eu
    csv, pdf
    Updated Jun 11, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
    Explore at:
    csv, pdfAvailable download formats
    Dataset updated
    Jun 11, 2018
    Dataset provided by
    Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

    • Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

    • Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

    • Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

    • Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

      Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

    These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.

  6. g

    Website Metrics

    • gimi9.com
    • datasets.ai
    • +1more
    Updated Apr 1, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Website Metrics [Dataset]. https://gimi9.com/dataset/data-gov_website-metrics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    Description

    Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.rnrnInformation in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce ratevisits by source, visits by Social Media Platform, and metrics on new vs returning visitors.rnrnExternal Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.

  7. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=de&inv=1&invt=Ab2fng (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=de
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Googlehttp://google.com/
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery

  8. P

    Alexa Domains Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 1, 2001
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Isaac Corley; Jonathan Lwowski; Justin Hoffman (2001). Alexa Domains Dataset [Dataset]. https://paperswithcode.com/dataset/gagan-bhatia
    Explore at:
    Dataset updated
    Feb 1, 2001
    Authors
    Isaac Corley; Jonathan Lwowski; Justin Hoffman
    Description

    This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest

  9. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?

  10. d

    Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant

    • datarade.ai
    .csv, .xls
    Updated Jun 27, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
    Explore at:
    .csv, .xlsAvailable download formats
    Dataset updated
    Jun 27, 2023
    Dataset authored and provided by
    Swash
    Area covered
    Monaco, Saint Vincent and the Grenadines, Latvia, Jordan, Belarus, Jamaica, Uzbekistan, Liechtenstein, Russian Federation, India
    Description

    Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

    Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

    User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

    Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

    GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

    Market Intelligence and Consumer Behaviuor: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

    High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

    Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

    Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.

  11. e

    OGD Portal: Daily usage by record (since January 2024)

    • data.europa.eu
    csv, excel xls, json +5
    Updated Apr 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    kanton-basel-landschaft (2025). OGD Portal: Daily usage by record (since January 2024) [Dataset]. https://data.europa.eu/data/datasets/12610-kanton-basel-landschaft?locale=en
    Explore at:
    n3, rdf xml, csv, json-ld, json, rdf turtle, parquet, excel xlsAvailable download formats
    Dataset updated
    Apr 6, 2025
    Dataset authored and provided by
    kanton-basel-landschaft
    License

    http://dcat-ap.ch/vocabulary/licenses/terms_byhttp://dcat-ap.ch/vocabulary/licenses/terms_by

    Description

    The data on the use of the data sets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL. Contains the day the usage was measured.dataset_title: The title of the dataset_id record: The technical ID of the dataset.visitors: Specifies the number of daily visitors to the record. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.interactions: Includes all interactions with any record on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).RemarksOnly calls to publicly available datasets are shown.IP addresses and interactions of users with a login of the Canton of Basel-Landschaft - in particular of employees of the specialist and coordination office OGD - are removed from the dataset before publication and therefore not shown.Calls from actors that are clearly identifiable as bots by the user agent header are also not shown.Combinations of dataset and date for which no use occurred (Visitors == 0 & Interactions == 0) are not shown.Due to synchronization problems, data may be missing by the day.

  12. u

    Data from: Google Analytics & Twitter dataset from a movies, TV series and...

    • portalcientificovalencia.univeuropea.com
    • figshare.com
    Updated 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yeste, Víctor; Yeste, Víctor (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed3aea56d4af0485dc8
    Explore at:
    Dataset updated
    2024
    Authors
    Yeste, Víctor; Yeste, Víctor
    Description

    Author: Víctor Yeste. Universitat Politècnica de Valencia.The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.In this case, due to the need to integrate data from two separate areas, such as web publishing and the analysis of their shares and related topics on Twitter, has opted for programming as you access both the Google Analytics v4 reporting API and Twitter Standard API, always respecting the limits of these.The website analyzed is hellofriki.com. It is an online media whose primary intention is to solve the need for information on some topics that provide daily a vast number of news in the form of news, as well as the possibility of analysis, reports, interviews, and many other information formats. All these contents are under the scope of the sections of cinema, series, video games, literature, and comics.This dataset has contributed to the elaboration of the PhD Thesis:Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009Data have been obtained from each last-minute news article published online according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:tesis_followers: User ID list of media account followers.tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.status_id: Tweet IDcreated_at: date of publicationtext: content of the tweetpath: URL extracted after processing the shortened URL in textpost_shared: Article ID in WordPress that is being sharedretweet_count: number of retweetsfavorite_count: number of favoritestesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web. Other typologies, automatic Facebook shares, custom tweets without link to an article, etc. With the same fields as tesis_hometimeline.tesis_posts: data of articles published by the web and processed for some analysis.stats_id: Analysis IDpost_id: Article ID in WordPresspost_date: article publication date in WordPresspost_title: title of the articlepath: URL of the article in the middle webtags: Tags ID or WordPress tags related to the articleuniquepageviews: unique page viewsentrancerate: input ratioavgtimeonpage: average visit timeexitrate: output ratiopageviewspersession: page views per sessionadsense_adunitsviewed: number of ads viewed by usersadsense_viewableimpressionpercent: ad display ratioadsense_ctr: ad click ratioadsense_ecpm: estimated ad revenue per 1000 page viewstesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.id: ID of the analysisphase: phase of the thesis in which analysis has been carried out (right now all are 1)time: "0" if at the time of publication, "1" if 14 days laterstart_date: date and time of measurement on the day of publicationend_date: date and time when the measurement is made 14 days latermain_post_id: ID of the published article to be analysedmain_post_theme: Main section of the published article to analyzesuperheroes_theme: "1" if about superheroes, "0" if nottrailer_theme: "1" if trailer, "0" if notname: empty field, possibility to add a custom name manuallynotes: empty field, possibility to add personalized notes manually, as if some tag has been removed manually for being considered too generic, despite the fact that the editor put itnum_articles: number of articles analysednum_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter accountnum_terms: number of terms analyzeduniquepageviews_total: total page viewsuniquepageviews_mean: average page viewsentrancerate_mean: average input ratioavgtimeonpage_mean: average duration of visitsexitrate_mean: average output ratiopageviewspersession_mean: average page views per sessiontotal: total of ads viewedadsense_adunitsviewed_mean: average of ads viewedadsense_viewableimpressionpercent_mean: average ad display ratioadsense_ctr_mean: average ad click ratioadsense_ecpm_mean: estimated ad revenue per 1000 page viewsTotal: total incomeretweet_count_mean: average incomefavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesterms_ini_num_tweets: total tweets on the terms on the day of publicationterms_ini_retweet_count_total: total retweets on the terms on the day of publicationterms_ini_retweet_count_mean: average retweets on the terms on the day of publicationterms_ini_favorite_count_total: total of favorites on the terms on the day of publicationterms_ini_favorite_count_mean: average of favorites on the terms on the day of publicationterms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publicationterms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publicationterms_end_num_tweets: total tweets on terms 14 days after publicationterms_ini_retweet_count_total: total retweets on terms 14 days after publicationterms_ini_retweet_count_mean: average retweets on terms 14 days after publicationterms_ini_favorite_count_total: total bookmarks on terms 14 days after publicationterms_ini_favorite_count_mean: average of favorites on terms 14 days after publicationterms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publicationterms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication.tesis_terms: data of the terms (tags) related to the processed articles.stats_id: Analysis IDtime: "0" if at the time of publication, "1" if 14 days laterterm_id: Term ID (tag) in WordPressname: Name of the termslug: URL of the termnum_tweets: number of tweetsretweet_count_total: total retweetsretweet_count_mean: average retweetsfavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesfollowers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the termuser_num_followers_mean: average followers of users who were talking about the termuser_num_tweets_mean: average number of tweets published by users who were talking about the termuser_age_mean: average age in days of users who were talking about the termurl_inclusion_rate: URL inclusion ratio

  13. d

    GreenThumb Site Visits

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Jul 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.cityofnewyork.us (2025). GreenThumb Site Visits [Dataset]. https://catalog.data.gov/dataset/greenthumb-site-visits
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Data Dictionary: https://docs.google.com/spreadsheets/d/1ItvGzNG8O_Yj97Tf6am4T-QyhnxP-BeIRjm7ZaUeAxs/edit#gid=1499621902 GreenThumb provides programming and material support to over 550 community gardens in New York City. NYC Parks GreenThumb staff visit all active community gardens under the jurisdiction of NYC Parks once each calendar year, subject to staff capacity. These site visits typically occur during the summer months and representatives of licensed garden groups are invited to attend. During these site visits, NYC Parks GreenThumb staff observe and record quantitative and qualitative information related to the physical status of the garden, as well as its ongoing operation, maintenance, and programming. This information is used by NYC Parks GreenThumb to inform maintenance needs at the garden and to help NYC Parks GreenThumb understand the needs of garden groups so that we can plan accordingly. In addition, this information is necessary for NYC Parks GreenThumb to confirm that publicly accessible community gardens under its jurisdiction are being operated in safe manner and in accordance with the NYC Parks GreenThumb License Agreement and applicable NYS and NYC laws and regulations. NYC Parks GreenThumb may conduct additional site visits as deemed necessary.

  14. Number of internet users worldwide 2014-2029

    • statista.com
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    World
    Description

    The global number of internet users in was forecast to continuously increase between 2024 and 2029 by in total 1.3 billion users (+23.66 percent). After the fifteenth consecutive increasing year, the number of users is estimated to reach 7 billion users and therefore a new peak in 2029. Notably, the number of internet users of was continuously increasing over the past years.Depicted is the estimated number of individuals in the country or region at hand, that use the internet. As the datasource clarifies, connection quality and usage frequency are distinct aspects, not taken into account here.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of internet users in countries like the Americas and Asia.

  15. Visitor analytics in city of Helsinki websites

    • kaggle.com
    Updated Dec 31, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Olaf Laitinen (2024). Visitor analytics in city of Helsinki websites [Dataset]. http://doi.org/10.34740/kaggle/dsv/10342181
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Olaf Laitinen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki
    Description
    • Administrator: Helsingin kaupunginkanslia / Digitalisaatioyksikkö
    • Administrator's webpage: https://www.hel.fi/fi
    • Published: 10.03.2022
    • Updated: 02.09.2022
    • Update frequency: day
    • Categories: Local government
    • Tags: visitor counts
    • Geographical coverage: Helsinki
    • Time series starts: 2022-01-01
    • Time series accuracy: month
    • License: Creative Commons Attribution 4.0
    • How to reference: Source: Visitor analytics in city of Helsinki websites. The maintainer of the dataset is Helsingin kaupunginkanslia / Digitalisaatioyksikkö. The dataset has been downloaded from Helsinki Region Infoshare service on 31.12.2024 under the license Creative Commons Attribution 4.0.
  16. o

    Universal Studios Guest Reviews Dataset

    • opendatabay.com
    .undefined
    Updated Jul 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Universal Studios Guest Reviews Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/1a47238b-eadf-4daa-98fb-22b378757600
    Explore at:
    .undefinedAvailable download formats
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset provides a detailed collection of visitor reviews for Universal Studios branches, ideal for understanding customer sentiment and improving service delivery. It aims to assist organisations in managing a high volume of feedback by categorising reviews and determining overall sentiment from individual comments. This process enables companies to gain a clear insight into visitor feedback, which can lead to increased customer loyalty, business growth, enhanced brand value, and improved profitability. The dataset includes over 50,000 reviews collected from visitors across three Universal Studios locations (Florida, Singapore, and Japan), originally posted on the Trip Advisor website.

    Columns

    • reviewer: The account name of the individual who submitted the review.
    • rating: The rating given by the reviewer, on a scale from 1 (indicating dissatisfaction) to 5 (indicating satisfaction).
    • written_date: The date when the review was posted.
    • title: The headline or title of the review.
    • review_text: The main body of the review content provided by the visitor.
    • branch: The specific Universal Studios location to which the review pertains.

    Distribution

    The dataset is typically provided in a CSV file format, structured as tabular data. It contains over 50,000 records. The ratings distribution shows a strong positive bias, with 28,202 reviews between 4.80 and 5.00, and 13,514 reviews between 4.00 and 4.20. Lower ratings include 1,973 reviews between 1.00 and 1.20, and 1,986 reviews between 2.00 and 2.20. The reviews span a wide period from 24 October 2002 to 30 May 2021. Geographically, Universal Studios Florida accounts for 60% of the reviews, Universal Studios Singapore for 31%, and other locations make up 9%.

    Usage

    This dataset is perfectly suited for developing and implementing a reviews management system. It can be used to determine overall sentiment from individual comments, helping businesses gain a clear understanding of visitor feedback. Applications include identifying specific areas for improvement, enhancing customer loyalty, boosting business performance, elevating brand value, and increasing profitability. It is also highly relevant for Natural Language Processing (NLP) and text analysis projects aimed at extracting insights from unstructured text data.

    Coverage

    The dataset covers visitor reviews for three major Universal Studios branches: Florida, Singapore, and Japan, reflecting a global scope. The time frame of the reviews extends from 24 October 2002 to 30 May 2021, offering a substantial historical perspective. The data originates from visitor postings on the Trip Advisor website, representing feedback from a diverse group of theme park guests.

    License

    CC0

    Who Can Use It

    This dataset is valuable for businesses in the hospitality and entertainment sectors, such as Universal Studios itself, for internal review management and strategic decision-making. Data analysts and scientists can use it for sentiment analysis, customer behaviour studies, and NLP model training. Marketing professionals can leverage the insights to understand customer preferences and refine branding strategies. Researchers focusing on tourism, consumer feedback, or text analytics will also find it a rich resource.

    Dataset Name Suggestions

    • Universal Studios Guest Reviews
    • Trip Advisor Universal Studios Feedback
    • Global Theme Park Visitor Reviews
    • Universal Studios Customer Ratings
    • Entertainment Venue Reviews

    Attributes

    Original Data Source: Reviews of Universal Studios

  17. m

    Factori Audience | 1.2B unique mobile users in APAC, EU, North America and...

    • app.mobito.io
    Updated Dec 24, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). Factori Audience | 1.2B unique mobile users in APAC, EU, North America and MENA [Dataset]. https://app.mobito.io/data-product/audience-data
    Explore at:
    Dataset updated
    Dec 24, 2022
    Area covered
    ASIA, EUROPE, AFRICA, OCEANIA, SOUTH_AMERICA, NORTH_AMERICA
    Description

    We collect, validate, model, and segment raw data signals from over 900+ sources globally to deliver thousands of mobile audience segments. We then combine that data with other public and private data sources to derive interests, intent, and behavioral attributes. Our proprietary algorithms then clean, enrich, unify and aggregate these data sets for use in our products. We have categorized our audience data into consumable categories such as interest, demographics, behavior, geography, etc. Audience Data Categories:Below mentioned data categories include consumer behavioral data and consumer profiles (available for the US and Australia) divided into various data categories. Brand Shoppers:Methodology: This category has been created based on the high intent of users in terms of their visits to Brand outlets in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. Place Category Visitors:Methodology: This category has been created based on the high intent of users visiting specific places of interest in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. Demographics:This category has been created based on deterministic data that we receive from apps based on the declared gender and age data. Marital Status, Education, Party affiliation, and State residency are available in the US. Geo-Behavioural:This category has been created based on the high intent of users in terms of the frequency of their visits to specific granular places of interest in the real world. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. Interests:This segment is created based on users' interest in a specific subject while browsing the internet when the visited website category is clearly focused on a specific subject such as cars, cooking, traveling, etc. We use a deterministic model to assign a proper profile and time that information is valid. The recency of data can range from 14 to 30 days, depending on the topic. Intent:Factori receives data from many partners to deliver high-quality pieces of information about users’ shopping intent. We collect data from sources connected to the eCommerce sector and we also receive data connected to online transactions from affiliate networks to deliver the most accurate segments with purchase intentions, such as laptops, mobile phones, or cars. The recency of data can range from 7 to 14 days depending on the product category. Events:This category was created based on the high interest of users in terms of content related to specific global events - sports, culture, and gaming. Among the event segments, we also distinguish categories related to the interest in certain lifestyle choices and behaviors. To create segments containing users with a high-affinity index, we use a precise determination of the number of occurrences at a given time. App Usage:Mobile category is a branch of the taxonomy that is dedicated only to the data that is based on mobile advertising IDs. It is based on the categorization of the mobile apps that the user has installed on the device. Auto Ownership:Consumer Profiles - Available for US and AustraliaThis audience has been created based on users declaring that they own a certain brand of automobile and other automotive attributes via a survey or registration. These audiences are currently available in the USA. Motorcycle Ownership:Consumer Profiles - Available for US and AustraliaThis audience has been created based on users declaring that they own a certain brand of motorcycle and other motorcycle-based attributes via a survey or registration. These audiences are currently available for the USA. Household:Consumer Profiles - Available for the US and AustraliaThis audience has been created based on users' declaring their marital status, parental status, and the overall number of children via a survey or registration. These audiences are currently available in the USA. Financial:Consumer Profiles - Available for the US and Australia this audience has been created based on their behavior in different financial services like property ownership, mortgage, investing behavior, and wealth and declaring their estimated net worth via a survey or registration. Purchase/ Spending Behavior:Consumer Profiles - Available for the US and AustraliaThis audience has been created based on their behavior in different spending behaviors in different business verticals available in the USA. Clusters:Consumer Profiles - Available for the US and AustraliaClusters are groups of consumers who exhibit similar demographic, lifestyle, and media consumption characteristics, empowering marketers to understand the unique attributes that comprise their most profitable consumer segments. Armed with this rich data, data scientists can drive analytics and modeling to power their brand’s unique marketing initiatives. B2B Audiences;Consumer Profiles - Available for US and AustraliaThis audience has been created based on users declaring their employee credentials, designations, and companies they work in, further specifying business verticals, revenue breakdowns, and headquarters locations. Customizable Audiences Data Segment:Brands can choose the appropriate pre-made audience segments or ask our data experts about creating a custom segment that is precisely tailored to your brief in order to reach their target customers and boost the campaign's effectiveness. Location Query Granularity:Minimum area: HEX 8Maximum area: QuadKey 17/City

  18. Board Public Website Usability Surveys

    • catalog.data.gov
    • datasets.ai
    Updated Dec 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Board of Governors of the Federal Reserve System (2024). Board Public Website Usability Surveys [Dataset]. https://catalog.data.gov/dataset/board-public-website-usability-surveys
    Explore at:
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Federal Reserve Board of Governors
    Federal Reserve Systemhttp://www.federalreserve.gov/
    Description

    The Board would use the FR 3076 to seek input from users or potential users of the Board's public website, social media, outreach, and communication responsibilities. The survey would be conducted with a diverse audience of consumers, banks, media, government, educators, and others to gather information about their visit to the Board's public website. Responses to the survey would be used to help improve the usability and offerings on the Board's public website and other online public communications. The frequency of the survey and content of the questions would vary as needs arise for feedback on different resources and from different audiences. The Board anticipates the FR 3076 may be conducted up to 12 times per year, although the survey may not be conducted that frequently. In addition, the Board anticipates conducting up to four focus group sessions per year.

  19. The Canada Trademarks Dataset

    • zenodo.org
    pdf, zip
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeremy Sheff; Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
    Explore at:
    zip, pdfAvailable download formats
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jeremy Sheff; Jeremy Sheff
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    The Canada Trademarks Dataset

    18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

    Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

    Python and Stata Scripts (c) 2021 Jeremy Sheff

    Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

    This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

    Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

    Terms of Use:

    As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

    The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

    The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

    Details of Repository Contents:

    This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

    • /csv: contains the .csv versions of the data files
    • /do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset
    • /dta: contains the .dta versions of the data files
    • /py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

    If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

    The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.

    With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021)), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

    The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

    This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.

  20. t

    Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian...

    • service.tib.eu
    Updated Nov 28, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian (2023). Dataset: Dataset: a large-scale study of cookie banner interaction tools and their impact on users' privacy / part2. https://doi.org/10.35097/1717 [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1717
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Abstract: Cookie notices (or cookie banners) are a popular mechanism for websites to provide (European) Internet users a tool to choose which cookies the site may set. Banner implementations range from merely providing information that a site uses cookies over offering the choice to accepting or denying all cookies to allowing fine-grained control of cookie usage. Users frequently get annoyed by the banner's pervasiveness as they interrupt natural'' browsing on the Web. As a remedy, different browser extensions have been developed to automate the interaction with cookie banners. In this work, we perform a large-scale measurement study comparing the effectiveness of extensions for cookie banner interaction.'' We configured the extensions to express different privacy choices (e.g., accepting all cookies, accepting functional cookies, or rejecting all cookies) to understand their capabilities to execute a user's preferences. The results show statistically significant differences in which cookies are set, how many of them are set, and which types are set---even for extensions that aim to implement the same cookie choice. Extensions forcookie banner interaction'' can effectively reduce the number of set cookies compared to no interaction with the banners. However, all extensions increase the tracking requests significantly except when rejecting all cookies. Abstract: Cookie notices (or cookie banners) are a popular mechanism for websites to provide (European) Internet users a tool to choose which cookies the site may set. Banner implementations range from merely providing information that a site uses cookies over offering the choice to accepting or denying all cookies to allowing fine-grained control of cookie usage. Users frequently get annoyed by the banner's pervasiveness as they interruptnatural'' browsing on the Web. As a remedy, different browser extensions have been developed to automate the interaction with cookie banners. In this work, we perform a large-scale measurement study comparing the effectiveness of extensions for cookie banner interaction.'' We configured the extensions to express different privacy choices (e.g., accepting all cookies, accepting functional cookies, or rejecting all cookies) to understand their capabilities to execute a user's preferences. The results show statistically significant differences in which cookies are set, how many of them are set, and which types are set---even for extensions that aim to implement the same cookie choice. Extensions forcookie banner interaction'' can effectively reduce the number of set cookies compared to no interaction with the banners. However, all extensions increase the tracking requests significantly except when rejecting all cookies. TechnicalRemarks: This repository hosts the dataset corresponding to the paper "A Large-Scale Study of Cookie Banner Interaction Tools and their Impact on Users’ Privacy", which was published at the Privacy Enhancing Technologies Symposium (PETS) in 2024.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code
Organization logo

Daily website visitors (time series regression)

Predict tomorrow's number of website visitors from 5 years of daily data

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 20, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bob Nau
Description

Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

Search
Clear search
Close search
Google apps
Main menu