41 datasets found

d
Open Data Website Traffic
catalog.data.gov
data.lacity.org
+1more
Updated Jun 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic
Explore at:
Dataset updated
Jun 21, 2025
Dataset provided by
data.lacity.org
Description
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly
A
‘Popular Website Traffic Over Time ’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Have you every been in a conversation and the question comes up, who uses Bing? This question comes up occasionally because people wonder if these sites have any views. For this research study, we are going to be exploring popular website traffic for many popular websites.

Methodology

The data collected originates from SimilarWeb.com.

Source

For the analysis and study, go to The Concept Center

This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

How to use this dataset

Analyze 11/1/2016 in relation to 2/1/2017

Study the influence of 4/1/2017 on 1/1/2017

More datasets

Acknowledgements

If you use this dataset in your research, please credit Chase Willden

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
Daily website visitors (time series regression)
kaggle.com
Updated Aug 20, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 20, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bob Nau
Description
Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
d
Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B...
datarade.ai
.csv
Updated Mar 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Consumer Edge (2025). Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B Shopper Insights | 59 Countries, 3-Day Lag, Daily Delivery [Dataset]. https://datarade.ai/data-products/click-global-data-web-traffic-data-transaction-data-con-consumer-edge
Explore at:
.csvAvailable download formats
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Consumer Edge
Area covered
Marshall Islands, Congo, Bermuda, El Salvador, Nauru, Bosnia and Herzegovina, Finland, South Africa, Sri Lanka, Montserrat
Description
Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights

Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and understand spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.

Consumer Edge’s Click is available in machine-readable file delivery and enables: • Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America. • Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools. • Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making. • Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.

Use Case: Analyze Year Over Year Growth Rate by Region

Problem A public investor wants to understand how a company’s year-over-year growth differs by region.

Solution The firm leveraged Consumer Edge Click data to: • Gain visibility into key metrics like views, bounce rate, visits, and addressable spend • Analyze year-over-year growth rates for a time period • Breakout data by geographic region to see growth trends

Metrics Include: • Spend • Items • Volume • Transactions • Price Per Volume

Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands as well as for industries beyond CPG like: • Monitor web traffic as a leading indicator of stock performance and consumer demand • Analyze customer interest and sentiment at the brand and sub-brand levels

Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.

Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=pl&inv=1&invt=Ab3yJQ (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=pl
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Googlehttp://google.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
D
Monthly Page Views to CDC.gov
data.cdc.gov
data.virginia.gov
+3more
application/rdfxml +5
Updated Aug 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Office of the Associate Director for Communication, Division of News and Electronic Media (2025). Monthly Page Views to CDC.gov [Dataset]. https://data.cdc.gov/Web-Metrics/Monthly-Page-Views-to-CDC-gov/rq85-buyi
Explore at:
xml, application/rdfxml, json, csv, application/rssxml, tsvAvailable download formats
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Office of the Associate Director for Communication, Division of News and Electronic Media
Description
For more information on CDC.gov metrics please see http://www.cdc.gov/metrics/
Data from: Google Analytics & Twitter dataset from a movies, TV series and...
figshare.com
portalcientificovalencia.univeuropea.com
txt
Updated Feb 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Víctor Yeste (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. http://doi.org/10.6084/m9.figshare.16553061.v4
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.16553061.v4
Dataset updated
Feb 7, 2024
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Víctor Yeste
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Author: Víctor Yeste. Universitat Politècnica de Valencia.The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.In this case, due to the need to integrate data from two separate areas, such as web publishing and the analysis of their shares and related topics on Twitter, has opted for programming as you access both the Google Analytics v4 reporting API and Twitter Standard API, always respecting the limits of these.The website analyzed is hellofriki.com. It is an online media whose primary intention is to solve the need for information on some topics that provide daily a vast number of news in the form of news, as well as the possibility of analysis, reports, interviews, and many other information formats. All these contents are under the scope of the sections of cinema, series, video games, literature, and comics.This dataset has contributed to the elaboration of the PhD Thesis:Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009Data have been obtained from each last-minute news article published online according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:tesis_followers: User ID list of media account followers.tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.status_id: Tweet IDcreated_at: date of publicationtext: content of the tweetpath: URL extracted after processing the shortened URL in textpost_shared: Article ID in WordPress that is being sharedretweet_count: number of retweetsfavorite_count: number of favoritestesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web. Other typologies, automatic Facebook shares, custom tweets without link to an article, etc. With the same fields as tesis_hometimeline.tesis_posts: data of articles published by the web and processed for some analysis.stats_id: Analysis IDpost_id: Article ID in WordPresspost_date: article publication date in WordPresspost_title: title of the articlepath: URL of the article in the middle webtags: Tags ID or WordPress tags related to the articleuniquepageviews: unique page viewsentrancerate: input ratioavgtimeonpage: average visit timeexitrate: output ratiopageviewspersession: page views per sessionadsense_adunitsviewed: number of ads viewed by usersadsense_viewableimpressionpercent: ad display ratioadsense_ctr: ad click ratioadsense_ecpm: estimated ad revenue per 1000 page viewstesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.id: ID of the analysisphase: phase of the thesis in which analysis has been carried out (right now all are 1)time: "0" if at the time of publication, "1" if 14 days laterstart_date: date and time of measurement on the day of publicationend_date: date and time when the measurement is made 14 days latermain_post_id: ID of the published article to be analysedmain_post_theme: Main section of the published article to analyzesuperheroes_theme: "1" if about superheroes, "0" if nottrailer_theme: "1" if trailer, "0" if notname: empty field, possibility to add a custom name manuallynotes: empty field, possibility to add personalized notes manually, as if some tag has been removed manually for being considered too generic, despite the fact that the editor put itnum_articles: number of articles analysednum_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter accountnum_terms: number of terms analyzeduniquepageviews_total: total page viewsuniquepageviews_mean: average page viewsentrancerate_mean: average input ratioavgtimeonpage_mean: average duration of visitsexitrate_mean: average output ratiopageviewspersession_mean: average page views per sessiontotal: total of ads viewedadsense_adunitsviewed_mean: average of ads viewedadsense_viewableimpressionpercent_mean: average ad display ratioadsense_ctr_mean: average ad click ratioadsense_ecpm_mean: estimated ad revenue per 1000 page viewsTotal: total incomeretweet_count_mean: average incomefavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesterms_ini_num_tweets: total tweets on the terms on the day of publicationterms_ini_retweet_count_total: total retweets on the terms on the day of publicationterms_ini_retweet_count_mean: average retweets on the terms on the day of publicationterms_ini_favorite_count_total: total of favorites on the terms on the day of publicationterms_ini_favorite_count_mean: average of favorites on the terms on the day of publicationterms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publicationterms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publicationterms_end_num_tweets: total tweets on terms 14 days after publicationterms_ini_retweet_count_total: total retweets on terms 14 days after publicationterms_ini_retweet_count_mean: average retweets on terms 14 days after publicationterms_ini_favorite_count_total: total bookmarks on terms 14 days after publicationterms_ini_favorite_count_mean: average of favorites on terms 14 days after publicationterms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publicationterms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication.tesis_terms: data of the terms (tags) related to the processed articles.stats_id: Analysis IDtime: "0" if at the time of publication, "1" if 14 days laterterm_id: Term ID (tag) in WordPressname: Name of the termslug: URL of the termnum_tweets: number of tweetsretweet_count_total: total retweetsretweet_count_mean: average retweetsfavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesfollowers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the termuser_num_followers_mean: average followers of users who were talking about the termuser_num_tweets_mean: average number of tweets published by users who were talking about the termuser_age_mean: average age in days of users who were talking about the termurl_inclusion_rate: URL inclusion ratio
d
Swash User Search and Consumer Journey Data - 1.5M Worldwide Users - GDPR...
datarade.ai
.csv, .xls
Updated Jun 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Swash (2023). Swash User Search and Consumer Journey Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/users-searching-data-on-top-search-engines
Explore at:
.csv, .xlsAvailable download formats
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Taiwan, Panama, Honduras, Macao, Israel, United States of America, Bangladesh, Korea (Republic of), Japan, Kuwait
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Website Metrics
catalog.data.gov
datasets.ai
+1more
Updated Jun 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FEMA/Office of External Affairs/Communication Division (2025). Website Metrics [Dataset]. https://catalog.data.gov/dataset/website-metrics
Explore at:
Dataset updated
Jun 7, 2025
Dataset provided by
Federal Emergency Management Agencyhttp://www.fema.gov/
Description
Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regards to FEMA.gov.rnrnInformation in this dataset includes total visits, avg visit duration, pageviews, unique visitors, avg pages/visit, avg time/page, bounce ratevisits by source, visits by Social Media Platform, and metrics on new vs returning visitors.rnrnExternal Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.
E-commerce - Users of a French C2C fashion store
kaggle.com
Updated Feb 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store/notebooks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 24, 2024
Dataset provided by
Kaggle
Authors
Jeffrey Mvutu Mabilama
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
Foreword

This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).

My Telegram bot will answer your queries and allow you to contact me.

Context

There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other user's articles).

This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try and understand what you can expect of your users and determine in advance how your grows may be.

For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to suits more your needs.

This dataset is part of a preview of a much larger dataset. Please contact me for more.

Content

The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Questions you might want to answer using this dataset:

Are e-commerce users interested in social network feature ?

Are my users active enough (compared to those of this dataset) ?

How likely are people from other countries to sign up in a C2C website ?

How many users are likely to drop off after years of using my service ?

Example works:

Report(s) made using SQL queries can be found on the data.world page of the dataset.

Notebooks may be found on the Kaggle page of the dataset.

License

CC-BY-NC-SA 4.0

For other licensing options, contact me.
n
FOI-01782 - Datasets - Open Data Portal
opendata.nhsbsa.net
Updated Mar 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). FOI-01782 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-01782
Explore at:
Dataset updated
Mar 21, 2024
Description
Thank you for explaining that you don’t collect data on the number of abandoned applications. Alternatively, please could you share the website analytics which shows the number of visitors to each webpage, from this information we can compare against form completion rates and if there is a particular drop in traffic on certain pages/questions? Response A copy of the information is attached. Please read the below notes to ensure correct understanding of the data. Attached is raw data covering individual page hits from 19 February 2024 to 17 March 2024. Please be advised that our Data Analysts have viewed the Google analytics for the Healthy Start website pages, and despite the search options including country, regions and town or city, the data provided within these fields is an approximation and cannot be guaranteed as a true location of a user. We believe that Google analytics geo location capabilities are based on IP (Internet Protocol) addresses which may not resolve to a true location, and instead could be based off the users ISP (Internet Service Provider) server location. Therefore, please be aware that this raw data is not reliable.
D
Exhibit of Datasets
ssh.datastations.nl
pdf
Updated Sep 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
P.K. Doorn; L. Breure; P.K. Doorn; L. Breure (2024). Exhibit of Datasets [Dataset]. http://doi.org/10.17026/SS/TLTMIR
Explore at:
pdf(6387646), pdf(2009614), pdf(21694737), pdf(7119932), pdf(7368953), pdf(2266022), pdf(5957611), pdf(2372244), pdf(3506939), pdf(7233056), pdf(3825954), pdf(1165676), pdf(2683520), pdf(602628), pdf(1968819), pdf(12429754), pdf(1802813), pdf(8847011), pdf(8196391), pdf(559663), pdf(4024461), pdf(1992824), pdf(1541567), pdf(2404227)Available download formats
Unique identifier
https://doi.org/10.17026/SS/TLTMIR
Dataset updated
Sep 2, 2024
Dataset provided by
DANS Data Station Social Sciences and Humanities
Authors
P.K. Doorn; L. Breure; P.K. Doorn; L. Breure
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2016 - 2020
Dataset funded by
Data Archiving and Networked Services
Description
The Exhibit of Datasets was an experimental project with the aim of providing concise introductions to research datasets in the humanities and social sciences deposited in a trusted repository and thus made accessible for the long term. The Exhibit consists of so-called 'showcases', short webpages summarizing and supplementing the corresponding data papers, published in the Research Data Journal for the Humanities and Social Sciences. The showcase is a quick introduction to such a dataset, a bit longer than an abstract, with illustrations, interactive graphs and other multimedia (if available). As a rule it also offers the option to get acquainted with the data itself, through an interactive online spreadsheet, a data sample or link to the online database of a research project. Usually, access to these datasets requires several time consuming actions, such as downloading data, installing the appropriate software and correctly uploading the data into these programs. This makes it difficult for interested parties to quickly assess the possibilities for reuse in other projects. The Exhibit aimed to help visitors of the website to get the right information at a glance by: - Attracting attention to (recently) acquired deposits: showing why data are interesting. - Providing a concise overview of the dataset's scope and research background; more details are to be found, for example, in the associated data paper in the Research Data Journal (RDJ). - Bringing together references to the location of the dataset and to more detailed information elsewhere, such as the project website of the data producers. - Allowing visitors to explore (a sample of) the data without downloading and installing associated software at first (see below). - Publishing related multimedia content, such as videos, animated maps, slideshows etc., which are currently difficult to include in online journals as RDJ. - Making it easier to review the dataset. The Exhibit would also have been the right place to publish these reviews in the same way as a webshop publishes consumer reviews of a product, but this could not yet be achieved within the limited duration of the project. Note (1) The text of the showcase is a summary of the corresponding data paper in RDJ, and as such a compilation made by the Exhibit editor. In some cases a section 'Quick start in Reusing Data' is added, whose text is written entirely by the editor. (2) Various hyperlinks such as those to pages within the Exhibit website will no longer work. The interactive Zoho spreadsheets are also no longer available because this facility has been discontinued.
Context Ad Clicks Dataset
kaggle.com
Updated Feb 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Möbius (2021). Context Ad Clicks Dataset [Dataset]. https://www.kaggle.com/arashnic/ctrtest/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Möbius
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The dataset generated by an E-commerce website which sells a variety of products at its online platform. The records user behaviour of its customers and stores it as a log. However, most of the times, users do not buy the products instantly and there is a time gap during which the customer might surf the internet and maybe visit competitor websites. Now, to improve sales of products, website owner has hired an Adtech company which built a system such that ads are being shown for owner products on its partner websites. If a user comes to owner website and searches for a product, and then visits these partner websites or apps, his/her previously viewed items or their similar items are shown on as an ad. If the user clicks this ad, he/she will be redirected to the owner website and might buy the product.

The task is to predict the probability i.e. probability of user clicking the ad which is shown to them on the partner websites for the next 7 days on the basis of historical view log data, ad impression data and user data.

Content

You are provided with the view log of users (2018/10/15 - 2018/12/11) and the product description collected from the owner website. We also provide the training data and test data containing details for ad impressions at the partner websites(Train + Test). Train data contains the impression logs during 2018/11/15 – 2018/12/13 along with the label which specifies whether the ad is clicked or not. Your model will be evaluated on the test data which have impression logs during 2018/12/12 – 2018/12/18 without the labels. You are provided with the following files:

train.zip: This contains 3 files and description of each is given below:

train.csv

view_log.csv

item_data.csv

test.csv: test file contains the impressions for which the participants need to predict the click rate sample_submission.csv: This file contains the format in which you have to submit your predictions.

Inspiration

Predict the probability probability of user clicking the ad which is shown to them on the partner websites for the next 7 days on the basis of historical view log data, ad impression data and user data.

The evaluated metric could be "area under the ROC curve" between the predicted probability and the observed target.
w
Calderdale Data Works Webstats
data.wu.ac.at
csv, html, pdf, xls
Updated Aug 24, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Calderdale Council (2018). Calderdale Data Works Webstats [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/OThiNDdlZjEtZDAwOS00NDg5LTk3ZTEtYTUzODhiNWZiMDQ4
Explore at:
csv, pdf(102759.0), xls, html, pdfAvailable download formats
Dataset updated
Aug 24, 2018
Dataset provided by
Calderdale Council
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered
Calderdale
Description
Views, visits and downloads of the datasets published on our Data Works platform. Please note, each file uploaded on the site is listed and this may include files now deleted or changed.

This dataset has information from data.gov.uk to show the number of downloads for each file and from Google Analytics to show the number of visits, unique visits, time spent, bounce rate and exit rate for each dataset.

Our Open Data is also published on data.gov.uk and you can see statistics for this site here - Data.Gov.UK - Calderdale
A web tracking data set of online browsing behavior of 2,148 users
zenodo.org
data.niaid.nih.gov
application/gzip, txt +1
Updated May 14, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Juhi Kulshrestha; Juhi Kulshrestha; Marcos Oliveira; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner; Orkut Karacalik; Denis Bonnay; Claudia Wagner (2021). A web tracking data set of online browsing behavior of 2,148 users [Dataset]. http://doi.org/10.5281/zenodo.4757574
Explore at:
zip, txt, application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4757574
Dataset updated
May 14, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Juhi Kulshrestha; Juhi Kulshrestha; Marcos Oliveira; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner; Orkut Karacalik; Denis Bonnay; Claudia Wagner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This anonymized data set consists of one month's (October 2018) web tracking data of 2,148 German users. For each user, the data contains the anonymized URL of the webpage the user visited, the domain of the webpage, category of the domain, which provides 41 distinct categories. In total, these 2,148 users made 9,151,243 URL visits, spanning 49,918 unique domains. For each user in our data set, we have self-reported information (collected via a survey) about their gender and age.

We acknowledge the support of Respondi AG, which provided the web tracking and survey data free of charge for research purposes, with special thanks to François Erner and Luc Kalaora at Respondi for their insights and help with data extraction.

The data set is analyzed in the following paper:

Kulshrestha, J., Oliveira, M., Karacalik, O., Bonnay, D., Wagner, C. "Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data." Proceedings of the International AAAI Conference on Web and Social Media. 2021. https://arxiv.org/abs/2012.15112.

The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

If you use data or code from this repository, please cite the paper above and the Zenodo link.
E
Meteorite strikes
dtechtive.com
xml, zip
Updated Feb 22, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
University of Edinburgh (2017). Meteorite strikes [Dataset]. http://doi.org/10.7488/ds/1915
Explore at:
zip(1.519 MB), xml(0.0043 MB)Available download formats
Unique identifier
https://doi.org/10.7488/ds/1915
Dataset updated
Feb 22, 2017
Dataset provided by
University of Edinburgh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset shows the recorded meteorite strikes on Earth. Things to note are that the dataset only shows the strikes that we know about through either first-hand observations, historical accounts and from physical evidence. There is a clear bias for densly populate, developed countries where research is generally focused (Europe/America), but you may also note the high numbers of impacts in desert areas where evidence is generally undisturbed (Australia, Gulf States). There is an obvious sparsity of data over the oceans where only the largest impacts will leave any trace. The dataset was sourced from the Guardian Website (http://www.guardian.co.uk/news/datablog/interactive/2013/feb/15/meteorite-fall-map) but originates from the Meteoritical Society (http://www.lpi.usra.edu/meteor/metbull.php). Please attribute the Meteoritical Society as the source when re-using this data. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2013-02-20 and migrated to Edinburgh DataShare on 2017-02-22.
Wikipedia Web Traffic 2018-19
kaggle.com
Updated Apr 12, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
san_bt (2021). Wikipedia Web Traffic 2018-19 [Dataset]. https://www.kaggle.com/datasets/sandeshbhat/wikipedia-web-traffic-201819/versions/1
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 12, 2021
Dataset provided by
Kaggle
Authors
san_bt
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Time Series: Time series is a set of observations recorded over regular interval of time, Time series can be beneficial in many fields like stock market prediction, weather forecasting. - Accounts for the fact that data points taken over time may have an internal structure (such as auto correlation, trend or seasonal variation) that should be accounted for.

Web traffic: Amount of data sent and received by visitors to a website. - Sites monitor the incoming and outgoing traffic to see which parts or pages of their site are popular and if there are any apparent trends, such as one specific page being viewed mostly by people in a particular country

Content

Contains Page Views for 60k Wikipedia articles in 8 different languages taken on a daily basis for 2 years.

https://i.ibb.co/h1JCgpY/DSLC.png" alt="DSLC">

A Data Science Life Cycle can be used to create a project. Forecasting can be done for any interval provided sufficient dataset is available. Refer the Github link in the tasks to view the forecast done using ARIMA and Prophet. Further feel free to contribute. Several other models can be used including a neural network to improve the results by many folds.

Acknowledgements

Credits :
1. Wikipedia 2. Google
The Items Dataset
zenodo.org
Updated Nov 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Patrick Egan; Patrick Egan (2024). The Items Dataset [Dataset]. http://doi.org/10.5281/zenodo.10964134
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.10964134
Dataset updated
Nov 13, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Patrick Egan; Patrick Egan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset originally created 03/01/2019 UPDATE: Packaged on 04/18/2019 UPDATE: Edited README on 04/18/2019

I. About this Data Set This data set is a snapshot of work that is ongoing as a collaboration between Kluge Fellow in Digital Studies, Patrick Egan and an intern at the Library of Congress in the American Folklife Center. It contains a combination of metadata from various collections that contain audio recordings of Irish traditional music. The development of this dataset is iterative, and it integrates visualizations that follow the key principles of trust and approachability. The project, entitled, “Connections In Sound” invites you to use and re-use this data.

The text available in the Items dataset is generated from multiple collections of audio material that were discovered at the American Folklife Center. Each instance of a performance was listed and “sets” or medleys of tunes or songs were split into distinct instances in order to allow machines to read each title separately (whilst still noting that they were part of a group of tunes). The work of the intern was then reviewed before publication, and cross-referenced with the tune index at www.irishtune.info. The Items dataset consists of just over 1000 rows, with new data being added daily in a separate file.

The collections dataset contains at least 37 rows of collections that were located by a reference librarian at the American Folklife Center. This search was complemented by searches of the collections by the scholar both on the internet at https://catalog.loc.gov and by using card catalogs.

Updates to these datasets will be announced and published as the project progresses.

II. What’s included? This data set includes:

The Items Dataset – a .CSV containing Media Note, OriginalFormat, On Website, Collection Ref, Missing In Duplication, Collection, Outside Link, Performer, Solo/multiple, Sub-item, type of tune, Tune, Position, Location, State, Date, Notes/Composer, Potential Linked Data, Instrument, Additional Notes, Tune Cleanup. This .CSV is the direct export of the Items Google Spreadsheet

III. How Was It Created? These data were created by a Kluge Fellow in Digital Studies and an intern on this program over the course of three months. By listening, transcribing, reviewing, and tagging audio recordings, these scholars improve access and connect sounds in the American Folklife Collections by focusing on Irish traditional music. Once transcribed and tagged, information in these datasets is reviewed before publication.

IV. Data Set Field Descriptions

IV

a) Collections dataset field descriptions

ItemId – this is the identifier for the collection that was found at the AFC

Viewed – if the collection has been viewed, or accessed in any way by the researchers.

On LOC – whether or not there are audio recordings of this collection available on the Library of Congress website.

On Other Website – if any of the recordings in this collection are available elsewhere on the internet

Original Format – the format that was used during the creation of the recordings that were found within each collection

Search – this indicates the type of search that was performed in order that resulted in locating recordings and collections within the AFC

Collection – the official title for the collection as noted on the Library of Congress website

State – The primary state where recordings from the collection were located

Other States – The secondary states where recordings from the collection were located

Era / Date – The decade or year associated with each collection

Call Number – This is the official reference number that is used to locate the collections, both in the urls used on the Library website, and in the reference search for catalog cards (catalog cards can be searched at this address: https://memory.loc.gov/diglib/ihas/html/afccards/afccards-home.html)

Finding Aid Online? – Whether or not a finding aid is available for this collection on the internet

b) Items dataset field descriptions

id – the specific identification of the instance of a tune, song or dance within the dataset

Media Note – Any information that is included with the original format, such as identification, name of physical item, additional metadata written on the physical item

Original Format – The physical format that was used when recording each specific performance. Note: this field is used in order to calculate the number of physical items that were created in each collection such as 32 wax cylinders.

On Webste? – Whether or not each instance of a performance is available on the Library of Congress website

Collection Ref – The official reference number of the collection

Missing In Duplication – This column marks if parts of some recordings had been made available on other websites, but not all of the recordings were included in duplication (see recordings from Philadelphia Céilí Group on Villanova University website)

Collection – The official title of the collection given by the American Folklife Center

Outside Link – If recordings are available on other websites externally

Performer – The name of the contributor(s)

Solo/multiple – This field is used to calculate the amount of solo performers vs group performers in each collection

Sub-item – In some cases, physical recordings contained extra details, the sub-item column was used to denote these details

Type of item – This column describes each individual item type, as noted by performers and collectors

Item – The item title, as noted by performers and collectors. If an item was not described, it was entered as “unidentified”

Position – The position on the recording (in some cases during playback, audio cassette player counter markers were used)

Location – Local address of the recording

State – The state where the recording was made

Date – The date that the recording was made

Notes/Composer – The stated composer or source of the item recorded

Potential Linked Data – If items may be linked to other recordings or data, this column was used to provide examples of potential relationships between them

Instrument – The instrument(s) that was used during the performance

Additional Notes – Notes about the process of capturing, transcribing and tagging recordings (for researcher and intern collaboration purposes)

Tune Cleanup – This column was used to tidy each item so that it could be read by machines, but also so that spelling mistakes from the Item column could be corrected, and as an aid to preserving iterations of the editing process

V. Rights statement The text in this data set was created by the researcher and intern and can be used in many different ways under creative commons with attribution. All contributions to Connections In Sound are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creators of these datasets.

VI. Creator and Contributor Information

Creator: Connections In Sound

Contributors: Library of Congress Labs

VII. Contact Information Please direct all questions and comments to Patrick Egan via www.twitter.com/drpatrickegan or via his website at www.patrickegan.org. You can also get in touch with the Library of Congress Labs team via LC-Labs@loc.gov.
U
Puerto Rico Natural Hazards Webpage Visits
data.usgs.gov
s.cnmilf.com
+1more
Updated Mar 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Priscila Vargas-Babilonia; Dawn Kotowicz; Legna Torres-Garcia; Ana Valdes-Calderon (2023). Puerto Rico Natural Hazards Webpage Visits [Dataset]. http://doi.org/10.5066/P1DREUMZ
Explore at:
Unique identifier
https://doi.org/10.5066/P1DREUMZ
Dataset updated
Mar 31, 2023
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Authors
Priscila Vargas-Babilonia; Dawn Kotowicz; Legna Torres-Garcia; Ana Valdes-Calderon
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Time period covered
Jul 19, 2022 - Mar 31, 2023
Area covered
Puerto Rico
Description
Underserved communities, especially those in coastal areas in Puerto Rico, face significant threats from natural hazards such as hurricanes and rising sea levels. Limited funding hinders the investment in costly mitigation measures, increasing exposure to natural disasters. Providing coastal resources and data products through effective communication mechanisms is fundamental to improving the well-being of these underserved coastal communities. The overall objectives of the pilot effort to engage and connect with underrepresented coastal communities in Puerto Rico were the following: (1) compile a comprehensive database of the projects and resources relevant to natural hazards in Puerto Rico; (2) foster connections with Puerto Rican interested parties to better understand their priorities regarding coastal hazards and provide them with pertinent U.S. Geological Survey (USGS) resources; and (3) identify knowledge gaps to guide future USGS projects in Puerto Rico. As a result of thi ...

Facebook

Twitter

Click to copy link

Link copied

Cite

data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic

Open Data Website Traffic

Explore at:

Dataset updated

Jun 21, 2025

Dataset provided by

data.lacity.org

Description

Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly

Clear search

Close search

Google apps

Main menu

Open Data Website Traffic

‘Popular Website Traffic Over Time ’ analyzed by Analyst-2

About this dataset

Background

Methodology

Source

How to use this dataset

Acknowledgements

Start A New Notebook!

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Daily website visitors (time series regression)

Context

Content

Inspiration

Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B...

Google Analytics Sample

Monthly Page Views to CDC.gov

Data from: Google Analytics & Twitter dataset from a movies, TV series and...

Swash User Search and Consumer Journey Data - 1.5M Worldwide Users - GDPR...

Website Metrics

E-commerce - Users of a French C2C fashion store

Foreword

Context

Content

Acknowledgements

Inspiration

License

FOI-01782 - Datasets - Open Data Portal

Exhibit of Datasets

Context Ad Clicks Dataset

Context

Content

Inspiration

Calderdale Data Works Webstats

A web tracking data set of online browsing behavior of 2,148 users

Meteorite strikes

Wikipedia Web Traffic 2018-19

Context

Content

Acknowledgements

The Items Dataset

Puerto Rico Natural Hazards Webpage Visits

Open Data Website Traffic