65 datasets found

Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
d
Website Analytics
catalog.data.gov
data.nola.gov
+4more
Updated Jun 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
data.nola.gov
Description
This data about nola.gov provides a window into how people are interacting with the the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
csv, pdfAvailable download formats
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
u
Data from: Google Analytics & Twitter dataset from a movies, TV series and...
portalcientificovalencia.univeuropea.com
figshare.com
Updated 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yeste, Víctor; Yeste, Víctor (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed3aea56d4af0485dc8
Explore at:
Dataset updated
2024
Authors
Yeste, Víctor; Yeste, Víctor
Description
Author: Víctor Yeste. Universitat Politècnica de Valencia.The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.In this case, due to the need to integrate data from two separate areas, such as web publishing and the analysis of their shares and related topics on Twitter, has opted for programming as you access both the Google Analytics v4 reporting API and Twitter Standard API, always respecting the limits of these.The website analyzed is hellofriki.com. It is an online media whose primary intention is to solve the need for information on some topics that provide daily a vast number of news in the form of news, as well as the possibility of analysis, reports, interviews, and many other information formats. All these contents are under the scope of the sections of cinema, series, video games, literature, and comics.This dataset has contributed to the elaboration of the PhD Thesis:Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009Data have been obtained from each last-minute news article published online according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:tesis_followers: User ID list of media account followers.tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.status_id: Tweet IDcreated_at: date of publicationtext: content of the tweetpath: URL extracted after processing the shortened URL in textpost_shared: Article ID in WordPress that is being sharedretweet_count: number of retweetsfavorite_count: number of favoritestesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web. Other typologies, automatic Facebook shares, custom tweets without link to an article, etc. With the same fields as tesis_hometimeline.tesis_posts: data of articles published by the web and processed for some analysis.stats_id: Analysis IDpost_id: Article ID in WordPresspost_date: article publication date in WordPresspost_title: title of the articlepath: URL of the article in the middle webtags: Tags ID or WordPress tags related to the articleuniquepageviews: unique page viewsentrancerate: input ratioavgtimeonpage: average visit timeexitrate: output ratiopageviewspersession: page views per sessionadsense_adunitsviewed: number of ads viewed by usersadsense_viewableimpressionpercent: ad display ratioadsense_ctr: ad click ratioadsense_ecpm: estimated ad revenue per 1000 page viewstesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.id: ID of the analysisphase: phase of the thesis in which analysis has been carried out (right now all are 1)time: "0" if at the time of publication, "1" if 14 days laterstart_date: date and time of measurement on the day of publicationend_date: date and time when the measurement is made 14 days latermain_post_id: ID of the published article to be analysedmain_post_theme: Main section of the published article to analyzesuperheroes_theme: "1" if about superheroes, "0" if nottrailer_theme: "1" if trailer, "0" if notname: empty field, possibility to add a custom name manuallynotes: empty field, possibility to add personalized notes manually, as if some tag has been removed manually for being considered too generic, despite the fact that the editor put itnum_articles: number of articles analysednum_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter accountnum_terms: number of terms analyzeduniquepageviews_total: total page viewsuniquepageviews_mean: average page viewsentrancerate_mean: average input ratioavgtimeonpage_mean: average duration of visitsexitrate_mean: average output ratiopageviewspersession_mean: average page views per sessiontotal: total of ads viewedadsense_adunitsviewed_mean: average of ads viewedadsense_viewableimpressionpercent_mean: average ad display ratioadsense_ctr_mean: average ad click ratioadsense_ecpm_mean: estimated ad revenue per 1000 page viewsTotal: total incomeretweet_count_mean: average incomefavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesterms_ini_num_tweets: total tweets on the terms on the day of publicationterms_ini_retweet_count_total: total retweets on the terms on the day of publicationterms_ini_retweet_count_mean: average retweets on the terms on the day of publicationterms_ini_favorite_count_total: total of favorites on the terms on the day of publicationterms_ini_favorite_count_mean: average of favorites on the terms on the day of publicationterms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publicationterms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publicationterms_end_num_tweets: total tweets on terms 14 days after publicationterms_ini_retweet_count_total: total retweets on terms 14 days after publicationterms_ini_retweet_count_mean: average retweets on terms 14 days after publicationterms_ini_favorite_count_total: total bookmarks on terms 14 days after publicationterms_ini_favorite_count_mean: average of favorites on terms 14 days after publicationterms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publicationterms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication.tesis_terms: data of the terms (tags) related to the processed articles.stats_id: Analysis IDtime: "0" if at the time of publication, "1" if 14 days laterterm_id: Term ID (tag) in WordPressname: Name of the termslug: URL of the termnum_tweets: number of tweetsretweet_count_total: total retweetsretweet_count_mean: average retweetsfavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesfollowers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the termuser_num_followers_mean: average followers of users who were talking about the termuser_num_tweets_mean: average number of tweets published by users who were talking about the termuser_age_mean: average age in days of users who were talking about the termurl_inclusion_rate: URL inclusion ratio
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=pl&inv=1&invt=Ab3yJQ (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=pl
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Googlehttp://google.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
g
Website Traffic Dataset
gts.ai
json
Updated Aug 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GTS (2024). Website Traffic Dataset [Dataset]. https://gts.ai/dataset-download/website-traffic-dataset/
Explore at:
jsonAvailable download formats
Dataset updated
Aug 23, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.
Google Analytics 4 sample data
kaggle.com
Updated Sep 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
preeti deshmukh (2023). Google Analytics 4 sample data [Dataset]. https://www.kaggle.com/datasets/pdaasha/ga4-obfuscated-sample-ecommerce-jan2021/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 16, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
preeti deshmukh
Description
Context Google Analytics 4 is Google analytics latest service that enables you to measure traffic and engagement across your website as well as app.

Content This is a sample dataset of GA4 for Google Merchendise store for the month of Jan 2021.

Acknowledgements This information is exported from the google bigquery public data set.

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:ga4_obfuscated_sample_ecommerce.events_*

Inspiration To study google analytics it is really very difficult to get sample data so here's making it easy for some in need of one.
A
‘Popular Website Traffic Over Time ’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Have you every been in a conversation and the question comes up, who uses Bing? This question comes up occasionally because people wonder if these sites have any views. For this research study, we are going to be exploring popular website traffic for many popular websites.

Methodology

The data collected originates from SimilarWeb.com.

Source

For the analysis and study, go to The Concept Center

This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

How to use this dataset

Analyze 11/1/2016 in relation to 2/1/2017

Study the influence of 4/1/2017 on 1/1/2017

More datasets

Acknowledgements

If you use this dataset in your research, please credit Chase Willden

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
GA data with json columns
kaggle.com
Updated Oct 29, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Colin Pearse (2018). GA data with json columns [Dataset]. https://www.kaggle.com/datasets/colinpearse/ga-analytics-with-json-columns/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 29, 2018
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Colin Pearse
Description
Context

Making dataset "Google Analytics Customer Revenue Prediction" easier and quicker to parse.

Content

This is the same information as dataset "Google Analytics Customer Revenue Prediction" with the JSON columns expanded (flattened) into additional csv columns.

Acknowledgements

Thanks to the original dataset "Google Analytics Customer Revenue Prediction"; it's safe to say that without you I could not exist as a more reduced space but equally as informative dataset.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
g
Website Statistics (Archived) | gimi9.com
gimi9.com
Updated Feb 6, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). Website Statistics (Archived) | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_website-statistics/
Explore at:
Dataset updated
Feb 6, 2018
Description
** Note: This dataset has been archived and is no longer being updated due to a change in analytics platform. You can find the new dataset relating to Website Statistics in the following link; https://lincolnshire.ckan.io/dataset/website-statistics ** This Website Statistics dataset has three resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file. - Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year. - Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year. - Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year. Note: The resources above show only UK users, and exclude API calls (automated requests for datasets). For further information, please contact the Lincolnshire County Council Open Data team.
March Madness Historical DataSet (2002 to 2025)
kaggle.com
Updated Apr 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis/discussion?sort=undefined
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 22, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Jonathan Pilafas
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo. - Click here to view this dashboard: Dashboard Link - Click here to view this dashboard features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard

This dataset offers one the most robust resource you will find to discover key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for analysis for March Madness.

Key features of the dataset include: - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward. - Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of. - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset. Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed under one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field in its full length and respective acronyms they are known by as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences. From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths because the active current coaching length typically correlates to a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and I join another reference table to differentiate the teams who were ranked in the top 12 in the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.

This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
d
Custom dataset from any website on the Internet
datarade.ai
Updated Sep 21, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ScrapeLabs (2022). Custom dataset from any website on the Internet [Dataset]. https://datarade.ai/data-products/custom-dataset-from-any-website-on-the-internet-scrapelabs
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Sep 21, 2022
Dataset authored and provided by
ScrapeLabs
Area covered
Kazakhstan, Bulgaria, India, Tunisia, Argentina, Jordan, Lebanon, Turks and Caicos Islands, Guinea-Bissau, Aruba
Description
We'll extract any data from any website on the Internet. You don't have to worry about buying and maintaining complex and expensive software, or hiring developers.

Some common use cases our customers use the data for: • Data Analysis • Market Research • Price Monitoring • Sales Leads • Competitor Analysis • Recruitment

We can get data from websites with pagination or scroll, with captchas, and even from behind logins. Text, images, videos, documents.

Receive data in any format you need: Excel, CSV, JSON, or any other.
m
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...
data.mendeley.com
data.niaid.nih.gov
+2more
Updated Jun 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nirmalya Thakur (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.17632/rs6jnrjfsx.1
Explore at:
Unique identifier
https://doi.org/10.17632/rs6jnrjfsx.1
Dataset updated
Jun 24, 2024
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
Please cite the following paper when using this dataset:

N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A.Bian “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: https://doi.org/10.48550/arXiv.2406.07693

Abstract

This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
T
Cincinnati Open Data Website Analytics
data.cincinnati-oh.gov
csv, xlsx, xml
Updated Jul 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Cincinnati (2025). Cincinnati Open Data Website Analytics [Dataset]. https://data.cincinnati-oh.gov/w/a56y-rtyz/default?cur=qn8t2Wvumwb
Explore at:
xlsx, csv, xmlAvailable download formats
Dataset updated
Jul 17, 2025
Dataset authored and provided by
City of Cincinnati
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Area covered
Cincinnati
Description
Data Description: This data set provides all public datasets, links, documents and community created filters hosted on the City of Cincinnati's Open Data Portal.

Data Creation: This data set is maintained by the City of Cincinnati's Open Data host, Socrata.

Data Created By: Socrata

Refresh Frequency: Daily

Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this data set.

Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit the Office of Performance and Data Analytics facilitates standard processing to most raw data prior to publication. Processing includes but is not limited: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e. Census, neighborhoods, police districts, etc.).

Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
insight_bench
huggingface.co
Updated Jun 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ServiceNow (2024). insight_bench [Dataset]. https://huggingface.co/datasets/ServiceNow/insight_bench
Explore at:
Dataset updated
Jun 6, 2024
Dataset authored and provided by
ServiceNowhttp://servicenow.com/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Dataset Summary

[Paper][Website][Dataset] Insight-Bench is a benchmark dataset designed to evaluate end-to-end data analytics by evaluating agents' ability to perform comprehensive data analysis across diverse use cases, featuring carefully curated insights, an evaluation mechanism based on LLaMA-3-Eval or G-EVAL, and a data analytics agent, AgentPoirot.

1. Install the… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow/insight_bench.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
w
Dataset of book subjects that contain Data strategy : how to profit from a...
workwithdata.com
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2024). Dataset of book subjects that contain Data strategy : how to profit from a world of big data, analytics and the Internet of things [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Data+strategy+%3A+how+to+profit+from+a+world+of+big+data%2C+analytics+and+the+Internet+of+things&j=1&j0=books
Explore at:
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 3 rows and is filtered where the books is Data strategy : how to profit from a world of big data, analytics and the Internet of things. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
p
Web Analytics - Dataset - CKAN
ckan0.cf.opendata.inter.prod-toronto.ca
Updated Apr 3, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2014). Web Analytics - Dataset - CKAN [Dataset]. https://ckan0.cf.opendata.inter.prod-toronto.ca/gl_ES/dataset/web-analytics
Explore at:
Dataset updated
Apr 3, 2014
Description
See Readme File This dataset contains web analytics (statistics) capturing visitors' usage of internet web pages published on the City of Toronto website toronto.ca.
📣 Ad Click Prediction Dataset
kaggle.com
Updated Sep 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ciobanu Marius (2024). 📣 Ad Click Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/marius2303/ad-click-prediction-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 7, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ciobanu Marius
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
About

This dataset provides insights into user behavior and online advertising, specifically focusing on predicting whether a user will click on an online advertisement. It contains user demographic information, browsing habits, and details related to the display of the advertisement. This dataset is ideal for building binary classification models to predict user interactions with online ads.

Features

id: Unique identifier for each user.

full_name: User's name formatted as "UserX" for anonymity.

age: Age of the user (ranging from 18 to 64 years).

gender: The gender of the user (categorized as Male, Female, or Non-Binary).

device_type: The type of device used by the user when viewing the ad (Mobile, Desktop, Tablet).

ad_position: The position of the ad on the webpage (Top, Side, Bottom).

browsing_history: The user's browsing activity prior to seeing the ad (Shopping, News, Entertainment, Education, Social Media).

time_of_day: The time when the user viewed the ad (Morning, Afternoon, Evening, Night).

click: The target label indicating whether the user clicked on the ad (1 for a click, 0 for no click).

Goal

The objective of this dataset is to predict whether a user will click on an online ad based on their demographics, browsing behavior, the context of the ad's display, and the time of day. You will need to clean the data, understand it and then apply machine learning models to predict and evaluate data. It is a really challenging request for this kind of data. This data can be used to improve ad targeting strategies, optimize ad placement, and better understand user interaction with online advertisements.
d
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
Explore at:
.csv, .xlsAvailable download formats
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
Jordan, Latvia, Saint Vincent and the Grenadines, Monaco, Belarus, Jamaica, Liechtenstein, Russian Federation, India, Uzbekistan
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviuor: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.

Facebook

Twitter

Click to copy link

Link copied

Cite

Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample

Google Analytics Sample

Google Analytics Sample (BigQuery)

Explore at:

19 scholarly articles cite this dataset (View in Google Scholar)

zip(0 bytes)Available download formats

Dataset updated

Sep 19, 2019

Dataset provided by

BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/

Authors

Google BigQuery

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?

Clear search

Close search

Google apps

Main menu

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Website Analytics

Website Statistics

Data from: Google Analytics & Twitter dataset from a movies, TV series and...

Google Analytics Sample

Website Traffic Dataset

Google Analytics 4 sample data

‘Popular Website Traffic Over Time ’ analyzed by Analyst-2

About this dataset

Background

Methodology

Source

How to use this dataset

Acknowledgements

Start A New Notebook!

GA data with json columns

Context

Content

Acknowledgements

Inspiration

Website Statistics (Archived) | gimi9.com

March Madness Historical DataSet (2002 to 2025)

Custom dataset from any website on the Internet

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

Cincinnati Open Data Website Analytics

insight_bench

Datasets for Sentiment Analysis

Dataset of book subjects that contain Data strategy : how to profit from a...

Web Analytics - Dataset - CKAN

📣 Ad Click Prediction Dataset

Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant

Google Analytics Sample

Google Analytics Sample (BigQuery)

Context

Content

Acknowledgements

Inspiration