30 datasets found

Website Traffic
kaggle.com
zip
Updated Aug 5, 2024
Cite
AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic/discussion
Explore at:
zip (65228 bytes)
Dataset updated
Aug 5, 2024
Authors
AnthonyTherrien
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview

This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

Dataset Description

Page Views: The number of pages viewed during a session.

Session Duration: The total duration of the session in minutes.

Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.

Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).

Time on Page: The amount of time spent on the specific page.

Previous Visits: The number of previous visits by the same visitor.

Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).

Data Summary

Total Records: 2000

Total Features: 7

Key Features

Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.

Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.

Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.

Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.

Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.

Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.

Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

Usage

This dataset can be used for various analyses such as:

Identifying key drivers of engagement and conversion.

Analyzing the effectiveness of different traffic sources.

Understanding user behavior patterns and optimizing the website accordingly.

Improving marketing strategies based on traffic source performance.

Enhancing user experience by analyzing time spent on different pages.
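
As a minimal sketch of the traffic-source analysis suggested above, the snippet below averages the conversion rate within each traffic source. The rows are made up for illustration, and the field names `traffic_source` and `conversion_rate` are assumptions based on the feature list, not the actual CSV header.

```python
from collections import defaultdict

# Made-up rows for illustration only; field names are assumptions
# based on the feature list above, not the real column names.
rows = [
    {"traffic_source": "Organic", "conversion_rate": 1.0},
    {"traffic_source": "Organic", "conversion_rate": 0.5},
    {"traffic_source": "Social", "conversion_rate": 0.25},
    {"traffic_source": "Paid", "conversion_rate": 0.75},
    {"traffic_source": "Paid", "conversion_rate": 0.25},
]


def mean_by_source(records, value_key):
    """Average `value_key` within each traffic source."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["traffic_source"]] += rec[value_key]
        counts[rec["traffic_source"]] += 1
    return {src: totals[src] / counts[src] for src in totals}


averages = mean_by_source(rows, "conversion_rate")
for source, avg in sorted(averages.items()):
    print(f"{source}: {avg:.2f}")
```

The same helper can be pointed at any other numeric field (for example a bounce-rate column) by changing `value_key`.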

Acknowledgments

This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
The citation is currently not available for this dataset.
Explore at:
zip (0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery: https://cloud.google.com/bigquery
Google: http://google.com/
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
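
The "real bounce rate" question above can be sketched in plain Python. This is only an illustration of the stated definition (percentage of visits with a single pageview, grouped by traffic source). The session records are made up, and `source` and `pageviews` are hypothetical field names rather than the BigQuery schema.

```python
from collections import defaultdict

# Hypothetical session records; not taken from the actual dataset.
sessions = [
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 4},
    {"source": "google", "pageviews": 3},
    {"source": "google", "pageviews": 2},
    {"source": "(direct)", "pageviews": 1},
    {"source": "(direct)", "pageviews": 3},
]


def bounce_rate_by_source(records):
    """Percentage of visits with exactly one pageview, per traffic source."""
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for rec in records:
        visits[rec["source"]] += 1
        if rec["pageviews"] == 1:
            bounces[rec["source"]] += 1
    return {src: 100.0 * bounces[src] / visits[src] for src in visits}


rates = bounce_rate_by_source(sessions)
for src, rate in sorted(rates.items()):
    print(f"{src}: {rate:.1f}%")
```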
Daily website visitors (time series regression)
kaggle.com
zip
Updated Aug 20, 2020
Cite
Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors
Explore at:
zip (35736 bytes)
Dataset updated
Aug 20, 2020
Authors
Bob Nau
Description
Context

This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
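
A simple baseline for the forecasting exercises above is a seasonal-naive forecast, which repeats the most recent week. The sketch below is one possible starting point, not the author's method. The visitor counts are made up, and the weekly season length of 7 is an assumption drawn from the day-of-week seasonality described above.

```python
def seasonal_naive(history, horizon, season=7):
    """Forecast `horizon` steps ahead by repeating the last full season."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


# Two made-up weeks of daily unique-visitor counts.
history = [10, 20, 30, 40, 50, 60, 70, 11, 21, 31, 41, 51, 61, 71]

one_day_ahead = seasonal_naive(history, 1)
next_week = seasonal_naive(history, 7)
print(one_day_ahead, next_week)
```

Any model built for the exercises can be compared against this baseline; a model that cannot beat it is not capturing more than the weekly pattern.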

Content

The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
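
The description states that unique visitors equal the sum of returning and first-time visitors by definition. A quick sanity check along those lines might look like the sketch below. The rows and the field names `unique_visits`, `first_time_visits` and `returning_visits` are hypothetical, and the second row is deliberately inconsistent to show what the check reports.

```python
def inconsistent_rows(rows):
    """Return indices of rows where unique != first-time + returning."""
    return [
        i
        for i, r in enumerate(rows)
        if r["unique_visits"] != r["first_time_visits"] + r["returning_visits"]
    ]


# Made-up rows; the second one violates the identity on purpose.
rows = [
    {"unique_visits": 100, "first_time_visits": 70, "returning_visits": 30},
    {"unique_visits": 90, "first_time_visits": 70, "returning_visits": 30},
    {"unique_visits": 50, "first_time_visits": 20, "returning_visits": 30},
]

bad = inconsistent_rows(rows)
print(bad)
```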

Inspiration

This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

Recipe Site Traffic: Analysis & Prediction

kaggle.com

Updated Sep 21, 2025

Cite

Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Dataset updated

Sep 21, 2025

Dataset provided by

Kaggle

Authors

Michael Matta

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
PPTX Presentation

Recipe Site Traffic

From: Head of Data Science
Received: Today
Subject: New project from the product team

Hey!

I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

You can find more details about what I expect you to do here. And information on the data here.

I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

Good Luck!

From: Product Manager - Recipe Discovery
To: Head of Data Science
Received: Yesterday
Subject: Can you help us predict popular recipes?

Hi,

We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

Can your team:
- Predict which recipes will lead to high traffic?
- Correctly predict high traffic recipes 80% of the time?

We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

Look forward to seeing your presentation.
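
One possible reading of "correctly predict high traffic recipes 80% of the time" is precision on the high-traffic class: of the recipes a model flags as high traffic, the share that really were. This is an interpretation, not something the brief states. The labels below are made up, and the label value "High" is a hypothetical encoding.

```python
def precision(y_true, y_pred, positive="High"):
    """Share of predicted-positive items that are truly positive."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0
    return sum(t == positive for t in flagged) / len(flagged)


# Made-up labels for five recipes.
y_true = ["High", "High", "Low", "High", "Low"]
y_pred = ["High", "High", "High", "High", "Low"]

score = precision(y_true, y_pred)
print(f"precision = {score:.2f}")  # compare against the 0.80 target
```

Favouring precision over recall matches the request to minimize the chance of showing unpopular recipes, at the cost of missing some popular ones.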

About Tasty Bytes

Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

Example Recipe

This is an example of how a recipe may appear on the website. We haven't included all of the steps, but you should get an idea of what visitors to the site see.

Tomato Soup

Servings: 4
Time to make: 2 hours
Category: Lunch/Snack
Cost per serving: $

Nutritional Information (per serving)
- Calories 123
- Carbohydrate 13g
- Sugar 1g
- Protein 4g

Ingredients:
- Tomatoes
- Onion
- Carrot
- Vegetable Stock

Method:
1. Cut the tomatoes into quarters….

Data Information

The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

As you will see, they haven't given us all of the information they have about each recipe.

You can find the data here.

I will let you decide how to process it, just make sure you include all your decisions in your report.

Don't forget to double check the data really does match what they say - it might not.

Column Name	Details
recipe	Numeric, unique identifier of recipe
calories	Numeric, number of calories
carbohydrate	Numeric, amount of carbohydrates in grams
sugar	Numeric, amount of sugar in grams
protein	Numeric, amount of prote...

Walmart.com Daily Traffic Statistics 2025
redstagfulfillment.com
html
Updated May 19, 2025
Cite
Red Stag Fulfillment (2025). Walmart.com Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-walmart-receive/
Explore at:
html
Dataset updated
May 19, 2025
Dataset authored and provided by
Red Stag Fulfillment
Time period covered
2020 - 2025
Area covered
United States
Variables measured
Daily website visits, Session duration metrics, Traffic source breakdown, Geographic traffic patterns, Seasonal traffic variations, Mobile vs desktop traffic distribution
Description
Comprehensive dataset analyzing Walmart.com's daily website traffic, including 16.7 million daily visits, device distribution, geographic patterns, and competitive benchmarking data.
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=en_GB (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=en_GB
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Google: http://google.com/
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

The data is typical of what an ecommerce website would see and includes the following information:

Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.

Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions on the Google Merchandise Store website.

Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
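
Because the Limitations note says that empty STRING fields come back as “Not available in demo dataset”, it can help to map that placeholder to a proper missing value after querying. The sketch below works on plain dictionaries. The example row and its field names are made up, and the case-insensitive match is a defensive assumption.

```python
PLACEHOLDER = "not available in demo dataset"


def clean_value(value):
    """Map the demo dataset's placeholder string to None; keep other values."""
    if isinstance(value, str) and value.strip().lower() == PLACEHOLDER:
        return None
    return value


def clean_row(row):
    """Apply clean_value to every field of a result row."""
    return {key: clean_value(val) for key, val in row.items()}


# Made-up result row; field names are illustrative only.
raw = {"city": "not available in demo dataset", "browser": "Chrome", "visits": 1}
cleaned = clean_row(raw)
print(cleaned)
```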
Amazon Daily Traffic Statistics 2025
redstagfulfillment.com
html
Updated May 19, 2025
Cite
Red Stag Fulfillment (2025). Amazon Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-amazon-receive/
Explore at:
html
Dataset updated
May 19, 2025
Dataset authored and provided by
Red Stag Fulfillment
Time period covered
2019 - 2025
Area covered
Global
Variables measured
Daily website visits, Monthly traffic volume, Geographic distribution, Seasonal traffic patterns, Traffic sources breakdown, Mobile vs desktop traffic split
Description
Comprehensive dataset analyzing Amazon's daily website visits, traffic patterns, seasonal trends, and comparative analysis with other ecommerce platforms based on May 2025 data.
Traffic Exchange Analysis Dataset 2024
sparktraffic.com
Updated Jun 10, 2024
Cite
SparkTraffic (2024). Traffic Exchange Analysis Dataset 2024 [Dataset]. https://www.sparktraffic.com/blog/reason-not-to-use-traffic-exchanges
Explore at:
Dataset updated
Jun 10, 2024
Dataset authored and provided by
SparkTraffic
Description
Research data on traffic exchange limitations including low-quality traffic characteristics, search engine penalty risks, and comparison with effective alternatives like SEO and content marketing strategies.
Ardgillan Demesne Traffic Data 2018-2023 FCC - Dataset - data.smartdublin.ie...
data.smartdublin.ie
Updated Nov 9, 2021
Cite
(2021). Ardgillan Demesne Traffic Data 2018-2023 FCC - Dataset - data.smartdublin.ie [Dataset]. https://data.smartdublin.ie/dataset/ardgillan-demesne-traffic-data-2018-2023-fcc2
Explore at:
Dataset updated
Nov 9, 2021
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ardgillan Demesne
Description
Data on traffic volume entering Ardgillan Demesne, 2018 to 2023. See the new data set for 2024 onward.

Ardgillan park is unique among Dublin’s regional parks for the magnificent views it enjoys of the coastline. A panorama, taking in Rockabill Lighthouse, Colt Church, Shenick and Lambay Islands may be seen, including Sliabh Foy, the highest of the Cooley Mountains, and of course the Mourne Mountains can be seen sweeping down to the sea.

The park area is the property of Fingal County Council and was opened to the public as a regional park in June 1985. Preliminary works were carried out prior to the opening in order to transform what had been an arable farm into a public park. Five miles of footpaths were provided throughout the demesne, some by opening old avenues, while others were newly constructed. They now provide a system of varied and interesting woodland, walks and vantage points from which to enjoy breath-taking views of the sea, the coastline and surrounding countryside. A signposted cycle route through the park since June 2009 means that cyclists can share the miles of walking paths with pedestrians.

Attractions within the Demesne

Play Ground
Rose Gardens
Fair Trail
Pollinator Areas (approx. 40 acres on whole Demesne)
Cafe
Cycle Track
Walking Routes

See further details on the web site www.ardgillancastle.ie/
Personal Ecommerce Website Ad cost & viewer count
kaggle.com
zip
Updated Apr 18, 2025
Cite
Micheal_Knight (2025). Personal Ecommerce Website Ad cost & viewer count [Dataset]. https://www.kaggle.com/datasets/michealknight/personal-ecommerce-website-ad-cost-and-viewer-count
Explore at:
zip (29323 bytes)
Dataset updated
Apr 18, 2025
Authors
Micheal_Knight
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
📊 Dataset Description: Daily Website Traffic and Engagement Metrics

This dataset contains daily web traffic and user engagement information for a live website, recorded over an extended period. It provides a comprehensive view of how user activity on the platform varies in response to marketing initiatives and temporal factors such as weekends and holidays.

The dataset is particularly suited for time series forecasting, seasonality analysis, and marketing effectiveness studies. It is valuable for both academic and practical applications in fields such as digital analytics, marketing strategy, and predictive modeling.

🧾 Use Case Scenarios:

Forecasting future page views using past behavior and external influencing factors

Evaluating the impact of advertising spend on web traffic and ROI

Detecting seasonality and weekly/cyclical patterns in user engagement

Developing time-aware models for resource planning (e.g., server load, content drops)

Training and benchmarking time series models such as ARIMA, SARIMA, RNN, LSTM, and GRU
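
Before fitting any of the models listed above, a first look at weekly seasonality can be as simple as averaging page views by day of the week. The sketch below uses only the standard library. The dates and page-view counts are made up, and the (date, page views) layout is an assumption about how the data might be arranged.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Made-up (ISO date, page views) pairs for illustration.
daily = [
    ("2025-01-06", 100),
    ("2025-01-13", 120),
    ("2025-01-11", 60),
    ("2025-01-18", 80),
]


def mean_by_weekday(records):
    """Average page views for each day of the week present in the data."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day_str, views in records:
        name = DAY_NAMES[date.fromisoformat(day_str).weekday()]
        totals[name] += views
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


weekday_means = mean_by_weekday(daily)
print(weekday_means)
```

A large gap between weekday and weekend averages would suggest including day-of-week features or a seasonal period of 7 in the models.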
RÉ Logs Dataset
zenodo.org
Updated Oct 2, 2025
Cite
Mac Aodhgáin Pádraig; Mac Aodhgáin Pádraig (2025). RÉ Logs Dataset [Dataset]. http://doi.org/10.5281/zenodo.17249231
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.17249231
Dataset updated
Oct 2, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Mac Aodhgáin Pádraig; Mac Aodhgáin Pádraig
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Collation of data from Radio Éireann log books, at RTÉ, Donnybrook, Dublin 4.

Dataset originally created 2016
UPDATE: Packaged on 02/10/2025

I. About this Data Set

This data set is the result of a close reading, conducted by Patrick Egan (Pádraig Mac Aodhgáin), of Radio Teilifís Éireann log books relating to Seán Ó Riada.

Research was conducted between 2014 and 2018. It contains a combination of metadata from searches of the Boole Library catalogue and Seán Ó Riada Collection finding aid (or "descriptive list"), relating to music-related projects that involved Seán Ó Riada. The PhD project was published in 2020, entitled “Exploring ethnography and digital visualisation: a study of musical practice through the contextualisation of music related projects from the Seán Ó Riada Collection”, and a full listing of radio broadcasts is added to the dataset named "The Ó Riada Projects" at https://doi.org/10.5281/zenodo.15348617

You are invited to use and re-use this data with appropriate attribution.

The "RÉ Logs Dataset" dataset consists of 90 rows.

II. What’s included?

This data set includes:

A search of log books of radio broadcasts to find all instances of shows that involved Seán Ó Riada.

III. How Was It Created?

These data were created by daily visits to Radio Teilifís Éireann in Dublin, Ireland.

IV. Data Set Field Descriptions

Column headings have not been added to the dataset.

Column A - blank
Column B - type of broadcast
Column C - blank
Column D - date of broadcast
Column E - blank
Column F - blank
Column G - blank
Column H - blank
Column I - description of broadcast
Column J - blank
Column K - blank
Column L - length of broadcast
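
Since the file has no column headings, one way to work with it is to pick out the four non-blank columns by position, following the order of the twelve columns listed above. The sketch below assumes the data is comma-separated, which the description does not confirm. The field names are labels invented here, and the sample row is made up.

```python
import csv
import io

# 0-based positions of the non-blank columns, in the order listed above.
# These names are invented labels; the file itself has no header row.
FIELDS = {1: "broadcast_type", 3: "broadcast_date", 8: "description", 11: "length"}


def read_log_rows(handle):
    """Read headerless rows and keep only the populated columns."""
    records = []
    for row in csv.reader(handle):
        if not any(cell.strip() for cell in row):
            continue  # skip fully empty lines
        records.append(
            {
                name: row[idx].strip() if idx < len(row) else ""
                for idx, name in FIELDS.items()
            }
        )
    return records


# One made-up row with twelve comma-separated cells.
sample = io.StringIO(",Talk,,1960-01-01,,,,,Made-up description,,,30 min\n")
records = read_log_rows(sample)
print(records)
```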

V. Rights statement

The text in this data set was created by the researcher and can be used in many different ways under creative commons with attribution. All contributions to this PhD project are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creator of this dataset.

VI. Creator and Contributor Information

Creator: Patrick Egan (Pádraig Mac Aodhgáin)

VII. Contact Information

Please direct all questions and comments to Patrick Egan via his website at www.patrickegan.org. You can also get in touch with the Library via UCC website.
Michigan Public Policy Survey Restricted Use Datasets
datasearch.gesis.org
Updated Aug 27, 2016
Cite
Center for Local, State, and Urban Policy (2016). Michigan Public Policy Survey Restricted Use Datasets [Dataset]. http://doi.org/10.3886/E55175V2
Explore at:
Unique identifier
https://doi.org/10.3886/E55175V2
Dataset updated
Aug 27, 2016
Dataset provided by
da|ra (Registration agency for social science and economic data)
Authors
Center for Local, State, and Urban Policy
Area covered
Michigan
Description
The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. 
The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. The MPPS datasets are being released in two forms: public-use datasets and restricted-use datasets. Unlike the public-use datasets, the restricted-use datasets represent full MPPS survey waves, and include all of the survey questions from a wave. Restricted-use datasets also allow for multiple waves to be linked together for longitudinal analysis. The MPPS staff do still modify these restricted-use datasets to remove jurisdiction and respondent identifiers and to recode other variables in order to protect confidentiality. However, it is theoretically possible that a researcher might be able, in some rare cases, to use enough variables from a full dataset to identify a unique jurisdiction, so access to these datasets is restricted and approved on a case-by-case basis. CLOSUP encourages researchers interested in the MPPS to review the codebooks included in this data collection to see the full list of variables including those not found in the public-use datasets, and to explore the MPPS data using the public-use datasets. On 2016-08-20, the openICPSR web site was moved to new software. In the migration process, some projects were not published in the new system because the decisions made in the old site did not map easily to the new setup. This project is temporarily available as restricted data while ICPSR verifies that all files were migrated correctly.
Riga Data Science Club
kaggle.com
zip
Updated Mar 29, 2021
Cite
Dmitry Yemelyanov (2021). Riga Data Science Club [Dataset]. https://www.kaggle.com/datasets/dmitryyemelyanov/rigadsclub
Explore at:
zip (494849 bytes)
Dataset updated
Mar 29, 2021
Authors
Dmitry Yemelyanov
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Riga
Description
Context

Riga Data Science Club is a non-profit organisation to share ideas, experience and build machine learning projects together. A data science community should know its own data, so this is a dataset about ourselves: our website analytics, social media activity, Slack statistics and even meetup transcriptions!

Content

The dataset is split into several folders by context:
* linkedin - company page visitor, follower and post stats
* slack - messaging and member activity
* typeform - new member responses
* website - website visitors by country, language, device, operating system, screen resolution
* youtube - meetup transcriptions

Inspiration

Let's make Riga Data Science Club better! We expect this data to bring lots of insights on how to improve.

"Know your c̶u̶s̶t̶o̶m̶e̶r̶ member"
- Explore member interests by analysing sign-up survey (typeform) responses
- Explore messaging patterns in Slack to understand how members are retained and when they are lost

Social media intelligence
* Define LinkedIn posting strategy based on historical engagement data
* Define target user profile based on LinkedIn page attendance data

Website
* Define website localisation strategy based on data about visitor countries and languages
* Define website responsive design strategy based on data about visitor devices, operating systems and screen resolutions

Have some fun
* NLP analysis of meetup transcriptions: word frequencies, question answering, something else?
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Mar 17, 2025
Cite
Long, Phillip; Novack, Zachary; McAuley, Julian; Berg-Kirkpatrick, Taylor (2025). PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_13763755
Explore at:
Dataset updated
Mar 17, 2025
Dataset provided by
University of California, San Diego
UCSD
Authors
Long, Phillip; Novack, Zachary; McAuley, Julian; Berg-Kirkpatrick, Taylor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing. Refer to our paper for more information, and our GitHub repository for any code-related details. Please cite both our paper and our collaborators' paper if you use this dataset (see our GitHub for more information).

Upon further use of the PDMX dataset, we discovered a discrepancy between the public-facing copyright metadata on the MuseScore website and the internal copyright data of the MuseScore files themselves, which affected 31,221 (12.29% of) songs. We have decided to proceed with the former given its public visibility on Musescore (i.e. this is what the MuseScore website presents its users with). We have noted files with conflicting internal licenses in the license_conflict column of PDMX. We recommend using the no_license_conflict subset of PDMX (which still includes 222,856 songs) moving forward.

Additionally, for each song in PDMX, we not only provide the MusicRender and metadata JSON files, but we also try to include the associated compressed MusicXML (MXL), sheet music (PDF), and MIDI (MID) files when available. Due to the corruption of 42 of the original MuseScore files, these songs lack those associated files (since they could not be converted to those formats) and only include the MusicRender and metadata JSON files. The all_valid subset of PDMX describes the songs where all associated files are valid.
website_visit_webalizer
kaggle.com
zip
Updated Mar 24, 2024
Cite
Erin ÇOBAN (2024). website_visit_webalizer [Dataset]. https://www.kaggle.com/datasets/erinoban/website-visit-webalizer
Explore at:
Available download formats: zip (1082 bytes)
Dataset updated
Mar 24, 2024
Authors
Erin ÇOBAN
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains real website visit data: monthly visit statistics for the tr-metaverse.com website, which is hosted on Linux. It covers 30 days of visits, from the beginning of the month to the end. The fields are Day, Hit, Hit%, Files, Files%, Pages, Pages%, Visit, Visit%, Sites, Sites%, Kbytes, and Kbytes%. Fields with a % sign are percentages.

Day: Day of the month (index number).
Hit: Overall amount of access.
Hit%: Overall access as a percentage.
Files: Number of accesses made as files.
Files%: Files as a percentage.
Visit: Number of unique visitors.
Visit%: Unique visitor rate.
Kbytes: Amount of data downloaded.
Kbytes%: Downloaded data as a percentage.
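The description does not say how the fields with a % sign are derived. One plausible reading, assumed here and not confirmed by the source, is that each is the day's share of the monthly total. A minimal sketch with invented numbers:

```python
# Toy daily hit counts for a 3-day "month"; invented for illustration.
daily_hits = {1: 120, 2: 80, 3: 200}


def percent_shares(values):
    """Return each day's value as a percentage of the period total."""
    total = sum(values.values())
    if total == 0:
        # Avoid dividing by zero when the period has no traffic at all.
        return {day: 0.0 for day in values}
    return {day: round(100 * v / total, 2) for day, v in values.items()}


hit_pct = percent_shares(daily_hits)
print(hit_pct)
```

The same helper would apply to the Files, Pages, Visit, Sites, and Kbytes columns if they follow the same convention.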
Top-1000 HHS Open Data Resources
catalog.data.gov
data.virginia.gov
+1 more
Updated Jul 30, 2025
Cite
Office of Chief Data Officer (2025). Top-1000 HHS Open Data Resources [Dataset]. https://catalog.data.gov/dataset/top-1000-hhs-open-data-resources
Dataset updated
Jul 30, 2025
Dataset provided by
Office of Chief Data Officer
Description
HHS responsibly shares “open by default” data with the public to democratize access to information, demystify the Department, and increase transparency through data sharing. HHS Open Data is non-sensitive data, meaning thousands of health and human services datasets are publicly available to fuel new business models, enable emerging technologies like AI, accelerate scientific discoveries, and inspire American innovation. This top-1000 HHS Open Data websites and resources page, dynamically generated from the Digital Analytics Program (DAP) provided by the U.S. General Services Administration (GSA), is driven by near-real-time user demand. GSA’s DAP helps federal agencies and the public see how visitors find, access, and use government websites, data, and services online. The below list filters DAP for only resources from HHS and includes all HHS Divisions. You may filter by individual HHS Divisions and columns.
Customer propensity to purchase dataset
kaggle.com
zip
Updated Jun 1, 2018
Cite
Ben P (2018). Customer propensity to purchase dataset [Dataset]. https://www.kaggle.com/datasets/benpowis/customer-propensity-to-purchase-data
Explore at:
Available download formats: zip (13598472 bytes)
Dataset updated
Jun 1, 2018
Authors
Ben P
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

You get many visitors to your website every day, but you know only a small percentage of them are likely to buy from you, while most will perhaps not even return. Right now you may be spending money to re-market to everyone, but perhaps we could use machine learning to identify the most valuable prospects?

Content

This data set represents a day's worth of visits to a fictional website. Each row represents a unique customer, identified by their unique UserID. The columns represent features of the user's visit (such as the device they were using) and things the user did on the website that day. These features will be different for every website, but in this data a few of the features we consider are:

- basket_add_detail: Did the customer add a product to their shopping basket from the product detail page?
- sign_in: Did the customer sign in to the website?
- saw_homepage: Did the customer visit the website's homepage?
- returning_user: Is this visitor new, or returning?

In this data set we also have a feature showing whether the customer placed an order (ordered), which is what we predict on.
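As a first look before any modelling, one could compare how often customers ordered depending on a single feature. The sketch below uses invented rows and assumes the features and the `ordered` target are encoded as 0/1, which the description does not state. The column names are taken from the description; `order_rate` is a hypothetical helper.

```python
# Invented rows; column names follow the description, 0/1 encoding is assumed.
visits = [
    {"UserID": "u1", "basket_add_detail": 1, "sign_in": 1, "ordered": 1},
    {"UserID": "u2", "basket_add_detail": 1, "sign_in": 0, "ordered": 0},
    {"UserID": "u3", "basket_add_detail": 0, "sign_in": 0, "ordered": 0},
    {"UserID": "u4", "basket_add_detail": 0, "sign_in": 1, "ordered": 0},
]


def order_rate(rows, feature, value):
    """Share of rows with rows[feature] == value that placed an order."""
    subset = [r for r in rows if r[feature] == value]
    if not subset:
        return None  # no customers with this feature value
    return sum(r["ordered"] for r in subset) / len(subset)


rate_added = order_rate(visits, "basket_add_detail", 1)
rate_not_added = order_rate(visits, "basket_add_detail", 0)
print(rate_added, rate_not_added)
```

A large gap between the two rates would suggest the feature is worth keeping as an input to a propensity model.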
TED Talks Dataset with Transcripts, LIWC, MFT
kaggle.com
zip
Updated Dec 4, 2023
Cite
The Devastator (2023). TED Talks Dataset with Transcripts, LIWC, MFT [Dataset]. https://www.kaggle.com/datasets/thedevastator/ted-talks-dataset-with-transcripts-liwc-mft
Explore at:
Available download formats: zip (10337474 bytes)
Dataset updated
Dec 4, 2023
Authors
The Devastator
Description
TED Talks Dataset with Transcripts, LIWC, MFT and Views

By Owen Temple

About this dataset

The TED Talks Dataset with Transcripts, LIWC, and MFT is a comprehensive collection of TED Talks from official events available on the TED.com website. The dataset includes various information about each talk, such as unique IDs, speaker names, headlines, URLs to view the videos, descriptions of the talks, and details about when and where they were filmed. It also includes the duration of each talk, the date it was published on TED.com, and topic tags that provide insight into the themes or subjects covered in each talk.

An expanded version of this dataset offers additional columns that provide even more valuable insights. For example, it includes full English transcripts of the talks for further analysis. The dataset also provides information on how many times each video has been viewed as of June 16th, 2017. Furthermore, Linguistic Inquiry and Word Count (LIWC) software was used to analyze these transcripts and generate variables that indicate word usage in different categories relative to the total number of words in each talk.

This expanded version of the dataset contains an extensive data dictionary that explains each variable created by the LIWC software. The LIWC analysis offers insights into language patterns found within these TED Talks. Additionally, the transcripts were analyzed using a dictionary developed specifically for studying Moral Foundations Theory (MFT). This analysis provides proportions indicating the use of virtue and vice words for different moral foundations within any given corpus.

This dataset covers all talks from official TED events made available on the TED website, from its launch through June 13th, 2017. The provided visualization created by Sean Miller showcases this data effectively.

In addition to using this dataset for analyses or for tracking which TED Talks you have seen, it can be used to build personal learning programs centered around specific topics covered in these talks.

Overall, this Kaggle dataset is a resource for researchers, discussion groups, and individuals interested in exploring ideas shared through TED Talks, using detailed information including transcripts, Linguistic Inquiry and Word Count software analysis, and Moral Foundations Theory analysis.

How to use the dataset

Overview: This dataset contains comprehensive information about TED Talks from official events, including unique IDs, speaker names, headlines, URLs, descriptions, transcripts, month and year filmed, event details, duration of the talk in minutes and seconds (MM:SS format), date published, and topic tags. It also includes additional columns with full English transcripts analyzed using Linguistic Inquiry and Word Count (LIWC) software for word analysis based on categories. These categories are expressed as a ratio of certain types of words divided by the total number of words in the talk. The expanded version also includes information on the number of views as of June 13th, 2017.

Accessing the Data: You can access the data from this dataset either by downloading it or accessing it through an appropriate platform like Kaggle.

Database Structure: The dataset is presented in CSV format with multiple columns providing different pieces of information about each TED Talk entry.

a) Unique ID: Each talk is assigned a unique ID.

b) URL: This column provides URLs to access the video presentations online.

c) Transcript URL: You can access full English transcripts by following links provided in this column.

d) Speaker Name: This column specifies the name(s) of the speaker(s).

e) Headline: The headline gives you a brief idea about what each TED Talk is about.

f) Description: A more detailed description of each talk can be found here.

g) Date Filmed (Month-Year): Specifies when the talk was filmed.

h) Event Details: Information about the event at which the talk originated.

i) Duration (MM:SS): The length of each talk.

j) Date Published: The original publication date.

k) Topic Tags: Provides keywords or tags corresponding to the main themes covered in each talk.

Additionally, there are 111 more columns with full English transcripts, number of views as of June 13th, 2017 and variables generated by LIWC software. These variables express word usage as ratios for different categories.
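The ratio form of these variables (words in a category divided by total words in the talk) can be sketched as follows. This is only an illustration of the arithmetic: the word list is invented and is not the LIWC dictionary, the tokenization is a simple assumption, and `category_ratio` is a hypothetical helper rather than anything the LIWC software exposes.

```python
import re

# Hypothetical category word list; NOT the actual LIWC dictionary.
CATEGORY_WORDS = {"we", "us", "our"}


def category_ratio(transcript, category_words):
    """Fraction of tokens in `transcript` that belong to the category."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category_words)
    return hits / len(tokens)


ratio = category_ratio(
    "We share our ideas so ideas can reach us all.", CATEGORY_WORDS
)
print(ratio)
```

Here three of the ten tokens ("we", "our", "us") fall in the category, so the ratio is 0.3.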

Interpreting LIWC Variables...
E-commerce - Users of a French C2C fashion store
kaggle.com
zip
Updated Feb 24, 2024
Cite
Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store
Explore at:
Available download formats: zip (3283629 bytes)
Dataset updated
Feb 24, 2024
Authors
Jeffrey Mvutu Mabilama
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
Foreword

This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).

My Telegram bot will answer your queries and allow you to contact me.

Context

There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and estimate in advance what your growth may be.

For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

This dataset is part of a preview of a much larger dataset. Please contact me for more.

Content

The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Questions you might want to answer using this dataset:

Are e-commerce users interested in social network features?

Are my users active enough (compared to those of this dataset)?

How likely are people from other countries to sign up on a C2C website?

How many users are likely to drop off after years of using my service?

Example works:

Report(s) made using SQL queries can be found on the data.world page of the dataset.

Notebooks may be found on the Kaggle page of the dataset.

License

CC-BY-NC-SA 4.0

For other licensing options, contact me.
Data & Analytics Stats LinkedIn Company Page
kaggle.com
zip
Updated Aug 13, 2024
Cite
Mirko Peters (2024). Data & Analytics Stats LinkedIn Company Page [Dataset]. https://www.kaggle.com/mirkopeters/data-and-analytics-stats-linkedin-company-page
Explore at:
Available download formats: zip (689754 bytes)
Dataset updated
Aug 13, 2024
Authors
Mirko Peters
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
LinkedIn Company Page Data - The Data Analytics Academy

Dataset Overview

This dataset contains detailed insights from The Data Analytics Academy's LinkedIn Company Page, including information on content performance, followers, and visitors. The data is sourced directly from our LinkedIn analytics and has been organized into CSV files for ease of use.

Files Included:

Content Data: Performance metrics for posts and updates shared on our LinkedIn page.
Followers Data: Demographics and growth metrics of our LinkedIn page followers.
Visitors Data: Insights on page visitors, including demographics and engagement levels.

Use Cases:

Social Media Analytics: Analyze the performance of content and its reach among different demographics.
Market Research: Understand audience demographics and how they engage with our page.
Data Science Projects: Apply machine learning algorithms to predict content performance or audience growth.

Acknowledgments

This data is free to use for any purpose, including commercial use. However, if you use this dataset, please give credit to The Data Analytics Academy by mentioning us or linking to our LinkedIn page: The Data Analytics Academy.

Inspiration

This dataset can be used to explore various aspects of LinkedIn analytics, such as identifying trends in audience engagement, understanding content performance, and predicting follower growth.


