30 datasets found
  1. Website Traffic

    • kaggle.com
    zip
    Updated Aug 5, 2024
    Cite
    AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic/discussion
    Available download formats: zip (65228 bytes)
    Dataset updated
    Aug 5, 2024
    Authors
    AnthonyTherrien
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Overview

    This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

    Dataset Description

    • Page Views: The number of pages viewed during a session.
    • Session Duration: The total duration of the session in minutes.
    • Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.
    • Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).
    • Time on Page: The amount of time spent on the specific page.
    • Previous Visits: The number of previous visits by the same visitor.
    • Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).

    Data Summary

    • Total Records: 2000
    • Total Features: 7

    Key Features

    1. Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.
    2. Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.
    3. Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.
    4. Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.
    5. Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.
    6. Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.
    7. Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

    Usage

    This dataset can be used for various analyses such as:

    • Identifying key drivers of engagement and conversion.
    • Analyzing the effectiveness of different traffic sources.
    • Understanding user behavior patterns and optimizing the website accordingly.
    • Improving marketing strategies based on traffic source performance.
    • Enhancing user experience by analyzing time spent on different pages.
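
    As a minimal sketch of the traffic-source comparison listed above, the Python below averages the per-session Conversion Rate within each Traffic Source and ranks the sources. It assumes rows are dicts keyed by the feature names used in the description ("Traffic Source", "Conversion Rate"). Check the real CSV headers, and whether the rates are stored as fractions or as percentages, against the file itself. The sample rows are made up.

```python
from collections import defaultdict


def mean_conversion_by_source(rows):
    """Average the 'Conversion Rate' value of the rows in each 'Traffic Source'."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        source = row["Traffic Source"]
        sums[source] += float(row["Conversion Rate"])
        counts[source] += 1
    return {source: sums[source] / counts[source] for source in sums}


rows = [  # made-up sessions; column names assumed from the description
    {"Traffic Source": "Organic", "Conversion Rate": 0.04},
    {"Traffic Source": "Organic", "Conversion Rate": 0.02},
    {"Traffic Source": "Paid", "Conversion Rate": 0.05},
    {"Traffic Source": "Social", "Conversion Rate": 0.01},
]
ranking = sorted(mean_conversion_by_source(rows).items(),
                 key=lambda item: item[1], reverse=True)
for source, rate in ranking:
    print(f"{source}: {rate:.3f}")
```

    Averaging per-row rates weights every session equally. If the file also exposed raw conversion counts, summing those counts would be the more faithful aggregate.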

    Acknowledgments

    This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.

  2. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    The citation is currently not available for this dataset.
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery https://cloud.google.com/bigquery
    Google http://google.com/
    Authors
    Google BigQuery
    License

    CC0 1.0 https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google-branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
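
    The "real bounce rate" question above can be sketched in Python as follows. The code counts visits per traffic source, then takes the percentage of those visits with exactly one pageview. The flat `source` and `pageviews` keys are hypothetical stand-ins for whatever fields you extract from the BigQuery export. The sample visits are invented, and visits with a missing pageview count get no special handling.

```python
from collections import Counter


def real_bounce_rate_by_source(visits):
    """Percentage of visits with a single pageview, grouped by traffic source."""
    totals = Counter()
    bounces = Counter()
    for visit in visits:
        source = visit["source"]
        totals[source] += 1
        if visit["pageviews"] == 1:
            bounces[source] += 1
    # Counter returns 0 for sources that never bounced.
    return {source: 100.0 * bounces[source] / totals[source] for source in totals}


visits = [  # invented visits; keys are hypothetical
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 4},
    {"source": "direct", "pageviews": 1},
    {"source": "direct", "pageviews": 1},
    {"source": "youtube.com", "pageviews": 2},
]
for source, rate in sorted(real_bounce_rate_by_source(visits).items()):
    print(f"{source}: {rate:.1f}%")
```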

  3. Daily website visitors (time series regression)

    • kaggle.com
    zip
    Updated Aug 20, 2020
    Cite
    Bob Nau (2020). Daily website visitors (time series regression) [Dataset]. https://www.kaggle.com/bobnau/daily-website-visitors
    Available download formats: zip (35736 bytes)
    Dataset updated
    Aug 20, 2020
    Authors
    Bob Nau
    Description

    Context

    This file contains nearly 6 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
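
    Before building the 1-day-ahead, 7-day-ahead, and next-week models suggested above, it helps to have a baseline to beat. The Python below is a minimal sketch of a seasonal naive forecast, a standard baseline and not a method the dataset author prescribes. Each future day is predicted with the value from the same weekday in the last observed week. The sketch assumes `history` is a gap-free list of daily unique-visitor counts in date order, and it ignores the academic-calendar effects mentioned above. The counts are made up.

```python
def seasonal_naive_forecast(history, horizon, season=7):
    """Predict the next `horizon` values by repeating the last full season.

    With season=7, day t+h is forecast by the value observed on the same
    weekday in the most recent week of `history`.
    """
    if len(history) < season:
        raise ValueError("history must contain at least one full season")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


# Two weeks of made-up daily unique-visitor counts (oldest first).
history = [2100, 2400, 2500, 2450, 2200, 1300, 1400,
           2150, 2500, 2600, 2550, 2300, 1350, 1500]

one_day_ahead = seasonal_naive_forecast(history, horizon=1)
next_week = seasonal_naive_forecast(history, horizon=7)
print(one_day_ahead)
print(next_week)
```

    The 7-day-ahead point forecast is the last element of `next_week`, which is the value from the same weekday one week earlier.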

    Content

    The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
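
    The 6-hour rule for "unique" visits described above can be sketched in Python as follows. This is one plausible reading, not StatCounter's actual implementation. Hits are processed in time order, and a hit counts as a new unique visit when the same IP address has produced no hit in the preceding 6 hours. Two points are assumptions here: whether a gap of exactly 6 hours counts as a new visit, and the fact that visits are not split across calendar days. The sample hits are invented.

```python
from datetime import datetime, timedelta

UNIQUE_WINDOW = timedelta(hours=6)


def count_unique_visits(hits):
    """Count hits that start a 'unique' visit.

    `hits` is an iterable of (ip_address, timestamp) pairs. A hit is unique
    when the same IP has not been seen in the previous 6 hours (a gap of
    exactly 6 hours is treated as unique).
    """
    last_seen = {}
    unique = 0
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous >= UNIQUE_WINDOW:
            unique += 1
        last_seen[ip] = ts  # every hit restarts the 6-hour window
    return unique


hits = [  # invented hits
    ("10.0.0.1", datetime(2020, 8, 19, 8, 0)),
    ("10.0.0.1", datetime(2020, 8, 19, 9, 0)),   # within 6 h: same visit
    ("10.0.0.2", datetime(2020, 8, 19, 10, 0)),  # different IP: unique
    ("10.0.0.1", datetime(2020, 8, 19, 16, 0)),  # 7 h after last hit: unique
]
print(count_unique_visits(hits))
```

    Because users are keyed by IP address alone, several people behind one shared address collapse into a single visitor. This matches the undercounting caveat above.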

    Inspiration

    This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.

  4. Recipe Site Traffic: Analysis & Prediction

    • kaggle.com
    Updated Sep 21, 2025
    Cite
    Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle
    Authors
    Michael Matta
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset section so that readers of my notebook, or of any other notebook, can fully understand how the data was intended to be used and how the problem was framed.

    Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
    PPTX Presentation

    Recipe Site Traffic

    From: Head of Data Science
    Received: Today
    Subject: New project from the product team

    Hey!

    I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

    I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

    They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

    You can find more details about what I expect you to do here, and information on the data here.

    I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

    Good Luck!

    From: Product Manager - Recipe Discovery
    To: Head of Data Science
    Received: Yesterday
    Subject: Can you help us predict popular recipes?

    Hi,

    We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

    At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

    Can your team:
    - Predict which recipes will lead to high traffic?
    - Correctly predict high traffic recipes 80% of the time?

    We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

    Look forward to seeing your presentation.
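
    One way to make the target "correctly predict high traffic recipes 80% of the time" measurable is to read it as precision on the high-traffic class: of the recipes the model flags as high traffic, at least 80% should really be high traffic. That reading is an assumption, since recall is the other natural candidate, so the Python sketch below reports both. Labels are 1 for high traffic and 0 otherwise, and the example labels are made up.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


TARGET = 0.80  # the product team's 80% goal, read here as a precision target

y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # made-up actual traffic labels
y_pred = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]  # made-up model predictions
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"meets target={precision >= TARGET}")
```

    Emphasizing precision matches the request to minimize the chance of showing unpopular recipes, because each false positive is an unpopular recipe on the homepage.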

    About Tasty Bytes

    Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

    Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

    Example Recipe

    This is an example of how a recipe may appear on the website; we haven't included all of the steps, but you should get an idea of what visitors to the site see.

    Tomato Soup

    Servings: 4
    Time to make: 2 hours
    Category: Lunch/Snack
    Cost per serving: $

    Nutritional Information (per serving):
    - Calories: 123
    - Carbohydrate: 13g
    - Sugar: 1g
    - Protein: 4g

    Ingredients:
    - Tomatoes
    - Onion
    - Carrot
    - Vegetable Stock

    Method:
    1. Cut the tomatoes into quarters…

    Data Information

    The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

    As you will see, they haven't given us all of the information they have about each recipe.

    You can find the data here.

    I will let you decide how to process it, just make sure you include all your decisions in your report.

    Don't forget to double check the data really does match what they say - it might not.

    Column name     Details
    recipe          Numeric, unique identifier of recipe
    calories        Numeric, number of calories
    carbohydrate    Numeric, amount of carbohydrates in grams
    sugar           Numeric, amount of sugar in grams
    protein         Numeric, amount of prote...
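
    The instruction above to double-check that the data matches its description can start with a few mechanical checks. The Python sketch below uses only the `csv` module. It flags recipe IDs that are non-numeric or repeated, plus values that are missing or non-numeric in the calories, carbohydrate, and sugar columns. The protein row of the table is truncated, so that column is left out. The sketch assumes a CSV with those header names, and the sample rows are invented.

```python
import csv
import io

NUMERIC_COLUMNS = ["calories", "carbohydrate", "sugar"]


def check_recipes(rows):
    """Return human-readable problems found in csv.DictReader rows."""
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        recipe_id = (row.get("recipe") or "").strip()
        if not recipe_id.isdigit():
            problems.append(f"row {line_no}: recipe id {recipe_id!r} is not numeric")
        elif recipe_id in seen_ids:
            problems.append(f"row {line_no}: duplicate recipe id {recipe_id}")
        seen_ids.add(recipe_id)
        for column in NUMERIC_COLUMNS:
            value = (row.get(column) or "").strip()
            if not value:
                problems.append(f"row {line_no}: {column} is missing")
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"row {line_no}: {column}={value!r} is not numeric")
    return problems


sample = io.StringIO(  # invented rows
    "recipe,calories,carbohydrate,sugar\n"
    "1,123,13,1\n"
    "2,,40.5,3\n"
    "2,250,abc,2\n"
)
for problem in check_recipes(csv.DictReader(sample)):
    print(problem)
```
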
  5. Walmart.com Daily Traffic Statistics 2025

    • redstagfulfillment.com
    html
    Updated May 19, 2025
    Cite
    Red Stag Fulfillment (2025). Walmart.com Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-walmart-receive/
    Available download formats: html
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Red Stag Fulfillment
    Time period covered
    2020 - 2025
    Area covered
    United States
    Variables measured
    Daily website visits, Session duration metrics, Traffic source breakdown, Geographic traffic patterns, Seasonal traffic variations, Mobile vs desktop traffic distribution
    Description

    Comprehensive dataset analyzing Walmart.com's daily website traffic, including 16.7 million daily visits, device distribution, geographic patterns, and competitive benchmarking data.

  6. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=en_GB (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=en_GB
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google http://google.com/
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data. The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo, and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
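
    Fields with no data come back as the string “Not available in demo dataset” (STRING) or as null (INTEGER). Normalizing both to one missing-value marker before analysis lets missing values be counted consistently. The Python sketch below assumes rows have already been fetched as dicts, with SQL NULL arriving as None, as Python database clients commonly return it. The field names in the sample row are hypothetical.

```python
PLACEHOLDER = "Not available in demo dataset"


def normalize_row(row):
    """Replace the demo dataset's STRING placeholder with None.

    INTEGER fields without data are assumed to arrive as None already.
    """
    return {key: (None if value == PLACEHOLDER else value)
            for key, value in row.items()}


def missing_fields(row):
    """Names of fields that hold no data after normalization."""
    return sorted(key for key, value in normalize_row(row).items() if value is None)


row = {"city": PLACEHOLDER, "browser": "Chrome", "ad_clicks": None}  # hypothetical fields
print(normalize_row(row))
print(missing_fields(row))
```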

  7. Amazon Daily Traffic Statistics 2025

    • redstagfulfillment.com
    html
    Updated May 19, 2025
    Cite
    Red Stag Fulfillment (2025). Amazon Daily Traffic Statistics 2025 [Dataset]. https://redstagfulfillment.com/how-many-daily-visits-does-amazon-receive/
    Available download formats: html
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Red Stag Fulfillment
    Time period covered
    2019 - 2025
    Area covered
    Global
    Variables measured
    Daily website visits, Monthly traffic volume, Geographic distribution, Seasonal traffic patterns, Traffic sources breakdown, Mobile vs desktop traffic split
    Description

    Comprehensive dataset analyzing Amazon's daily website visits, traffic patterns, seasonal trends, and comparative analysis with other ecommerce platforms based on May 2025 data.

  8. Traffic Exchange Analysis Dataset 2024

    • sparktraffic.com
    Updated Jun 10, 2024
    Cite
    SparkTraffic (2024). Traffic Exchange Analysis Dataset 2024 [Dataset]. https://www.sparktraffic.com/blog/reason-not-to-use-traffic-exchanges
    Dataset updated
    Jun 10, 2024
    Dataset authored and provided by
    SparkTraffic
    Description

    Research data on traffic exchange limitations including low-quality traffic characteristics, search engine penalty risks, and comparison with effective alternatives like SEO and content marketing strategies.

  9. Ardgillan Demesne Traffic Data 2018-2023 FCC - Dataset - data.smartdublin.ie...

    • data.smartdublin.ie
    Updated Nov 9, 2021
    + more versions
    Cite
    (2021). Ardgillan Demesne Traffic Data 2018-2023 FCC - Dataset - data.smartdublin.ie [Dataset]. https://data.smartdublin.ie/dataset/ardgillan-demesne-traffic-data-2018-2023-fcc2
    Dataset updated
    Nov 9, 2021
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ardgillan Demesne
    Description

    Data on traffic volume entering Ardgillan Demesne, 2018 to 2023 (see the new data set for 2024 onward).

    Ardgillan park is unique among Dublin’s regional parks for the magnificent views it enjoys of the coastline. The panorama takes in Rockabill Lighthouse, Colt Church, and Shenick and Lambay Islands, as well as Sliabh Foy, the highest of the Cooley Mountains, and of course the Mourne Mountains can be seen sweeping down to the sea. The park area is the property of Fingal County Council and was opened to the public as a regional park in June 1985. Preliminary works were carried out prior to the opening in order to transform what had been an arable farm into a public park. Five miles of footpaths were provided throughout the demesne, some by opening old avenues, while others were newly constructed. They now provide a system of varied and interesting woodland, walks and vantage points from which to enjoy breathtaking views of the sea, the coastline and the surrounding countryside. Since June 2009, a signposted cycle route through the park has allowed cyclists to share the miles of walking paths with pedestrians.

    Attractions within the Demesne:
    • Playground
    • Rose Gardens
    • Fair Trail
    • Pollinator Areas (approx. 40 acres across the whole Demesne)
    • Cafe
    • Cycle Track
    • Walking Routes

    See further details on the website www.ardgillancastle.ie/

  10. Personal Ecommerce Website Ad cost & viewer count

    • kaggle.com
    zip
    Updated Apr 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Micheal_Knight (2025). Personal Ecommerce Website Ad cost & viewer count [Dataset]. https://www.kaggle.com/datasets/michealknight/personal-ecommerce-website-ad-cost-and-viewer-count
    Available download formats: zip (29323 bytes)
    Dataset updated
    Apr 18, 2025
    Authors
    Micheal_Knight
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📊 Dataset Description: Daily Website Traffic and Engagement Metrics

    This dataset contains daily web traffic and user engagement information for a live website, recorded over an extended period. It provides a comprehensive view of how user activity on the platform varies in response to marketing initiatives and temporal factors such as weekends and holidays.

    The dataset is particularly suited for time series forecasting, seasonality analysis, and marketing effectiveness studies. It is valuable for both academic and practical applications in fields such as digital analytics, marketing strategy, and predictive modeling.

    🧾 Use Case Scenarios:

    • Forecasting future page views using past behavior and external influencing factors
    • Evaluating the impact of advertising spend on web traffic and ROI
    • Detecting seasonality and weekly/cyclical patterns in user engagement
    • Developing time-aware models for resource planning (e.g., server load, content drops)
    • Training and benchmarking time series models such as ARIMA, SARIMA, RNN, LSTM, and GRU
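
    For the advertising-spend use case listed above, a first look is a simple least-squares line of daily page views against daily ad spend. Its slope then reads as extra page views per unit of spend. The Python below is a minimal sketch of that closed-form fit. It is descriptive rather than causal and ignores weekend and holiday effects. The dataset's actual column names are not given here, so the values are invented.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


ad_spend = [0.0, 10.0, 20.0, 30.0]          # made-up daily ad cost
page_views = [100.0, 150.0, 200.0, 250.0]   # made-up daily page views
intercept, slope = fit_simple_ols(ad_spend, page_views)
print(f"baseline views={intercept:.1f}, extra views per unit spend={slope:.2f}")
```

    Adding day-of-week or holiday indicators would require multiple regression, for example via a linear-algebra library, rather than this one-predictor formula.
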
  11. RÉ Logs Dataset

    • zenodo.org
    Updated Oct 2, 2025
    Cite
    Mac Aodhgáin Pádraig (2025). RÉ Logs Dataset [Dataset]. http://doi.org/10.5281/zenodo.17249231
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Mac Aodhgáin Pádraig
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Collation of data from Radio Éireann log books, at RTÉ, Donnybrook, Dublin 4.

    Dataset originally created in 2016. Update: packaged on 02/10/2025.

    I. About this Data Set

    This data set is the result of a close reading, conducted by Patrick Egan (Pádraig Mac Aodhgáin), of Radio Teilifís Éireann log books relating to Seán Ó Riada.

    Research was conducted between 2014 and 2018. The data set combines metadata from searches of the Boole Library catalogue and the Seán Ó Riada Collection finding aid (or "descriptive list") relating to music-related projects that involved Seán Ó Riada. The PhD project, entitled “Exploring ethnography and digital visualisation: a study of musical practice through the contextualisation of music related projects from the Seán Ó Riada Collection”, was published in 2020, and a full listing of radio broadcasts has been added to the dataset named "The Ó Riada Projects" at https://doi.org/10.5281/zenodo.15348617

    You are invited to use and re-use this data with appropriate attribution.

    The "RÉ Logs Dataset" dataset consists of 90 rows.

    II. What’s included?

    This data set includes:

    A search of log books of radio broadcasts to find all instances of shows that involved Seán Ó Riada.

    III. How Was It Created?

    These data were created by daily visits to Radio Teilifís Éireann in Dublin, Ireland.

    IV. Data Set Field Descriptions

    Column headings have not been added to the dataset.

    Column A - blank
    Column B - type of broadcast
    Column C - blank
    Column D - date of broadcast
    Column E - blank
    Column F - blank
    Column G - blank
    Column H - blank
    Column I - description of broadcast
    Column J - blank
    Column K - blank
    Column L - length of broadcast
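
    The file has no header row and most of its columns are blank, so a reader has to pick out the populated columns by position. The Python sketch below assumes the data is a CSV file with the column layout listed above. It takes the length of broadcast from the entry listed after Column K, treated here as the 12th column (L). The sample row is invented.

```python
import csv
import io

# Zero-based positions of the populated columns (A=0, B=1, ...).
# "length" is the entry listed after Column K, taken here to be column L.
FIELDS = {"type": 1, "date": 3, "description": 8, "length": 11}
WIDTH = 12  # columns A through L


def read_re_logs(handle):
    """Yield a dict of the populated fields for each non-empty CSV row."""
    for row in csv.reader(handle):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        padded = row + [""] * (WIDTH - len(row))  # tolerate short rows
        yield {name: padded[index].strip() for name, index in FIELDS.items()}


sample = io.StringIO(",Radio,,1960-01-01,,,,,Example broadcast,,,30 min\n\n")  # invented row
for record in read_re_logs(sample):
    print(record)
```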

    V. Rights statement

    The text in this data set was created by the researcher and can be used in many different ways under a Creative Commons license with attribution. All contributions to this PhD project are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creator of this dataset.

    VI. Creator and Contributor Information

    Creator: Patrick Egan (Pádraig Mac Aodhgáin)

    VII. Contact Information

    Please direct all questions and comments to Patrick Egan via his website at www.patrickegan.org. You can also get in touch with the Library via the UCC website.

  12. Michigan Public Policy Survey Restricted Use Datasets

    • datasearch.gesis.org
    Updated Aug 27, 2016
    + more versions
    Cite
    Center for Local, State, and Urban Policy (2016). Michigan Public Policy Survey Restricted Use Datasets [Dataset]. http://doi.org/10.3886/E55175V2
    Dataset updated
    Aug 27, 2016
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Center for Local, State, and Urban Policy
    Area covered
    Michigan
    Description

    The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. 
The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. The MPPS datasets are being released in two forms: public-use datasets and restricted-use datasets. Unlike the public-use datasets, the restricted-use datasets represent full MPPS survey waves, and include all of the survey questions from a wave. Restricted-use datasets also allow for multiple waves to be linked together for longitudinal analysis. The MPPS staff do still modify these restricted-use datasets to remove jurisdiction and respondent identifiers and to recode other variables in order to protect confidentiality. However, it is theoretically possible that a researcher might be able, in some rare cases, to use enough variables from a full dataset to identify a unique jurisdiction, so access to these datasets is restricted and approved on a case-by-case basis. CLOSUP encourages researchers interested in the MPPS to review the codebooks included in this data collection to see the full list of variables including those not found in the public-use datasets, and to explore the MPPS data using the public-use datasets. On 2016-08-20, the openICPSR web site was moved to new software. In the migration process, some projects were not published in the new system because the decisions made in the old site did not map easily to the new setup. This project is temporarily available as restricted data while ICPSR verifies that all files were migrated correctly.

  13. Riga Data Science Club

    • kaggle.com
    zip
    Updated Mar 29, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dmitry Yemelyanov (2021). Riga Data Science Club [Dataset]. https://www.kaggle.com/datasets/dmitryyemelyanov/rigadsclub
    Available download formats: zip (494849 bytes)
    Dataset updated
    Mar 29, 2021
    Authors
    Dmitry Yemelyanov
    License

    CC0 1.0 https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Riga
    Description

    Context

    Riga Data Science Club is a non-profit organisation for sharing ideas and experience and building machine learning projects together. A data science community should know its own data, so this is a dataset about ourselves: our website analytics, social media activity, Slack statistics and even meetup transcriptions!

    Content

    The dataset is split into several folders by context:
    • linkedin - company page visitor, follower and post stats
    • slack - messaging and member activity
    • typeform - new member responses
    • website - website visitors by country, language, device, operating system, screen resolution
    • youtube - meetup transcriptions

    Inspiration

    Let's make Riga Data Science Club better! We expect this data to bring lots of insights on how to improve.

    "Know your c̶u̶s̶t̶o̶m̶e̶r̶ member"
    • Explore member interests by analysing sign-up survey (typeform) responses
    • Explore messaging patterns in Slack to understand how members are retained and when they are lost

    Social media intelligence
    • Define LinkedIn posting strategy based on historical engagement data
    • Define target user profile based on LinkedIn page attendance data

    Website
    • Define website localisation strategy based on data about visitor countries and languages
    • Define website responsive design strategy based on data about visitor devices, operating systems and screen resolutions

    Have some fun
    • NLP analysis of meetup transcriptions: word frequencies, question answering, something else?

  14. PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Mar 17, 2025
    Long, Phillip; Novack, Zachary; McAuley, Julian; Berg-Kirkpatrick, Taylor (2025). PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_13763755
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    University of California, San Diego
    UCSD
    Authors
    Long, Phillip; Novack, Zachary; McAuley, Julian; Berg-Kirkpatrick, Taylor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing. Refer to our paper for more information, and our GitHub repository for any code-related details. Please cite both our paper and our collaborators' paper if you use this dataset (see our GitHub for more information).

    Upon further use of the PDMX dataset, we discovered a discrepancy between the public-facing copyright metadata on the MuseScore website and the internal copyright data of the MuseScore files themselves, which affected 31,221 songs (12.29%). We have decided to proceed with the former given its public visibility on MuseScore (i.e., this is what the MuseScore website presents to its users). We have noted files with conflicting internal licenses in the license_conflict column of PDMX. We recommend using the no_license_conflict subset of PDMX (which still includes 222,856 songs) moving forward.
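    The two counts above are mutually consistent if, as assumed here, the no_license_conflict subset is exactly the full dataset minus the conflicting songs. A quick check derives the implied total and the conflict share:

```python
# Figures quoted in the PDMX description.
conflicting = 31_221   # songs with conflicting internal licenses
no_conflict = 222_856  # songs in the no_license_conflict subset

# Assumption: the subset is the full dataset minus the conflicting songs.
total = conflicting + no_conflict
share = 100 * conflicting / total

print(total)            # 254077
print(f"{share:.2f}%")  # 12.29%
```

    Under that assumption the implied total is 254,077 songs, and 31,221 of them is 12.29%, matching the stated percentage.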

    Additionally, for each song in PDMX, we not only provide the MusicRender and metadata JSON files, but we also try to include the associated compressed MusicXML (MXL), sheet music (PDF), and MIDI (MID) files when available. Due to the corruption of 42 of the original MuseScore files, these songs lack those associated files (since they could not be converted to those formats) and only include the MusicRender and metadata JSON files. The all_valid subset of PDMX describes the songs where all associated files are valid.
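    A minimal sketch of applying both recommended filters, assuming the metadata has been loaded as a list of dicts (for example with csv.DictReader). The license_conflict column is named in the description above; storing all_valid as a per-song flag is an assumption about the file layout, and the helper names are hypothetical:

```python
def to_bool(value):
    """Interpret a flag read from CSV ('True', 'false', '1', '0', ...) as a bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


def select_songs(rows, require_no_license_conflict=True, require_all_valid=False):
    """Keep metadata rows that pass the requested quality filters.

    Assumes each row has a `license_conflict` flag (named in the description)
    and, hypothetically, an `all_valid` flag marking the all_valid subset.
    """
    selected = []
    for row in rows:
        if require_no_license_conflict and to_bool(row["license_conflict"]):
            continue  # conflicting internal license: outside no_license_conflict
        if require_all_valid and not to_bool(row["all_valid"]):
            continue  # missing MXL/PDF/MID files, e.g. one of the 42 corrupted songs
        selected.append(row)
    return selected
```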

  15. website_visit_webalizer

    • kaggle.com
    zip
    Updated Mar 24, 2024
    Erin ÇOBAN (2024). website_visit_webalizer [Dataset]. https://www.kaggle.com/datasets/erinoban/website-visit-webalizer
    Available download formats: zip (1082 bytes)
    Dataset updated
    Mar 24, 2024
    Authors
    Erin ÇOBAN
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

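    This dataset was obtained from website visit data; these are real data. It contains the monthly visit statistics of the tr-metaverse.com website, hosted on Linux, covering 30 days from the beginning to the end of the month. It consists of the fields Day, Hit, Hit%, Files, Files%, Pages, Pages%, Visit, Visit%, Sites, Sites%, Kbytes and Kbytes%. Fields with a % sign hold percentages.

      • Day: index of the day within the month
      • Hit: total number of requests made to the server
      • Hit%: that day's share of the month's hits, in percent
      • Files: number of requests that resulted in a file being sent back
      • Files%: that day's share of the month's files, in percent
      • Pages: number of requests for pages, as opposed to images and other embedded files
      • Pages%: that day's share of the month's pages, in percent
      • Visit: number of visits (Webalizer treats successive requests from the same site within a timeout period as a single visit)
      • Visit%: that day's share of the month's visits, in percent
      • Sites: number of unique IP addresses or hostnames that made requests
      • Sites%: that day's share of the month's sites, in percent
      • Kbytes: amount of data sent by the server, in kilobytes
      • Kbytes%: that day's share of the month's data, in percent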
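    Assuming each % column is that day's share of the monthly column total (the convention of Webalizer's daily statistics table), the percent fields can be recomputed from the raw counts to sanity-check the data. A minimal sketch; the numbers in the example are illustrative, not taken from the dataset:

```python
def percent_of_total(daily_values):
    """Return each day's value as a percentage of the column total."""
    total = sum(daily_values)
    if total == 0:
        return [0.0] * len(daily_values)  # avoid dividing by zero for a month with no traffic
    return [100 * value / total for value in daily_values]


hits = [120, 80, 200]  # illustrative daily Hit values
print(percent_of_total(hits))  # [30.0, 20.0, 50.0]
```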

  16. Top-1000 HHS Open Data Resources

    • catalog.data.gov
    • data.virginia.gov
    • +1 more
    Updated Jul 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Office of Chief Data Officer (2025). Top-1000 HHS Open Data Resources [Dataset]. https://catalog.data.gov/dataset/top-1000-hhs-open-data-resources
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Office of Chief Data Officer
    Description

    HHS responsibly shares “open by default” data with the public to democratize access to information, demystify the Department, and increase transparency through data sharing. HHS Open Data is non-sensitive data, meaning thousands of health and human services datasets are publicly available to fuel new business models, enable emerging technologies like AI, accelerate scientific discoveries, and inspire American innovation. This page of the top 1,000 HHS Open Data websites and resources is dynamically generated from the Digital Analytics Program (DAP) provided by the U.S. General Services Administration (GSA) and is driven by near-real-time user demand. GSA’s DAP helps federal agencies and the public see how visitors find, access, and use government websites, data, and services online. The list below filters DAP to show only HHS resources and includes all HHS Divisions. You may filter by individual HHS Divisions and columns.

  17. Customer propensity to purchase dataset

    • kaggle.com
    zip
    Updated Jun 1, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ben P (2018). Customer propensity to purchase dataset [Dataset]. https://www.kaggle.com/datasets/benpowis/customer-propensity-to-purchase-data
    Available download formats: zip (13598472 bytes)
    Dataset updated
    Jun 1, 2018
    Authors
    Ben P
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    You get many visitors to your website every day, but you know only a small percentage of them are likely to buy from you, while most will perhaps not even return. Right now you may be spending money to re-market to everyone, but perhaps we could use machine learning to identify the most valuable prospects?

    Content

    This data set represents a day's worth of visits to a fictional website. Each row represents a unique customer, identified by their unique UserID. The columns represent features of the user's visit (such as the device they were using) and things the user did on the website that day. These features will be different for every website, but in this data a few of the features we consider are:
      • basket_add_detail: Did the customer add a product to their shopping basket from the product detail page?
      • sign_in: Did the customer sign in to the website?
      • saw_homepage: Did the customer visit the website's homepage?
      • returning_user: Is this visitor new or returning?

    In this data set we also have a feature showing whether the customer placed an order (ordered), which is what we predict on.
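    Before training a model, a simple baseline is to compare the order rate of customers with and without each behaviour. The sketch below assumes the binary features and the ordered target are encoded as 0/1 (with returning_user = 1 meaning a returning visitor); the function name is hypothetical:

```python
FEATURES = ("basket_add_detail", "sign_in", "saw_homepage", "returning_user")


def order_rate_by_feature(rows, target="ordered"):
    """For each 0/1 feature, return (order rate when 1, order rate when 0).

    A rate is None when no customer has that feature value.
    """
    rows = list(rows)  # allow a one-shot iterator such as csv.DictReader
    rates = {}
    for feature in FEATURES:
        counts = {1: [0, 0], 0: [0, 0]}  # feature value -> [orders, customers]
        for row in rows:
            value = int(row[feature])
            counts[value][0] += int(row[target])
            counts[value][1] += 1
        rates[feature] = tuple(
            orders / customers if customers else None
            for orders, customers in (counts[1], counts[0])
        )
    return rates
```

    For example, calling order_rate_by_feature(csv.DictReader(f)) on an opened copy of the CSV gives one pair of rates per feature; a large gap between the two rates suggests a feature worth keeping in the model.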

  18. TED Talks Dataset with Transcripts, LIWC, MFT

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). TED Talks Dataset with Transcripts, LIWC, MFT [Dataset]. https://www.kaggle.com/datasets/thedevastator/ted-talks-dataset-with-transcripts-liwc-mft
    Available download formats: zip (10337474 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Description

    TED Talks Dataset with Transcripts, LIWC, MFT

    TED Talks Dataset with Transcripts, LIWC, MFT and Views

    By Owen Temple [source]

    About this dataset

    The TED Talks Dataset with Transcripts, LIWC, and MFT is a comprehensive collection of TED Talks from official events available on the TED.com website. The dataset includes various information about each talk, such as unique IDs, speaker names, headlines, URLs to view the videos, descriptions of the talks, and details about when and where they were filmed. It also includes the duration of each talk, the date it was published on TED.com, and topic tags that provide insight into the themes or subjects covered in each talk.

    An expanded version of this dataset offers additional columns that provide even more valuable insights. For example, it includes full English transcripts of the talks for further analysis. The dataset also provides information on how many times each video has been viewed as of June 16th, 2017. Furthermore, Linguistic Inquiry and Word Count (LIWC) software was used to analyze these transcripts and generate variables that indicate word usage in different categories relative to the total number of words in each talk.
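    The ratio described above can be illustrated with a toy word list. The sketch below is not the LIWC software or its dictionary (which is far larger and also matches word stems); TOY_CATEGORIES and category_ratios are hypothetical names used only to show the arithmetic of words in a category divided by total words:

```python
import re

# Toy categories for illustration only; not the real LIWC dictionary.
TOY_CATEGORIES = {
    "positive": {"happy", "love", "good"},
    "negative": {"sad", "hate", "bad"},
}


def category_ratios(transcript, categories=TOY_CATEGORIES):
    """Return, per category, the number of matching words divided by the total word count."""
    words = re.findall(r"[a-z']+", transcript.lower())
    total = len(words)
    return {
        name: (sum(word in vocabulary for word in words) / total if total else 0.0)
        for name, vocabulary in categories.items()
    }


print(category_ratios("I love good days but hate bad traffic"))
# {'positive': 0.25, 'negative': 0.25}
```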

    This expanded version of the dataset contains an extensive data dictionary that explains each variable created by the LIWC software. The LIWC analysis offers insights into language patterns found within these TED Talks. Additionally, the transcripts were analyzed using a dictionary developed specifically for studying Moral Foundations Theory (MFT). This analysis provides proportions indicating the use of virtue and vice words for different moral foundations within any given corpus.

    This dataset covers all talks from official TED events made available on the TED website from its launch through June 13th, 2017. The provided visualization created by Sean Miller showcases this data effectively.

    In addition to using this dataset for analyses or for tracking which TED Talks you have seen, you can use it to build personal learning programs centered around specific topics covered in these talks.

    Overall, this Kaggle dataset is a valuable resource for researchers, discussion groups, and individuals interested in exploring ideas shared through TED Talks, with detailed information including transcripts, Linguistic Inquiry and Word Count (LIWC) analysis, and Moral Foundations Theory (MFT) analysis.

    How to use the dataset

    • Overview: This dataset contains comprehensive information about TED Talks from official events, including unique IDs, speaker names, headlines, URLs, descriptions, transcripts, month and year filmed, event details, duration of the talk in minutes and seconds (MM:SS format), date published and topic tags. It also includes additional columns with full English transcripts analyzed using the Linguistic Inquiry and Word Count (LIWC) software, which counts words by category. Each category is expressed as the ratio of words in that category to the total number of words in the talk. The expanded version also includes the number of views as of June 13th, 2017.

    • Accessing the Data: You can access the data from this dataset either by downloading it or accessing it through an appropriate platform like Kaggle.

    • Database Structure: The dataset is presented in CSV format with multiple columns providing different pieces of information about each TED Talk entry.

      a) Unique ID: Each talk is assigned a unique ID.

      b) URL: This column provides URLs to access the video presentations online.

      c) Transcript URL: You can access full English transcripts by following links provided in this column.

      d) Speaker Name: This column specifies the name(s) of the speaker(s).

      e) Headline: The headline gives you a brief idea of what each TED Talk is about.

      f) Description: A more detailed description of each talk can be found here.

      g) Date Filmed (Month-Year): Specifies when the talk was filmed.

      h) Event Details: Information about the event at which the talk originated.

      i) Duration (MM:SS): The length of each individual talk.

      j) Date Published: The original publication date.

      k ) Topic Tags: Provides keywords or tags corresponding to the main themes covered in each talk

      Additionally, there are 111 more columns with full English transcripts, number of views as of June 13th, 2017 and variables generated by LIWC software. These variables express word usage as ratios for different categories.

    • Interpreting LIWC Variables...
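    Because the duration column is stored as MM:SS text, it usually needs converting to a number before analysis. A minimal sketch, assuming every value has exactly two colon-separated fields (values formatted differently raise an error); the function name is hypothetical:

```python
def duration_to_seconds(mm_ss):
    """Convert a 'MM:SS' duration string such as '18:04' to total seconds."""
    minutes_text, seconds_text = mm_ss.strip().split(":")  # ValueError if not two fields
    minutes, seconds = int(minutes_text), int(seconds_text)
    if minutes < 0 or not 0 <= seconds < 60:
        raise ValueError(f"invalid MM:SS duration: {mm_ss!r}")
    return minutes * 60 + seconds


print(duration_to_seconds("18:04"))  # 1084
```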

  19. E-commerce - Users of a French C2C fashion store

    • kaggle.com
    zip
    Updated Feb 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeffrey Mvutu Mabilama (2024). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store
    Available download formats: zip (3283629 bytes)
    Dataset updated
    Feb 24, 2024
    Authors
    Jeffrey Mvutu Mabilama
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Foreword

    This user dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc.).

    My Telegram bot will answer your queries and allow you to contact me.

    Context

    There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

    Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

    This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you can try to understand what to expect from your users and estimate in advance how your user base may grow.

    • For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

    If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

    This dataset is part of a preview of a much larger dataset. Please contact me for more.

    Content

    The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.

    Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Questions you might want to answer using this dataset:

    • Are e-commerce users interested in social network features?
    • Are my users active enough (compared to those of this dataset)?
    • How likely are people from other countries to sign up on a C2C website?
    • How many users are likely to drop off after years of using my service?

    Example works:

    • Report(s) made using SQL queries can be found on the data.world page of the dataset.
    • Notebooks may be found on the Kaggle page of the dataset.

    License

    CC-BY-NC-SA 4.0

    For other licensing options, contact me.

  20. Data & Analytics Stats LinkedIn Company Page

    • kaggle.com
    zip
    Updated Aug 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mirko Peters (2024). Data & Analytics Stats LinkedIn Company Page [Dataset]. https://www.kaggle.com/mirkopeters/data-and-analytics-stats-linkedin-company-page
    Available download formats: zip (689754 bytes)
    Dataset updated
    Aug 13, 2024
    Authors
    Mirko Peters
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    LinkedIn Company Page Data - The Data Analytics Academy

    Dataset Overview

    This dataset contains detailed insights from The Data Analytics Academy's LinkedIn Company Page, including information on content performance, followers, and visitors. The data is sourced directly from our LinkedIn analytics and has been organized into CSV files for ease of use.

    Files Included:
      • Content Data: Performance metrics for posts and updates shared on our LinkedIn page.
      • Followers Data: Demographics and growth metrics of our LinkedIn page followers.
      • Visitors Data: Insights on page visitors, including demographics and engagement levels.

    Use Cases:
      • Social Media Analytics: Analyze the performance of content and its reach among different demographics.
      • Market Research: Understand audience demographics and how they engage with our page.
      • Data Science Projects: Apply machine learning algorithms to predict content performance or audience growth.

    Acknowledgments

    This data is free to use for any purpose, including commercial use. However, if you use this dataset, please give credit to The Data Analytics Academy by mentioning us or linking to our LinkedIn page: The Data Analytics Academy.

    Inspiration

    This dataset can be used to explore various aspects of LinkedIn analytics, such as identifying trends in audience engagement, understanding content performance, and predicting follower growth.

AnthonyTherrien (2024). Website Traffic [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/website-traffic/discussion

Website Traffic

Website Traffic and User Engagement Metrics

Available download formats: zip (65228 bytes)
Dataset updated
Aug 5, 2024
Authors
AnthonyTherrien
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Dataset Overview

This dataset provides detailed information on website traffic, including page views, session duration, bounce rate, traffic source, time spent on page, previous visits, and conversion rate.

Dataset Description

  • Page Views: The number of pages viewed during a session.
  • Session Duration: The total duration of the session in minutes.
  • Bounce Rate: The percentage of visitors who navigate away from the site after viewing only one page.
  • Traffic Source: The origin of the traffic (e.g., Organic, Social, Paid).
  • Time on Page: The amount of time spent on the specific page.
  • Previous Visits: The number of previous visits by the same visitor.
  • Conversion Rate: The percentage of visitors who completed a desired action (e.g., making a purchase).
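Bounce Rate and Conversion Rate are aggregates over visits. The sketch below shows how the two definitions translate into arithmetic, treating each session as one visitor for simplicity. The Session type and its converted flag are hypothetical, since this dataset already stores both rates as precomputed columns:

```python
from dataclasses import dataclass


@dataclass
class Session:
    page_views: int
    converted: bool  # hypothetical flag: did the visitor complete the desired action?


def bounce_rate(sessions):
    """Percentage of sessions that viewed only one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s.page_views == 1)
    return 100 * bounces / len(sessions)


def conversion_rate(sessions):
    """Percentage of sessions that completed the desired action."""
    if not sessions:
        return 0.0
    return 100 * sum(1 for s in sessions if s.converted) / len(sessions)


sessions = [Session(1, False), Session(3, True), Session(1, False), Session(5, False)]
print(bounce_rate(sessions), conversion_rate(sessions))  # 50.0 25.0
```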

Data Summary

  • Total Records: 2000
  • Total Features: 7

Key Features

  1. Page Views: This feature indicates the engagement level of the visitors by showing how many pages they visit during their session.
  2. Session Duration: This feature measures the length of time a visitor stays on the website, which can indicate the quality of the content.
  3. Bounce Rate: A critical metric for understanding user behavior. A high bounce rate may indicate that visitors are not finding what they are looking for.
  4. Traffic Source: Understanding where your traffic comes from can help in optimizing marketing strategies.
  5. Time on Page: This helps in analyzing which pages are retaining visitors' attention the most.
  6. Previous Visits: This can be used to analyze the loyalty of visitors and the effectiveness of retention strategies.
  7. Conversion Rate: The ultimate metric for measuring the effectiveness of the website in achieving its goals.

Usage

This dataset can be used for various analyses such as:

  • Identifying key drivers of engagement and conversion.
  • Analyzing the effectiveness of different traffic sources.
  • Understanding user behavior patterns and optimizing the website accordingly.
  • Improving marketing strategies based on traffic source performance.
  • Enhancing user experience by analyzing time spent on different pages.
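One simple way to compare traffic sources is to average a metric per source. A minimal sketch, assuming the CSV headers match the feature names above ("Traffic Source", "Conversion Rate"), which has not been checked against the file; because the data is generated rather than collected, any differences illustrate the technique rather than real-world behaviour. The example rows are made up:

```python
from collections import defaultdict
from statistics import mean


def mean_by_source(records, metric="Conversion Rate", source_key="Traffic Source"):
    """Average `metric` for each traffic source, ordered alphabetically by source."""
    groups = defaultdict(list)
    for record in records:
        groups[record[source_key]].append(float(record[metric]))
    return {source: mean(values) for source, values in sorted(groups.items())}


rows = [
    {"Traffic Source": "Organic", "Conversion Rate": "2.5"},
    {"Traffic Source": "Organic", "Conversion Rate": "3.5"},
    {"Traffic Source": "Paid", "Conversion Rate": "1.0"},
]
print(mean_by_source(rows))  # {'Organic': 3.0, 'Paid': 1.0}
```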

Acknowledgments

This dataset was generated for educational purposes and is not from a real website. It serves as a tool for learning data analysis and machine learning techniques.
