23 datasets found
  1. Clickstream data for online shopping

    • kaggle.com
    Updated Apr 22, 2021
    Cite
    Long Luu (2021). Clickstream data for online shopping [Dataset]. https://www.kaggle.com/aeryss/clickstream-data-for-online-shopping/code
    Dataset updated
    Apr 22, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Long Luu
    Description

    Dataset

    This dataset was created by Long Luu


  2. Clickstream for Online Shopping Dataset

    • cubig.ai
    Cite
    CUBIG, Clickstream for Online Shopping Dataset [Dataset]. https://cubig.ai/store/products/376/clickstream-for-online-shopping-dataset
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Clickstream Data for Online Shopping is an e-commerce analysis dataset that summarizes user clickstream, product information, country, price, and other session-specific behavior data from April to August 2008 at an online shopping mall specializing in maternity clothing.

    2) Data Utilization
    (1) Characteristics of the Clickstream Data for Online Shopping:
    • Each row contains 14 key variables: year, month, day, click order, country (by access IP), session ID, main category, product code, color, photo location, model photo type, price, category average price, and page number.
    • The data is structured to support analysis of consumer behavior such as per-session click flows, product attributes, and country-specific access patterns.
    (2) Uses of the Clickstream Data for Online Shopping:
    • Online shopping mall user behavior analysis: using clickstream, session, and product information, you can analyze purchase conversion routes, popular products, and behavioral patterns by country and category.
    • Marketing strategy and UI/UX improvement: analyze the relationship between click behavior and product photo location, color, price, and similar attributes, and apply the findings to marketing strategy and to the shopping mall's UI/UX.
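
    As an illustration of the per-session click-flow analysis mentioned above, the following minimal sketch groups clicks by session and orders them by click order. The column names (session_id, order, product_code) and the sample rows are hypothetical; the real file may name its columns differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names are assumed, not taken from the real file.
SAMPLE = """session_id,order,main_category,product_code,price
1,2,1,A13,28
1,1,1,A16,33
2,1,2,B4,38
1,3,4,P2,48
"""


def click_paths(rows):
    """Group clicks by session and return product codes in click order."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append((int(row["order"]), row["product_code"]))
    # Sorting the (order, code) tuples restores the click sequence.
    return {sid: [code for _, code in sorted(clicks)] for sid, clicks in sessions.items()}


paths = click_paths(csv.DictReader(io.StringIO(SAMPLE)))
print(paths)
```

    Each value in paths is one session's ordered sequence of viewed products, which is the usual starting point for path or funnel analysis.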

  3. AI-Driven Consumer Behavior Dataset

    • kaggle.com
    Updated Mar 10, 2025
    Cite
    Ziya (2025). AI-Driven Consumer Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/ai-driven-consumer-behavior-dataset/discussion?sort=undefined
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This AI-Driven Consumer Behavior Dataset captures key aspects of online shopping behavior, including purchase decisions, browsing activity, customer reviews, and demographic details. The dataset is designed for research in consumer behavior analysis, AI-driven recommendation systems, and digital marketing optimization.

    Key Features:
    ✔ Consumer Purchase Data – Tracks product purchases, prices, discounts, and payment methods.
    ✔ Clickstream Data – Includes browsing behavior, pages visited, session duration, and cart abandonment.
    ✔ Customer Reviews & Sentiments – Provides ratings, textual reviews, and sentiment analysis scores.
    ✔ Demographic Information – Includes age, gender, location, and income levels.
    ✔ Target Column (purchase_decision) – Indicates whether a customer completed a purchase (1) or not (0).
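
    A binary target such as purchase_decision lends itself to a simple conversion-rate summary. The sketch below is illustrative only: purchase_decision is the one column named in the description, while cart_abandoned and the sample records are assumptions.

```python
from collections import defaultdict

# Hypothetical records; `cart_abandoned` is an assumed field name.
records = [
    {"cart_abandoned": 0, "purchase_decision": 1},
    {"cart_abandoned": 1, "purchase_decision": 0},
    {"cart_abandoned": 0, "purchase_decision": 0},
    {"cart_abandoned": 0, "purchase_decision": 1},
]


def conversion_rate_by(rows, key):
    """Return the share of rows with purchase_decision == 1 for each value of `key`."""
    purchases = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        purchases[row[key]] += row["purchase_decision"]
    return {value: purchases[value] / totals[value] for value in totals}


rates = conversion_rate_by(records, "cart_abandoned")
print(rates)
```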

  4. grass-clickstream-dataset

    • huggingface.co
    Updated Aug 26, 2025
    Cite
    Grass (2025). grass-clickstream-dataset [Dataset]. https://huggingface.co/datasets/GrassData/grass-clickstream-dataset
    Dataset updated
    Aug 26, 2025
    Dataset authored and provided by
    Grass
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description


    This is the clickstream dataset produced by the team at Wynd Labs. The provided embeddings are an aggregate of clip embeddings produced from selected keyframes of the respective video. We intend these embeddings to be used for task-specific clustering and automatic segmentation. If it clips, it ships.

  5. Swash User Search and Consumer Journey Data - 1.5M Worldwide Users - GDPR...

    • datarade.ai
    .csv, .xls
    Updated Jun 27, 2023
    Cite
    Swash (2023). Swash User Search and Consumer Journey Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/users-searching-data-on-top-search-engines
    Available download formats: .csv, .xls
    Dataset updated
    Jun 27, 2023
    Dataset authored and provided by
    Swash
    Area covered
    Korea (Republic of), Taiwan, Panama, Honduras, Bangladesh, United States of America, Israel, Macao, Japan, Kuwait
    Description

    Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

    Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

    User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

    Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

    GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

    Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

    High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

    Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

    Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.

  6. Clickstream 2008 E-commerce Dataset

    • kaggle.com
    Updated May 9, 2024
    Cite
    Dev Patel (2024). Clickstream 2008 E-commerce Dataset [Dataset]. https://www.kaggle.com/datasets/ddevvedd/e-commerce-2008/suggestions
    Dataset updated
    May 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dev Patel
    Description

    Dataset

    This dataset was created by Dev Patel


  7. Modeling online browsing and path analysis using clickstream data - Dataset...

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Modeling online browsing and path analysis using clickstream data - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/modeling-online-browsing-and-path-analysis-using-clickstream-data
    Dataset updated
    Dec 16, 2024
    Description

    Modeling online browsing and path analysis using clickstream data.

  8. Data from: Click stream dataset

    • kaggle.com
    Updated Jul 1, 2025
    Cite
    Raghu Mariswamegowda (2025). Click stream dataset [Dataset]. https://www.kaggle.com/datasets/raghumariswamegowda/click-stream-dataset/suggestions
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Raghu Mariswamegowda
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Raghu Mariswamegowda

    Released under Apache 2.0


  9. Data ClickStream Banco Galicia 2019

    • kaggle.com
    zip
    Updated Aug 29, 2019
    Cite
    Federico Garcia Blanco (2019). Data ClickStream Banco Galicia 2019 [Dataset]. https://www.kaggle.com/fgarciablanco/data-clickstream-banco-galicia-2019
    Available download formats: zip (212409866 bytes)
    Dataset updated
    Aug 29, 2019
    Authors
    Federico Garcia Blanco
    Description

    Dataset

    This dataset was created by Federico Garcia Blanco


  10. E-Shop Clothing Dataset

    • kaggle.com
    Updated Aug 1, 2021
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Aug 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aditya Wisnugraha S
    Description

    Data description “e-shop clothing 2008”

    Variables:

    1. YEAR (2008)

    ========================================================

    2. MONTH -> from April (4) to August (8)

    ========================================================

    3. DAY -> day number of the month

    ========================================================

    4. ORDER -> sequence of clicks during one session

    ========================================================

    5. COUNTRY -> variable indicating the country of origin of the IP address with the following categories:

    1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (*.biz) 44-com (*.com) 45-int (*.int) 46-net (*.net) 47-org (*.org)

    ========================================================

    6. SESSION ID -> variable indicating session id (short record)

    ========================================================

    7. PAGE 1 (MAIN CATEGORY) -> concerns the main product category: 1-trousers 2-skirts 3-blouses 4-sale

    ========================================================

    8. PAGE 2 (CLOTHING MODEL) -> contains information about the code for each product (217 products)

    ========================================================

    9. COLOUR -> colour of product

    1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white

    ========================================================

    10. LOCATION -> photo location on the page, the screen has been divided into six parts:

    1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right

    ========================================================

    11. MODEL PHOTOGRAPHY -> variable with two categories:

    1-en face 2-profile

    ========================================================

    12. PRICE -> price in US dollars

    ========================================================

    13. PRICE 2 -> variable informing whether the price of a particular product is higher than the average price for the entire product category

    1-yes 2-no

    ========================================================

    14. PAGE -> page number within the e-store website (from 1 to 5)

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    I want to know how to solve this data regarding any problem (clustering, regression, classification, EDA)

    Source: https://archive.ics.uci.edu/ml/datasets/clickstream+data+for+online+shopping
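
    The integer codes in the variable list above can be turned back into labels with plain lookup tables. The sketch below copies four of the code tables from the description; the row keys (page1, location, model_photography, price2) are hypothetical names chosen for illustration, not the file's actual headers.

```python
# Code tables copied from the variable description.
MAIN_CATEGORY = {1: "trousers", 2: "skirts", 3: "blouses", 4: "sale"}
LOCATION = {
    1: "top left", 2: "top in the middle", 3: "top right",
    4: "bottom left", 5: "bottom in the middle", 6: "bottom right",
}
MODEL_PHOTOGRAPHY = {1: "en face", 2: "profile"}
PRICE_2 = {1: "yes", 2: "no"}  # price above the category average?

# Hypothetical column names mapped to their code tables.
DECODERS = {
    "page1": MAIN_CATEGORY,
    "location": LOCATION,
    "model_photography": MODEL_PHOTOGRAPHY,
    "price2": PRICE_2,
}


def decode(row):
    """Replace integer codes with labels; leave other fields unchanged."""
    return {key: DECODERS[key][value] if key in DECODERS else value
            for key, value in row.items()}


decoded = decode({"page1": 2, "location": 5, "model_photography": 1, "price2": 2, "price": 38})
print(decoded)
```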

  11. Simple English Wikipedia Link Graph with Clickstream Transitions 2018-12 -...

    • rdm.inesctec.pt
    Updated Mar 6, 2019
    Cite
    (2019). Simple English Wikipedia Link Graph with Clickstream Transitions 2018-12 - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2018-004
    Dataset updated
    Mar 6, 2019
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    The Simple English Wikipedia Link Graph with Clickstream Transitions is a gzipped GML file representing the hyperlink graph of the Simple English Wikipedia. It was prepared using the "pagelinks" and "page" SQL dumps for 2019-01-01 and extended with an edge property called "transitions" based on the Clickstream dump for the English Wikipedia from 2018-12. It was designed to be used as a ground truth to evaluate node ranking metrics, like PageRank, but it can be useful for Network Science in general, or for Machine Learning and Information Retrieval to compute features over a medium-sized, complete Wikipedia link graph.
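
    To illustrate how a link graph with a "transitions" edge property might serve as ground truth for a ranking metric, the sketch below computes a basic PageRank on a tiny hypothetical graph and, separately, the total observed transitions into each node. The edge list is invented, and this is a plain power-iteration PageRank written for illustration, not the procedure used by the dataset authors; loading the actual GML file is not shown.

```python
# Hypothetical edges: (source, target, transitions).
EDGES = [("A", "B", 10), ("A", "C", 5), ("B", "C", 20), ("C", "A", 3)]


def pagerank(edges, damping=0.85, iterations=50):
    """Unweighted PageRank by power iteration over the hyperlink structure."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    out_links = {n: [] for n in nodes}
    for src, dst, _ in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without out-links spreads its rank evenly over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def incoming_transitions(edges):
    """Sum the observed transitions arriving at each node (the ground truth)."""
    totals = {}
    for _, dst, transitions in edges:
        totals[dst] = totals.get(dst, 0) + transitions
    return totals


ranks = pagerank(EDGES)
traffic = incoming_transitions(EDGES)
print(sorted(ranks, key=ranks.get, reverse=True))
print(sorted(traffic, key=traffic.get, reverse=True))
```

    Comparing the two orderings (for example with a rank correlation) is one way to judge how well a structural metric predicts observed navigation.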

  12. Datasys | Clickstream Data | Gamer Audiences (10M+ gamers | PC, console &...

    • data.datasys.com
    Updated Sep 30, 2025
    Cite
    Datasys (2025). Datasys | Clickstream Data | Gamer Audiences (10M+ gamers | PC, console & mobile) [Dataset]. https://data.datasys.com/products/datasys-clickstream-data-gamer-audiences-10m-gamers-p-datasys
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Datasys
    Area covered
    Lebanon, Falkland Islands (Malvinas), Saudi Arabia, Peru, Thailand, China, Israel, Ecuador, North Korea, Bahamas
    Description

    Datasys Gamer Audiences dataset tracks 10M+ gaming consumers, including platform usage, time spent, and title engagement.

  13. Wikipedia Clickstream

    • figshare.com
    application/gzip
    Updated May 31, 2023
    Cite
    Ellery Wulczyn; Dario Taraborelli (2023). Wikipedia Clickstream [Dataset]. http://doi.org/10.6084/m9.figshare.1305770.v16
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Ellery Wulczyn; Dario Taraborelli
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This project contains data sets containing counts of (referer, resource) pairs extracted from the request logs of Wikipedia. A referer is an HTTP header field that identifies the address of the webpage that linked to the resource being requested. The data shows how people get to a Wikipedia article and what links they click on. In other words, it gives a weighted network of articles, where each edge weight corresponds to how often people navigate from one page to another. For more information and documentation, see the link in the references section below.
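
    The weighted network described above can be sketched as a counter over (referer, resource) pairs, where each count is an edge weight. The page names and pairs below are hypothetical, and the real files ship pre-aggregated counts rather than raw request pairs.

```python
from collections import Counter

# Hypothetical (referer, resource) pairs as they might appear in request logs.
pairs = [
    ("Main_Page", "Python"),
    ("Python", "Monty_Python"),
    ("Main_Page", "Python"),
    ("Search", "Python"),
]

# Each distinct pair is an edge; its count is the edge weight.
edge_weights = Counter(pairs)


def incoming(weights, resource):
    """Return how often each referer led to `resource`."""
    return {ref: n for (ref, res), n in weights.items() if res == resource}


print(incoming(edge_weights, "Python"))
```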

  14. Mobile Web Clickstream | 1st Party | 3B+ events verified, US consumers |...

    • omnitrafficdata.mfour.com
    • datarade.ai
    Updated Aug 1, 2021
    Cite
    MFour (2021). Mobile Web Clickstream | 1st Party | 3B+ events verified, US consumers | Safari, Chrome, any iOS or Android [Dataset]. https://omnitrafficdata.mfour.com/products/mobile-web-clickstream-1st-party-3b-events-verified-us-mfour
    Dataset updated
    Aug 1, 2021
    Dataset authored and provided by
    MFour
    Area covered
    United States
    Description

    This dataset encompasses mobile web clickstream behavior on any browser, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). Use it for measurement, attribution or path to purchase and consumer journey understanding. Full URL deliverable available including searches.

  15. Data from: Data-driven E-commerce UI Personalization: Going Beyond Product...

    • data.mendeley.com
    Updated Dec 29, 2023
    Cite
    Adam Wasilewski (2023). Data-driven E-commerce UI Personalization: Going Beyond Product Recommendations [Dataset]. http://doi.org/10.17632/sxmgyvxpv9.1
    Dataset updated
    Dec 29, 2023
    Authors
    Adam Wasilewski
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset includes:
    1. online store customer behavior data (clickstream) from 1.04.2023 to 30.11.2023, used to cluster customers and evaluate the effectiveness of implemented modifications (catalog: learning-dataset)
    2. clustering results to verify the effectiveness of implemented changes (catalog: clustering)
    3. detailed data for calculation of macro-conversion indicators (catalog: macro-conversion-indicators)
    4. detailed data for calculation of micro-conversion indicators (catalog: micro-conversion-indicators)

  16. Image Dataset for Predicting Early Dropouts in DigitalLearning Platforms

    • figshare.com
    zip
    Updated Jan 26, 2025
    Cite
    Nishant Sharma; Manish Kumar Pandey; M Ali Akber Dewan (2025). Image Dataset for Predicting Early Dropouts in DigitalLearning Platforms [Dataset]. http://doi.org/10.6084/m9.figshare.28282832.v2
    Available download formats: zip
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    figshare
    Authors
    Nishant Sharma; Manish Kumar Pandey; M Ali Akber Dewan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This article presents a student clickstream database comprising 120542 training images and 80362 test images, where each directory contains two subdirectories, "Dropouts" and "NonDropouts", as two different classes. The original dataset was provided for the KDD Cup Challenge 2015 by the Chinese MOOC (massive open online course) platform XuetangX. The samples were captured from clickstream (user) activity on the platform. We transformed the KDD Cup 2015 dataset into an image dataset. This transformation will enable the application of novel deep learning and computer vision techniques to develop more sustainable, accurate, and robust predictive models for identifying students at risk of dropping out, and will enable MOOC platforms to design highly robust early warning systems. Furthermore, this dataset will be made publicly available to the research community to advance interdisciplinary research at the intersection of education and computer vision.

  17. DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS

    • narcis.nl
    • data.mendeley.com
    Updated Mar 13, 2019
    Cite
    Constante, F (via Mendeley Data) (2019). DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS [Dataset]. http://doi.org/10.17632/8gx2fvg2k6.5
    Dataset updated
    Mar 13, 2019
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Constante, F (via Mendeley Data)
    Description

    A dataset of supply chains used by the company DataCo Global was used for the analysis. The supply chain dataset allows the use of machine learning algorithms and R software. Areas of important registered activities: provisioning, production, sales, commercial distribution. It also allows the correlation of structured data with unstructured data for knowledge generation.

    Type Data: Structured Data: DataCoSupplyChainDataset.csv; Unstructured Data: tokenized_access_logs.csv (Clickstream)

    Types of Products: Clothing, Sports, and Electronic Supplies

    Additionally, a separate file called DescriptionDataCoSupplyChain.csv is attached, containing the description of each of the variables of DataCoSupplyChainDatasetc.csv.

  18. Language Specific Event Recommendation Ground Truth

    • zenodo.org
    csv, txt
    Updated Dec 1, 2021
    Cite
    Sara Abdollahi; Simon Gottschalk; Elena Demidova (2021). Language Specific Event Recommendation Ground Truth [Dataset]. http://doi.org/10.5281/zenodo.5735580
    Available download formats: txt, csv
    Dataset updated
    Dec 1, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sara Abdollahi; Simon Gottschalk; Elena Demidova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a multilingual ground truth dataset for training, evaluating and testing the LaSER (Language-Specific Event Recommendation) model. It contains language-specific relevance scores for event-centric click-through pairs according to the publicly available Clickstream dataset in German, French and Russian as well as the user study annotations conducted for evaluating the language-specific recommendations by LaSER. For more details, refer to EventKG+Click and LaSER.

    This dataset consists of two sets of files as follows:
    1. The ground truth dataset that is used for training the learning to rank (LTR) model in LaSER in three languages. The following files contain the language-specific relevance scores between a source and target entity based on EventKG+Click dataset:

    • german_ground_truth.txt
    • french_ground_truth.txt
    • russian_ground_truth.txt

    In these files source and target represent the label of entities and events in the respective language.

    2. The second set contains the user study participants' annotations regarding different relevance criteria of recommended events by LaSER. The following three files contain the annotations of at least three participants per event:

    • german_user_study_annotations.csv
    • french_user_study_annotations.csv
    • russian_user_study_annotations.csv

    In these files, "r1", "r2" and "r3" denote relevance to the topic, language community and general audience respectively, and topic and event hold the Wikidata IDs of entities and events.

  19. Student oriented subset of the Open University Learning Analytics dataset

    • data.niaid.nih.gov
    Updated Sep 30, 2021
    Cite
    Gabriella Casalino (2021). Student oriented subset of the Open University Learning Analytics dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4264396
    Dataset updated
    Sep 30, 2021
    Dataset provided by
    Giovanna Castellano
    Gennaro Vessio
    Gabriella Casalino
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Open University (OU) dataset is an open database containing student demographic and click-stream interaction with the virtual learning platform. The available data are structured in different CSV files. You can find more information about the original dataset at the following link: https://analyse.kmi.open.ac.uk/open_dataset.

    We extracted a subset of the original dataset that focuses on student information. 25,819 records were collected referring to a specific student, course and semester. Each record is described by the following 20 attributes: code_module, code_presentation, gender, highest_education, imd_band, age_band, num_of_prev_attempts, studies_credits, disability, resource, homepage, forum, glossary, outcontent, subpage, url, outcollaborate, quiz, AvgScore, count.

    Two target classes were considered, namely Fail and Pass, combining the original four classes (Fail and Withdrawn and Pass and Distinction, respectively). The final_result attribute contains the target values.

    All features have been converted to numbers for automatic processing.

    Below is the mapping used to convert categorical values to numeric:

    code_module: 'AAA'=0, 'BBB'=1, 'CCC'=2, 'DDD'=3, 'EEE'=4, 'FFF'=5, 'GGG'=6

    code_presentation: '2013B'=0, '2013J'=1, '2014B'=2, '2014J'=3

    gender: 'F'=0, 'M'=1

    highest_education: 'No_Formal_quals'=0, 'Post_Graduate_Qualification'=1, 'HE_Qualification'=2, 'Lower_Than_A_Level'=3, 'A_level_or_Equivalent'=4

    imd_band: 'unknown'=0, 'between_0_and_10_percent'=1, 'between_10_and_20_percent'=2, 'between_20_and_30_percent'=3, 'between_30_and_40_percent'=4, 'between_40_and_50_percent'=5, 'between_50_and_60_percent'=6, 'between_60_and_70_percent'=7, 'between_70_and_80_percent'=8, 'between_80_and_90_percent'=9, 'between_90_and_100_percent'=10

    age_band: 'between_0_and_35'=0, 'between_35_and_55'=1, 'higher_than_55'=2

    disability: 'N'=0, 'Y'=1

    student's outcome: 'Fail'=0, 'Pass'=1
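
    The categorical-to-numeric mapping listed above can be written as lookup tables. The sketch below copies the codes from the description and folds the original four outcomes into the two target classes as described (Fail and Withdrawn to 0, Pass and Distinction to 1); the encode helper and the sample record are illustrative assumptions, not the authors' code.

```python
# Band labels follow a regular pattern, so build them in a loop.
IMD_BAND = {"unknown": 0}
for i in range(1, 11):
    IMD_BAND[f"between_{10 * (i - 1)}_and_{10 * i}_percent"] = i

MAPPINGS = {
    "code_module": {"AAA": 0, "BBB": 1, "CCC": 2, "DDD": 3, "EEE": 4, "FFF": 5, "GGG": 6},
    "code_presentation": {"2013B": 0, "2013J": 1, "2014B": 2, "2014J": 3},
    "gender": {"F": 0, "M": 1},
    "highest_education": {
        "No_Formal_quals": 0,
        "Post_Graduate_Qualification": 1,
        "HE_Qualification": 2,
        "Lower_Than_A_Level": 3,
        "A_level_or_Equivalent": 4,
    },
    "imd_band": IMD_BAND,
    "age_band": {"between_0_and_35": 0, "between_35_and_55": 1, "higher_than_55": 2},
    "disability": {"N": 0, "Y": 1},
    # Four original outcomes collapsed into Fail (0) and Pass (1).
    "final_result": {"Fail": 0, "Withdrawn": 0, "Pass": 1, "Distinction": 1},
}


def encode(record):
    """Convert categorical values to their numeric codes; pass other fields through."""
    return {key: MAPPINGS[key][value] if key in MAPPINGS else value
            for key, value in record.items()}


# Hypothetical student record.
encoded = encode({
    "code_module": "CCC",
    "code_presentation": "2014J",
    "gender": "F",
    "highest_education": "HE_Qualification",
    "imd_band": "between_20_and_30_percent",
    "age_band": "between_35_and_55",
    "disability": "N",
    "final_result": "Withdrawn",
})
print(encoded)
```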

    For more detailed information, please refer to:

    Casalino G., Castellano G., Vessio G. (2021) Exploiting Time in Adaptive Learning from Educational Data. In: Agrati L.S. et al. (eds) Bridges and Mediation in Higher Distance Education. HELMeTO 2020. Communications in Computer and Information Science, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_1

  20. COVID-19 Pandemic Wikipedia Readership

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin (2023). COVID-19 Pandemic Wikipedia Readership [Dataset]. http://doi.org/10.6084/m9.figshare.14548032.v3
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This data release includes two Wikipedia datasets on readership of the project during the early COVID-19 pandemic period. The first dataset is COVID-19 article page views by country; the second is one-hop navigation where one of the two pages is COVID-19 related. The data covers roughly the first six months of the pandemic, specifically January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic.

    Wikipedia articles are considered COVID-19 related according to the methodology described here; the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset, though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.

    Privacy considerations

    While this data is considered valuable for the insight it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken not to inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers.

    The Wikimedia Foundation started to release most viewed articles by country from Jan 2021. At the beginning of the COVID-19 pandemic an exemption was made to store reader data about the pandemic with additional privacy protections:
    - exclude the page views from users engaged in an edit session
    - exclude reader data from specific countries (with a few exceptions)
    - the aggregated statistics are based on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv). As a control, a 1% random sample of reader sessions that have no pageviews to COVID-19-related articles was kept. In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represent less than 10% of pageviews for a country for that day. The randomization and filters occur on a daily cadence with all timestamps in UTC.
    - exclude power users, i.e. userhashes with greater than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and in theory also helps reduce the chance of a single user heavily skewing the data.
    - exclude readership from users of the iOS and Android Wikipedia apps.

    In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on per-country readership data, and the COVID-19 privacy protections in particular, see this phabricator.

    To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts. For example, a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.

    Datasets

    The datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia Foundation already publishes, with the addition of the country-specific counts.

    COVID-19 pageviews

    The file covid_pageviews.tsv contains:
    - pageview counts for COVID-19 related pages, aggregated by week and country
    - k-anonymity threshold of 100
    - example: In the 13th week of 2020 (23 March - 29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times by readers in Belgium
    - as a control bucket, we include pageview counts to all pages aggregated by week and country. Due to privacy considerations during the collection of the data, the control bucket was sampled at ~1% of all view traffic. The view counts for the control title are thus proportional to the total number of pageviews to all pages.

    The file is ~8 MB and contains ~134000 data points across the 27 weeks, 108 countries, and 168 projects.

    Covid reader session bigrams

    The file covid_session_bigrams.tsv contains:
    - number of occurrences of visits to pages A -> B, where either A or B is a COVID-19 related article. Note that the bigrams are tuples (from, to) of articles viewed in succession. The underlying mechanism can be clicking on a link in an article, but it may also have been a new search or reading both articles based on links from third source articles. In contrast, the clickstream data is based on referral information only.
    - aggregated by month and country
    - k-anonymity threshold of 100
    - example: In March of 2020, there were 1000 occurrences of readers in Chile accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae

    The file is ~10 MB and contains ~90000 bigrams across the 6 months, 96 countries, and 56 projects.

    Contact

    Please reach out to research-feedback@wikimedia.org for any questions.
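
The k-anonymity step described above (drop any group viewed fewer than 100 times, then floor the remaining counts to a multiple of 100) can be sketched roughly as follows. This is only an illustration and not the code from the release's notebook: the function name, the key layout (project, page, country, week), and the raw counts are all hypothetical, since the unfiltered session data is not public.

```python
from collections import Counter

K_THRESHOLD = 100  # k-anonymity threshold stated in the description
FLOOR_STEP = 100   # released counts are floored to a multiple of this


def anonymize_counts(raw_counts, k=K_THRESHOLD, step=FLOOR_STEP):
    """Withhold groups with fewer than k views; floor the rest to a multiple of step."""
    released = {}
    for key, views in raw_counts.items():
        if views < k:
            continue  # below the threshold: the group is not released at all
        released[key] = (views // step) * step
    return released


# Hypothetical raw counts keyed by (project, page, country, week).
# The real raw values are not published; these numbers are made up.
raw = Counter({
    ("fr.wikipedia", "Pandémie_de_Covid-19_en_Italie", "Belgium", "2020-W13"): 11742,
    ("es.wikipedia", "SARS-CoV-2", "Chile", "2020-W13"): 99,
    ("en.wikipedia", "Coronavirus", "Kenya", "2020-W13"): 100,
})

released = anonymize_counts(raw)
for key, views in released.items():
    print(key, views)
```

Under these assumptions, a raw count such as 11742 would be published as 11700, and a group with 99 views would be absent from the file entirely, so a missing row means "fewer than 100 views after sampling" rather than "no views".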

Cite
Long Luu (2021). Clickstream data for online shopping [Dataset]. https://www.kaggle.com/aeryss/clickstream-data-for-online-shopping/code

Clickstream data for online shopping

Explore at:
19 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Apr 22, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Long Luu
Description

Dataset

This dataset was created by Long Luu

Contents
