This dataset was created by Long Luu
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Clickstream Data for Online Shopping dataset summarizes user clickstreams, product information, country of access, prices, and other session-level behaviour recorded from April to August 2008 at an online store specializing in maternity clothing.
2) Data Utilization
(1) Characteristics of Clickstream Data for Online Shopping:
• Each row carries 14 variables: year, month, day, click order, country (inferred from the access IP), session ID, main category, product code, colour, photo location on the page, model photo type, price, price relative to the category average, and page number.
• The data supports analysis of consumer behaviour such as per-session click flows, product attributes, and country-specific access patterns.
(2) Clickstream Data for Online Shopping can be used to:
• Analyse online-store user behaviour: clickstream, session, and product information enable analysis of purchase-conversion paths, popular products, and behavioural patterns by country and category.
• Improve marketing strategy and UI/UX: relate photo location, colour, price, and other attributes to click behaviour, and apply the findings to marketing strategy and storefront UI/UX improvements.
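As a sketch of the per-session analysis described above, the table can be grouped by session ID to get click-flow lengths and by country to compare prices. The column names and values below are a hypothetical miniature, not the real file (which is larger and may use different headers):

```python
import pandas as pd

# Hypothetical miniature of the clickstream table; the real CSV's
# column names and separator may differ.
df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3],
    "order":      [1, 2, 3, 1, 2, 1],   # click sequence within a session
    "country":    [29, 29, 29, 9, 9, 16],
    "price":      [38, 43, 28, 57, 43, 33],
})

# Clicks per session: a first step toward modelling click flows.
clicks_per_session = df.groupby("session_id")["order"].max()

# Average price viewed per country code.
avg_price_by_country = df.groupby("country")["price"].mean()

print(clicks_per_session.to_dict())   # {1: 3, 2: 2, 3: 1}
```

The same groupby pattern extends to category-level popularity counts or country-by-category cross-tabulations.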
https://creativecommons.org/publicdomain/zero/1.0/
This AI-Driven Consumer Behavior Dataset captures key aspects of online shopping behavior, including purchase decisions, browsing activity, customer reviews, and demographic details. The dataset is designed for research in consumer behavior analysis, AI-driven recommendation systems, and digital marketing optimization.
Key Features: ✔ Consumer Purchase Data – Tracks product purchases, prices, discounts, and payment methods. ✔ Clickstream Data – Includes browsing behavior, pages visited, session duration, and cart abandonment. ✔ Customer Reviews & Sentiments – Provides ratings, textual reviews, and sentiment analysis scores. ✔ Demographic Information – Includes age, gender, location, and income levels. ✔ Target Column (purchase_decision) – Indicates whether a customer completed a purchase (1) or not (0).
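Since `purchase_decision` is a binary target, a natural first experiment is a classification baseline over the behavioural features. The feature names and values below are assumed stand-ins for illustration only:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in rows; real feature names in the dataset may differ.
df = pd.DataFrame({
    "session_duration":  [30, 240, 15, 310, 45, 500, 20, 420],
    "pages_visited":     [2, 12, 1, 15, 3, 20, 2, 18],
    "purchase_decision": [0, 1, 0, 1, 0, 1, 0, 1],   # 1 = purchase completed
})

X = df[["session_duration", "pages_visited"]]
y = df["purchase_decision"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Logistic regression as a simple purchase-conversion baseline.
clf = LogisticRegression().fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

Stronger models (gradient boosting, etc.) and the review-sentiment features can then be compared against this baseline.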
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Grass Clickstream Dataset
Wynd Labs
This is the clickstream dataset produced by the team at Wynd Labs. The provided embeddings are aggregates of CLIP embeddings computed from selected keyframes of the respective videos. These embeddings are intended for task-specific clustering and automatic segmentation. If it clips, it ships.
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.
Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.
User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.
Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.
GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.
Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.
High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.
Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.
Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
This dataset was created by Dev Patel
Modeling online browsing and path analysis using clickstream data.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Raghu Mariswamegowda
Released under Apache 2.0
This dataset was created by Federico Garcia Blanco
Data description “e-shop clothing 2008”
Variables:
========================================================
1. year — 2008
2. month — April (4) to August (8)
3. day — day of the month
4. order — sequence of clicks during one session
5. country — country of origin of the IP address:
1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (.org)
6. session ID — identifier of a single browsing session
7. page 1 (main category) — main product category
8. page 2 (clothing model) — product code within the category
9. colour — colour of the product:
1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white
10. location — photo location on the page:
1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right
11. model photography — style of the product photo:
1-en face 2-profile
12. price — price in US dollars
13. price 2 — whether the price exceeds the average for the product category:
1-yes 2-no
14. page — page number within the e-store website
========================================================
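For analysis, the integer-coded columns can be decoded with lookup tables transcribed from the code lists above; a minimal sketch (the helper `decode_click` is illustrative, not part of the dataset):

```python
# Lookup tables transcribed from the variable code lists above.
COLOUR = {1: "beige", 2: "black", 3: "blue", 4: "brown", 5: "burgundy",
          6: "gray", 7: "green", 8: "navy blue", 9: "of many colors",
          10: "olive", 11: "pink", 12: "red", 13: "violet", 14: "white"}
LOCATION = {1: "top left", 2: "top in the middle", 3: "top right",
            4: "bottom left", 5: "bottom in the middle", 6: "bottom right"}
MODEL_PHOTOGRAPHY = {1: "en face", 2: "profile"}

def decode_click(row):
    """Replace integer codes in one record with their readable labels."""
    out = dict(row)
    out["colour"] = COLOUR[row["colour"]]
    out["location"] = LOCATION[row["location"]]
    out["model photography"] = MODEL_PHOTOGRAPHY[row["model photography"]]
    return out

print(decode_click({"colour": 8, "location": 1, "model photography": 2}))
```

The country table can be handled the same way when readable labels are needed for reporting.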
This data can be approached as a clustering, regression, classification, or exploratory data analysis (EDA) problem.
Source: https://archive.ics.uci.edu/ml/datasets/clickstream+data+for+online+shopping
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Simple English Wikipedia Link Graph with Clickstream Transitions is a gzipped GML file representing the hyperlink graph of the Simple English Wikipedia. It was prepared using the "pagelinks" and "page" SQL dumps for 2019-01-01 and extended with an edge property called "transitions" based on the Clickstream dump for the English Wikipedia from 2018-12. It was designed to be used as a ground truth to evaluate node ranking metrics, like PageRank, but it can be useful for Network Science in general, or for Machine Learning and Information Retrieval to compute features over a medium-sized, complete Wikipedia link graph.
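Since the graph carries a "transitions" edge property, node-ranking metrics can be weighted by observed clickstream traffic. The sketch below uses a toy stand-in graph; loading the real file would be something like `nx.read_gml("simplewiki.gml.gz")` (filename assumed):

```python
import networkx as nx

# Toy stand-in for the link graph with clickstream "transitions" weights.
G = nx.DiGraph()
G.add_edge("Earth", "Moon", transitions=120)
G.add_edge("Earth", "Sun", transitions=300)
G.add_edge("Moon", "Earth", transitions=80)
G.add_edge("Sun", "Earth", transitions=50)

# PageRank weighted by transitions, usable as a traffic-aware ranking
# to compare other node metrics against.
pr = nx.pagerank(G, weight="transitions")
top = max(pr, key=pr.get)
```

An unweighted `nx.pagerank(G)` on the same graph gives the purely structural ranking, so the two can be compared directly.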
Datasys Gamer Audiences dataset tracks 10M+ gaming consumers, including platform usage, time spent, and title engagement.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This project contains data sets containing counts of (referer, resource) pairs extracted from the request logs of Wikipedia. A referer is an HTTP header field that identifies the address of the webpage that linked to the resource being requested. The data shows how people get to a Wikipedia article and what links they click on. In other words, it gives a weighted network of articles, where each edge weight corresponds to how often people navigate from one page to another. For more information and documentation, see the link in the references section below.
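The weighted-network view described above amounts to counting each distinct (referer, resource) pair; a minimal sketch with hypothetical page names:

```python
from collections import Counter

# Hypothetical raw (referer, resource) request pairs; the published files
# ship these already aggregated into counts.
requests = [
    ("other-search", "London"),
    ("other-search", "London"),
    ("London", "River_Thames"),
    ("London", "River_Thames"),
    ("London", "Big_Ben"),
]

# Each distinct pair becomes an edge whose weight is its request count.
edge_weights = Counter(requests)
print(edge_weights[("London", "River_Thames")])  # 2
```

Feeding these weighted edges into a graph library then yields the navigation network of articles.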
This dataset encompasses mobile web clickstream behavior on any browser, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). Use it for measurement, attribution, or understanding the path to purchase and the consumer journey. A full-URL deliverable is available, including searches.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes:
1. online-store customer behaviour data (clickstream) from 1.04 to 30.11.2023, used to cluster customers and evaluate the effectiveness of implemented modifications (catalog: learning-dataset)
2. clustering results used to verify the effectiveness of the implemented changes (catalog: clustering)
3. detailed data for calculating macro-conversion indicators (catalog: macro-conversion-indicators)
4. detailed data for calculating micro-conversion indicators (catalog: micro-conversion-indicators)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article presents a student clickstream database comprising 120,542 training images and 80,362 test images, where each directory contains two subdirectories, "Dropouts" and "NonDropouts", as the two classes. The original dataset was provided by the KDD Cup 2015 challenge, in which the data came from the Chinese MOOC (massive open online course) platform XuetangX. The samples were acquired from clickstream/user activity on the platform. We transformed the KDD Cup 2015 dataset into an image dataset, a transformation that enables the application of deep learning and computer vision techniques to develop more accurate and robust models for identifying students at risk of dropping out, and that will allow MOOC platforms to design robust early-warning systems. Furthermore, this dataset will be made publicly available to the research community to advance interdisciplinary research at the intersection of education and computer vision.
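With a class-per-subdirectory layout like this, samples and labels can be enumerated directly from the folder names. The sketch below builds a throwaway directory tree to stand in for the real `train/` folder (paths and label encoding are assumptions):

```python
import pathlib
import tempfile

# Stand-in for the downloaded train/ directory; real code would just set
# root = pathlib.Path("train").
root = pathlib.Path(tempfile.mkdtemp())
for cls in ("Dropouts", "NonDropouts"):
    (root / cls).mkdir()
    (root / cls / "sample_0.png").touch()   # placeholder image files

# Map subdirectory names to class labels (encoding chosen for illustration).
LABELS = {"Dropouts": 1, "NonDropouts": 0}
samples = [(p, LABELS[p.parent.name])
           for p in sorted(root.glob("*/*.png"))]
```

This is exactly the convention that loaders such as `torchvision.datasets.ImageFolder` or `keras.utils.image_dataset_from_directory` consume out of the box.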
A dataset of supply chains used by the company DataCo Global was used for the analysis. The dataset supports the use of machine learning algorithms and R software. Important areas of registered activity: provisioning, production, sales, and commercial distribution. It also allows correlating structured data with unstructured data for knowledge generation.
Type Data : Structured Data : DataCoSupplyChainDataset.csv Unstructured Data : tokenized_access_logs.csv (Clickstream)
Types of Products : Clothing , Sports , and Electronic Supplies
Additionally, a further file, DescriptionDataCoSupplyChain.csv, is attached; it describes each of the variables of DataCoSupplyChainDataset.csv.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a multilingual ground truth dataset for training, evaluating and testing the LaSER (Language-Specific Event Recommendation) model. It contains language-specific relevance scores for event-centric click-through pairs according to the publicly available Clickstream dataset in German, French and Russian as well as the user study annotations conducted for evaluating the language-specific recommendations by LaSER. For more details, refer to EventKG+Click and LaSER.
This dataset consists of two sets of files as follows:
1. The ground truth dataset that is used for training the learning to rank (LTR) model in LaSER in three languages. The following files contain the language-specific relevance scores between a source and target entity based on EventKG+Click dataset:
In these files, source and target hold the labels of entities and events in the respective language.
2. The second set contains the user study participants' annotations regarding different relevance criteria of recommended events by LaSER. The following three files contain the annotations of at least three participants per event:
In these files, "r1", "r2" and "r3" denote relevance to the topic, to the language community, and to a general audience, respectively; topic and event hold the Wikidata IDs of entities and events.
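Because each event is annotated by at least three participants, the per-criterion scores are naturally aggregated by averaging across annotators. The rows and score values below are hypothetical:

```python
from statistics import mean

# Hypothetical annotation rows in the format described above: each row is
# one participant's r1/r2/r3 scores for an event (Wikidata ID assumed).
annotations = [
    {"event": "Q2397918", "r1": 4, "r2": 3, "r3": 2},
    {"event": "Q2397918", "r1": 5, "r2": 3, "r3": 3},
    {"event": "Q2397918", "r1": 4, "r2": 2, "r3": 2},
]

# Mean score per relevance criterion across the participants.
agg = {k: mean(a[k] for a in annotations) for k in ("r1", "r2", "r3")}
```

In practice one would group the file by event ID first and aggregate each group this way.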
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open University (OU) dataset is an open database containing student demographic and click-stream interaction with the virtual learning platform. The available data are structured in different CSV files. You can find more information about the original dataset at the following link: https://analyse.kmi.open.ac.uk/open_dataset.
We extracted a subset of the original dataset that focuses on student information. 25,819 records were collected referring to a specific student, course and semester. Each record is described by the following 20 attributes: code_module, code_presentation, gender, highest_education, imd_band, age_band, num_of_prev_attempts, studies_credits, disability, resource, homepage, forum, glossary, outcontent, subpage, url, outcollaborate, quiz, AvgScore, count.
Two target classes were considered, Fail and Pass, obtained by merging the original four classes (Fail and Withdrawn into Fail; Pass and Distinction into Pass). The final_result attribute contains the target values.
All features have been converted to numbers for automatic processing.
Below is the mapping used to convert categorical values to numeric:
code_module: 'AAA'=0, 'BBB'=1, 'CCC'=2, 'DDD'=3, 'EEE'=4, 'FFF'=5, 'GGG'=6
code_presentation: '2013B'=0, '2013J'=1, '2014B'=2, '2014J'=3
gender: 'F'=0, 'M'=1
highest_education: 'No_Formal_quals'=0, 'Post_Graduate_Qualification'=1, 'HE_Qualification'=2, 'Lower_Than_A_Level'=3, 'A_level_or_Equivalent'=4
imd_band: 'unknown'=0, 'between_0_and_10_percent'=1, 'between_10_and_20_percent'=2, 'between_20_and_30_percent'=3, 'between_30_and_40_percent'=4, 'between_40_and_50_percent'=5, 'between_50_and_60_percent'=6, 'between_60_and_70_percent'=7, 'between_70_and_80_percent'=8, 'between_80_and_90_percent'=9, 'between_90_and_100_percent'=10
age_band: 'between_0_and_35'=0, 'between_35_and_55'=1, 'higher_than_55'=2
disability: 'N'=0, 'Y'=1
student's outcome: 'Fail'=0, 'Pass'=1
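The mappings above can be applied directly with pandas; the two-row table below is a hypothetical raw record showing a subset of the attributes:

```python
import pandas as pd

# Lookup tables copied from the mapping listed above (subset shown).
GENDER = {"F": 0, "M": 1}
AGE_BAND = {"between_0_and_35": 0, "between_35_and_55": 1, "higher_than_55": 2}
FINAL_RESULT = {"Fail": 0, "Pass": 1}

# Hypothetical raw (pre-conversion) records.
raw = pd.DataFrame({
    "gender": ["F", "M"],
    "age_band": ["between_0_and_35", "higher_than_55"],
    "final_result": ["Pass", "Fail"],
})

# Replace each categorical column with its numeric encoding.
encoded = raw.assign(
    gender=raw["gender"].map(GENDER),
    age_band=raw["age_band"].map(AGE_BAND),
    final_result=raw["final_result"].map(FINAL_RESULT),
)
print(encoded.iloc[0].tolist())  # [0, 0, 1]
```

The remaining categorical attributes (code_module, code_presentation, etc.) are converted the same way with their respective tables.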
For more detailed information, please refer to:
Casalino G., Castellano G., Vessio G. (2021) Exploiting Time in Adaptive Learning from Educational Data. In: Agrati L.S. et al. (eds) Bridges and Mediation in Higher Distance Education. HELMeTO 2020. Communications in Computer and Information Science, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_1
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data release includes two Wikipedia datasets related to the readership of the project during the early COVID-19 pandemic period. The first dataset contains COVID-19 article page views by country; the second contains one-hop navigation where one of the two pages is COVID-19 related. The data covers roughly the first six months of the pandemic, from January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic. Wikipedia articles are considered COVID-19 related according to the methodology described here; the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset, though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.

Privacy considerations

While this data is considered valuable for the insight it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken not to inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers. The Wikimedia Foundation started to release most-viewed articles by country in January 2021. At the beginning of the COVID-19 pandemic, an exemption was made to store reader data about the pandemic with additional privacy protections:
- exclude page views from users engaged in an edit session
- exclude reader data from specific countries (with a few exceptions)
- base the aggregated statistics on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv); as a control, keep a 1% random sample of reader sessions with no pageviews to COVID-19-related articles. In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represent less than 10% of pageviews for a country on that day. The randomization and filters occur on a daily cadence, with all timestamps in UTC.
- exclude power users, i.e. user hashes with more than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and in theory reduces the chance of a single user heavily skewing the data.
- exclude readership from users of the iOS and Android Wikipedia apps.

In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on per-country readership data, and the COVID-19 privacy protections in particular, see this phabricator. To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts: a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.

Datasets

The datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia Foundation already publishes, with the addition of country-specific counts.

COVID-19 pageviews

The file covid_pageviews.tsv contains:
- pageview counts for COVID-19-related pages, aggregated by week and country
- a k-anonymity threshold of 100
- example: in the 13th week of 2020 (23-29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times by readers in Belgium
- as a control bucket, pageview counts to all pages, aggregated by week and country. Due to privacy considerations during data collection, the control bucket was sampled at ~1% of all view traffic; the view counts for the control title are thus proportional to the total number of pageviews to all pages.

The file is ~8 MB and contains ~134,000 data points across the 27 weeks, 108 countries, and 168 projects.

COVID reader session bigrams

The file covid_session_bigrams.tsv contains:
- counts of visits to pages A -> B, where either A or B is a COVID-19-related article. The bigrams are tuples (from, to) of articles viewed in succession; the underlying mechanism can be clicking a link in an article, but it may also be a new search, or reading both articles via links from a third source article. In contrast, the clickstream data is based on referral information only
- aggregated by month and country
- a k-anonymity threshold of 100
- example: in March 2020, there were 1,000 occurrences of readers accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae from Chile

The file is ~10 MB and contains ~90,000 bigrams across the 6 months, 96 countries, and 56 projects.

Contact

Please reach out to research-feedback@wikimedia.org for any questions.
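The k-anonymity filter and flooring described above can be sketched in a few lines; the function name and input counts are illustrative, not part of the release pipeline:

```python
# Sketch of the release filters described above: drop counts below the
# k-anonymity threshold of 100, then floor survivors to a multiple of 100.
K = 100

def release_counts(counts):
    """counts: {page: raw_view_count} -> anonymised counts for release."""
    return {page: (n // K) * K for page, n in counts.items() if n >= K}

raw = {"Pandémie_de_Covid-19_en_Italie": 11764, "Obscure_page": 42}
print(release_counts(raw))  # {'Pandémie_de_Covid-19_en_Italie': 11700}
```

Note that flooring (rather than rounding) guarantees a released count never overstates the true count.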