MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset belongs to a leading online E-commerce company. The company wants to identify customers who are likely to churn, so they can proactively approach these customers with promotional offers.
The dataset contains various features related to customer behavior and characteristics, which can be used to predict customer churn.
The main task is to predict customer churn based on the given features. This is a binary classification problem where the target variable is 'Churn'.
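Before fitting any model to a binary target like 'Churn', it can help to check the class balance and the accuracy of a trivial majority-class predictor. The following is a minimal sketch of that check; the sample labels and the 1 = churned / 0 = retained encoding are assumptions for illustration, not values from the dataset.

```python
from collections import Counter


def churn_rate(labels):
    """Return the fraction of customers whose 'Churn' label is 1."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Hypothetical 'Churn' column (assumed encoding: 1 = churned, 0 = retained).
sample_churn = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

print("churn rate:", churn_rate(sample_churn))
print("majority baseline accuracy:", majority_baseline_accuracy(sample_churn))
```

Any classifier trained on the real features should beat the majority baseline to be considered useful.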
This dataset is provided for educational purposes. While it represents a real-world scenario, the data itself may be simulated or anonymized.
This dataset contains data on customers who buy clothes online. The store offers in-store style and clothing advice sessions. Customers come in to the store, have sessions with a personal stylist, then go home and order the clothes they want on either a mobile app or the website.
The company is trying to decide whether to focus their efforts on their mobile app experience or their website.
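One simple way to approach the app-versus-website question is to compare how strongly time spent on each channel correlates with customer spending. The sketch below computes a Pearson correlation by hand; the variable names and the sample values are hypothetical and do not come from the dataset, whose actual columns are not listed here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-customer values (assumed columns, illustrative numbers only).
app_minutes = [10, 12, 14, 16]
web_minutes = [30, 28, 31, 29]
yearly_spend = [100, 120, 140, 160]

print("app vs spend:", pearson(app_minutes, yearly_spend))
print("web vs spend:", pearson(web_minutes, yearly_spend))
```

A correlation alone does not settle where to invest; a regression on the real columns would be the natural next step.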
https://crawlfeeds.com/privacy_policy
Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.
Product Details: Name, Brand, Category, and Unique ID
Pricing Information: Current Price, Discounted Price, and Currency
Availability & Ratings: Stock Status, Customer Ratings, and Reviews
Seller Information: Seller Name and Fulfillment Details
Additional Attributes: Product Description, Specifications, and Images
Format: CSV
Number of Records: 50,000+
Delivery Time: 3 Days
Price: $149.00
Availability: Immediate
This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.
The dataset "isoc_bde15dec" has been discontinued since 08/02/2024.
https://creativecommons.org/publicdomain/zero/1.0/
This notebook focuses on cleaning and exploring a raw sales dataset provided by a local fashion brand. I performed:
Data cleaning (nulls, types, duplicates)
EDA (distribution, correlation)
Visualizations using Matplotlib, Seaborn, and Plotly
This dataset was provided by a fashion retail company and contains raw sales data used for cleaning, exploration, and visualization.
File Name: Train_csv.py.csv
Number of Rows: 10,000 (approx.)
Number of Columns: 12
File Format: CSV
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘E-Commerce Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/benroshan/ecommerce-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
Ever been excited to see a sales dataset? Well, this data is perfectly curated to perform sales analysis. We have an e-commerce sales dataset from India with 3 CSV files: List of Orders, Order details, and Sales target.
Dataset received from my University, Original Author unknown
--- Original source retains full ownership of the source dataset ---
This dataset contains longitudinal purchases data from 5027 Amazon.com users in the US, spanning 2018 through 2022: amazon-purchases.csv
It also includes demographic data and other consumer level variables for each user with data in the dataset. These consumer level variables were collected through an online survey and are included in survey.csv
fields.csv describes the columns in the survey.csv file, where fields/survey columns correspond to survey questions.
The dataset also contains the survey instrument used to collect the data. More details about the survey questions and possible responses, and the format in which they were presented, can be found by viewing the survey instrument.
A 'Survey ResponseID' column is present in both the amazon-purchases.csv and survey.csv files. It links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection.
amazon-purchases.csv
Each row in this file corresponds to an Amazon order. Each such row has the following columns:
Survey ResponseID
Order date
Shipping address state
Purchase price per unit
Quantity
ASIN/ISBN (Product Code)
Title
Category
The data were exported by the Amazon users from Amazon.com and shared by users with their informed consent. PII and other information not listed above were stripped from the data. This processing occurred on users' machines before sharing with researchers.
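Because 'Survey ResponseID' appears in both files, purchases can be aggregated per respondent and then joined to the survey responses. The following is a minimal sketch using only the standard library; the in-memory CSV text, the respondent IDs, and the 'age_group' survey column are hypothetical stand-ins, and real rows may need handling for missing prices or quantities.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature stand-ins for amazon-purchases.csv and survey.csv.
PURCHASES_CSV = """Survey ResponseID,Order date,Shipping address state,Purchase price per unit,Quantity,ASIN/ISBN (Product Code),Title,Category
R_1,2019-03-02,MA,10.00,2,B000000001,Example item A,BOOK
R_1,2020-07-15,MA,5.50,1,B000000002,Example item B,TOY
R_2,2021-01-09,CA,20.00,1,B000000003,Example item C,BOOK
"""
# 'age_group' is an assumed column name, not taken from fields.csv.
SURVEY_CSV = """Survey ResponseID,age_group
R_1,25-34
R_2,35-44
"""


def spend_per_respondent(purchases_file):
    """Sum price * quantity for each 'Survey ResponseID'."""
    totals = defaultdict(float)
    for row in csv.DictReader(purchases_file):
        price = float(row["Purchase price per unit"])
        quantity = float(row["Quantity"])
        totals[row["Survey ResponseID"]] += price * quantity
    return dict(totals)


def join_with_survey(totals, survey_file):
    """Attach each respondent's total spend to their survey row."""
    joined = []
    for row in csv.DictReader(survey_file):
        respondent = row["Survey ResponseID"]
        joined.append({**row, "total_spend": totals.get(respondent, 0.0)})
    return joined


totals = spend_per_respondent(io.StringIO(PURCHASES_CSV))
joined = join_with_survey(totals, io.StringIO(SURVEY_CSV))
for record in joined:
    print(record)
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `newline=""`, as the `csv` module recommends.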
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 3 series, with data for years 2016 - 2017 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 item: Canada); Sales (3 items: Retail trade; Electronic shopping and mail-order houses; Retail E-commerce sales).
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This is an e-commerce website log dataset created to help data analysts practice exploratory data analysis and data visualization. It records when the website was accessed, the source IP address, the country, the language in which the website was accessed, and the amount of sales made by that IP address.
Included columns:
Time and duration of accessing the website
Country, Language & Platform in which it was accessed
No. of bytes used & IP address of the person accessing website
Sales or return amount of that person
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
E-commerce sales of enterprises by NACE Rev. 2 activity
https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Kindred E-commerce Merchant Deals Dataset
AI-ready catalogue of deals and offers for global retail brands. Structured in CSV and JSONL, validated against JSON Schema. Train-ready catalogue of promotions, ready for RAG, embeddings, or classic search.
Dataset Overview
File: data/csv/brands.csv or data/jsonl/brands.jsonl
Rows: ~90K
Description: E-Commerce Merchant metadata, Logo URL, and domains… See the full description on the dataset page: https://huggingface.co/datasets/kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A dataset consisting of 751,500 English app reviews of 12 online shopping apps. The dataset was scraped from the internet using a Python script. This ShoppingAppReviews dataset contains app reviews of the 12 most popular online shopping Android apps: Alibaba, Aliexpress, Amazon, Daraz, eBay, Flipkart, Lazada, Meesho, Myntra, Shein, Snapdeal and Walmart. Each review entry contains metadata such as review score, thumbsupcount, review posting time, and reply content. The dataset is organized in a zip file containing 12 JSON files and 12 CSV files, one of each per app. This dataset can be used to obtain valuable information about customers' feedback regarding their user experience of these financially important apps.
This is the largest retail e-commerce orders dataset from Pakistan. It contains half a million transaction records from March 2016 to August 2018. The data was collected from various e-commerce merchants as part of a research study. I am releasing this dataset as a capstone project for my data science course at Alnafi (alnafi.com/zusmani).
There is a dire need for such a dataset to learn about Pakistan’s emerging e-commerce potential, and I hope this will help many startups in many ways.
Geography: Pakistan
Time period: 03/2016 – 08/2018
Unit of analysis: E-Commerce Orders
Dataset: The dataset contains detailed information on half a million e-commerce orders in Pakistan from March 2016 to August 2018. It contains item details, shipping method, payment method (such as credit card, Easy-Paisa, Jazz-Cash, or cash-on-delivery), product category (such as fashion, mobile, electronics, or appliance), date of order, SKU, price, quantity, total and customer ID. This is the most detailed dataset about e-commerce in Pakistan that you can find in the public domain.
Variables: The dataset contains Item ID, Order Status (Completed, Cancelled, Refund), Date of Order, SKU, Price, Quantity, Grand Total, Category, Payment Method and Customer ID.
Size: 101 MB
File Type: CSV
I would like to thank all the startups who are trying to make their mark in Pakistan despite the unavailability of research data.
I’d like to invite my fellow Kagglers to use machine learning and data science to help me explore these ideas:
• What is the best-selling category?
• Visualize payment method and order status frequency
• Find a correlation between payment method and order status
• Find a correlation between order date and item category
• Find any hidden patterns that are counter-intuitive for a layman
• Can we predict number of orders, or item category or number of customers/amount in advance?
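The first three ideas above can be started with simple frequency counts. The sketch below assumes each order is a dictionary keyed by the variable names listed earlier (Category, Quantity, Payment Method, Order Status); the exact CSV header spellings and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sample orders; keys follow the variable names in the description.
sample_orders = [
    {"Category": "fashion", "Quantity": 2,
     "Payment Method": "cash-on-delivery", "Order Status": "Completed"},
    {"Category": "mobile", "Quantity": 1,
     "Payment Method": "credit card", "Order Status": "Completed"},
    {"Category": "fashion", "Quantity": 1,
     "Payment Method": "cash-on-delivery", "Order Status": "Cancelled"},
    {"Category": "electronics", "Quantity": 1,
     "Payment Method": "Easy-Paisa", "Order Status": "Refund"},
]


def best_selling_category(rows):
    """Return (category, units sold) for the category with the most units."""
    units = Counter()
    for row in rows:
        units[row["Category"]] += row["Quantity"]
    return units.most_common(1)[0]


def payment_status_counts(rows):
    """Count orders for each (payment method, order status) pair."""
    return Counter((row["Payment Method"], row["Order Status"]) for row in rows)


print(best_selling_category(sample_orders))
for pair, count in payment_status_counts(sample_orders).items():
    print(pair, count)
```

The pair counts form a contingency table, which is the usual starting point for testing whether payment method and order status are associated.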
https://crawlfeeds.com/privacy_policy
The Waitrose Product Dataset offers a comprehensive and structured collection of grocery items listed on the Waitrose online platform. This dataset includes 25,000+ product records across multiple categories, curated specifically for use in retail analytics, pricing comparison, AI training, and eCommerce integration.
Each record contains detailed attributes such as:
Product title, brand, MPN, and product ID
Price and currency
Availability status
Description, ingredients, and raw nutrition data
Review count and average rating
Breadcrumbs, image links, and more
Delivered in CSV format (ZIP archive), this dataset is ideal for professionals in the FMCG, retail, and grocery tech industries who need structured, crawl-ready data for their projects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was obtained from the Scopus database on 7 October 2020, using the following search query: TITLE ( "e-commerce" OR "electronic commerce" OR "e commerce" OR "ecommerce" ). It can be used to map the research on e-commerce from 1992 until 2020 using bibliometric analysis. The dataset comes in 9 parts, prepared in CSV and RIS format. The data in CSV format can be opened and analysed using applications such as Microsoft Excel. It can also be opened in VOSviewer to construct and visualize bibliometric networks. The data in RIS format can be opened with reference manager software such as EndNote or Mendeley Desktop, or with Harzing's Publish or Perish, for further analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We offer a dataset comprising approximately 1,198,398 unique products sourced from Mercado Libre Perú. This dataset was collected from the platform's public API spanning from February 2022 to May 2023.
Files description:
ml_db_raw.db : Raw dataset stored in a SQLite Database
ml_db_sample.csv : A sample of only 5 electronic categories
test.csv* : 20% of data from ml_db_sample.csv
train.csv* : 80% of data from ml_db_sample.csv
Attributes description:
CatX : Category Name for X level
CatX_code : Category Code given by Mercado Libre for X level
id : Unique product identifier
title : Original product title
price : Product price
currency : Product currency (PEN, USD)
link : Product link
insert_date : Web scraping date
mlp_updated_date : Mercado Libre product update date
text : Cleaned product title
taxonomy : Category path from general to specific categories
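The 80%/20% relationship between train.csv and test.csv can be reproduced on any list of rows with a seeded shuffle. This is only a sketch of one common way to do it; how the authors actually produced their split (random, stratified by category, or otherwise) is not stated, and the row values below are hypothetical.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-ins for rows of ml_db_sample.csv.
sample_rows = [f"product_{i}" for i in range(10)]
train_rows, test_rows = split_rows(sample_rows)

print(len(train_rows), "train rows;", len(test_rows), "test rows")
```

For a classification task over the five categories, a stratified split that preserves category proportions may be preferable.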
https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains all the datasets used for the performance evaluation of the MINE GRAPH RULE operator proposed in the paper "MINE GRAPH RULE: A New GQL Operator for Mining Association Rules in Property Graph Databases".
Each folder contains the following files used to create a property graph in Neo4j with a fixed schema mimicking an e-commerce site.
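The folders contain various graph instances with differing dimensions, and each folder is named to reflect its defining features. The features in the name are given in this order: number of nodes, fraction of Person nodes, probability of a FOLLOW relationship between two Person nodes, probability of a BUY relationship between a Person node and an Item node, and probability of a RECOMMEND relationship between a Person node and an Item node.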
(Example: the folder 10000_0.5_0.0005_0.1_0.0005_dataset contains files of a graph with 10000 nodes, half of which are Person nodes; 0.0005 is the probability of a FOLLOW relationship between two Person nodes, 0.1 is the probability of a BUY relationship between a Person node and an Item node, and 0.0005 is the probability of a RECOMMEND relationship between a Person node and an Item node.)
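The naming scheme in the example can be decoded programmatically when iterating over the graph instances. The following is a minimal sketch that assumes every folder name has exactly five underscore-separated numeric fields followed by the suffix "dataset", as in the example; the class and field names are hypothetical.

```python
from typing import NamedTuple


class GraphParams(NamedTuple):
    """Generation parameters encoded in a folder name (field names are assumed)."""
    num_nodes: int
    person_fraction: float
    p_follow: float
    p_buy: float
    p_recommend: float


def parse_folder_name(name: str) -> GraphParams:
    """Parse a name such as '10000_0.5_0.0005_0.1_0.0005_dataset'."""
    parts = name.split("_")
    if len(parts) != 6 or parts[-1] != "dataset":
        raise ValueError(f"unexpected folder name: {name!r}")
    return GraphParams(int(parts[0]), *(float(p) for p in parts[1:5]))


params = parse_folder_name("10000_0.5_0.0005_0.1_0.0005_dataset")
print(params)
```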
This dataset was created by Marwan Diab