43 datasets found
  1. E-commerce Customer Churn

    • kaggle.com
    Updated Aug 6, 2024
    Cite
    Samuel Semaya (2024). E-commerce Customer Churn [Dataset]. https://www.kaggle.com/datasets/samuelsemaya/e-commerce-customer-churn
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Kaggle
    Authors
    Samuel Semaya
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    E-commerce Customer Churn Dataset

    Context

    This dataset belongs to a leading online E-commerce company. The company wants to identify customers who are likely to churn, so they can proactively approach these customers with promotional offers.

    Content

    The dataset contains various features related to customer behavior and characteristics, which can be used to predict customer churn.

    Features

    1. Tenure: Tenure of a customer in the company (numeric)
    2. WarehouseToHome: Distance from the warehouse to the customer's home (numeric)
    3. NumberOfDeviceRegistered: Total number of devices registered to a particular customer (numeric)
    4. PreferedOrderCat: Preferred order category of a customer in the last month (categorical)
    5. SatisfactionScore: Customer's satisfaction score for the service (numeric)
    6. MaritalStatus: Marital status of a customer (categorical)
    7. NumberOfAddress: Total number of addresses added for a particular customer (numeric)
    8. Complaint: Whether any complaint has been raised in the last month (binary)
    9. DaySinceLastOrder: Days since the customer's last order (numeric)
    10. CashbackAmount: Average cashback in the last month (numeric)
    11. Churn: Churn flag (target variable, binary)

    Task

    The main task is to predict customer churn based on the given features. This is a binary classification problem where the target variable is 'Churn'.
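    As a first exploratory step before fitting a classifier, one can compare churn rates across the values of a single feature. The sketch below assumes the data has been exported to a CSV file whose header uses the column names listed above, with Churn coded as 0/1; the file name in the usage comment is hypothetical.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def churn_rate_by(rows, feature, target="Churn"):
    """Return {feature value: share of rows whose target is 1}.

    Rows with a blank target are skipped (assumption: unlabeled rows
    are simply left out of this baseline).
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        label = (row.get(target) or "").strip()
        if not label:
            continue
        key = row[feature]
        totals[key] += 1
        churned[key] += int(float(label))  # accepts "1" as well as "1.0"
    return {key: churned[key] / totals[key] for key in totals}


# Usage (hypothetical file name):
# rates = churn_rate_by(load_rows("ecommerce_customer_churn.csv"), "Complaint")
```

    Comparing these per-group rates with the overall churn rate gives a quick sense of which features may carry signal; it does not replace a proper train/test evaluation of a classifier.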

    Potential Applications

    1. Customer Retention: Identify at-risk customers and take proactive measures to retain them.
    2. Targeted Marketing: Design specific marketing campaigns for customers likely to churn.
    3. Service Improvement: Analyze features contributing to churn and improve those aspects of the service.

    Acknowledgements

    This dataset is provided for educational purposes. While it represents a real-world scenario, the data itself may be simulated or anonymized.

  2. Linear Regression E-commerce Dataset

    • kaggle.com
    zip
    Updated Sep 16, 2019
    Cite
    Saurabh Kolawale (2019). Linear Regression E-commerce Dataset [Dataset]. https://www.kaggle.com/datasets/kolawale/focusing-on-mobile-app-or-website
    Available download formats: zip (44169 bytes)
    Dataset updated
    Sep 16, 2019
    Authors
    Saurabh Kolawale
    Description

    This dataset contains data on customers who buy clothes online. The store offers in-store style and clothing advice sessions: customers come into the store, have sessions/meetings with a personal stylist, and can then go home and order the clothes they want on either the mobile app or the website.

    The company is trying to decide whether to focus their efforts on their mobile app experience or their website.

  3. Amazon India products dataset in CSV format

    • crawlfeeds.com
    csv, zip
    Updated Mar 27, 2025
    Cite
    Crawl Feeds (2025). Amazon India products dataset in CSV format [Dataset]. https://crawlfeeds.com/datasets/amazon-india-products-dataset-in-csv-format
    Available download formats: csv, zip
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Area covered
    India
    Description

    Gain access to a structured dataset featuring thousands of products listed on Amazon India. This dataset is ideal for e-commerce analytics, competitor research, pricing strategies, and market trend analysis.

    Dataset Features:

    • Product Details: Name, Brand, Category, and Unique ID

    • Pricing Information: Current Price, Discounted Price, and Currency

    • Availability & Ratings: Stock Status, Customer Ratings, and Reviews

    • Seller Information: Seller Name and Fulfillment Details

    • Additional Attributes: Product Description, Specifications, and Images

    Dataset Specifications:

    • Format: CSV

    • Number of Records: 50,000+

    • Delivery Time: 3 Days

    • Price: $149.00

    • Availability: Immediate

    This dataset provides structured and actionable insights to support e-commerce businesses, pricing strategies, and product optimization. If you're looking for more datasets for e-commerce analysis, explore our E-commerce datasets for a broader selection.

  4. E-commerce, customer relation management (CRM) and secure transactions by...

    • data.europa.eu
    Updated Nov 30, 2009
    Cite
    Eurostat (2009). E-commerce, customer relation management (CRM) and secure transactions by size class of enterprise [Dataset]. https://data.europa.eu/data/datasets/i9yvadmdw9xeyctv8zeswg?locale=en
    Dataset updated
    Nov 30, 2009
    Dataset authored and provided by
    Eurostat: https://ec.europa.eu/eurostat
    Description

    The dataset "isoc_bde15dec" has been discontinued since 08/02/2024.

  5. Data_Cleaning_EDA.ipynb

    • kaggle.com
    Updated Jun 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SandeepR KUMAR (2025). Data_Cleaning_EDA.ipynb [Dataset]. https://www.kaggle.com/datasets/sandeeprkumar/data-cleaning-eda-ipynb
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    SandeepR KUMAR
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This notebook focuses on cleaning and exploring a raw sales dataset provided by a local fashion brand. I performed:

    • Data cleaning (nulls, types, duplicates)

    • EDA (distribution, correlation)

    • Visualizations using Matplotlib, Seaborn, and Plotly
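    The notebook itself is not reproduced here. As a rough standard-library illustration of the three cleaning steps listed above (duplicates, nulls, types), the sketch below drops exact duplicate rows, drops rows with any empty field, and casts chosen columns to float. The column name in the usage comment is hypothetical, since the file's 12 columns are not listed.

```python
import csv


def clean_rows(rows, numeric_cols):
    """Drop exact duplicates and rows with empty fields; cast numeric_cols to float."""
    seen = set()
    cleaned = []
    for row in rows:
        # Assumes string keys and values, as produced by csv.DictReader.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(value is None or value.strip() == "" for value in row.values()):
            continue  # row has a missing value
        try:
            converted = {col: float(row[col]) for col in numeric_cols}
        except ValueError:
            continue  # value that cannot be parsed as a number
        cleaned.append({**row, **converted})
    return cleaned


# Usage ("Sales" is a hypothetical column name):
# with open("Train_csv.py.csv", newline="", encoding="utf-8") as f:
#     rows = clean_rows(list(csv.DictReader(f)), ["Sales"])
```

    In pandas, these steps roughly correspond to drop_duplicates, dropna and astype, although dropna targets NaN values rather than empty strings.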

    📁 Dataset Information

    This dataset was provided by a fashion retail company and contains raw sales data used for cleaning, exploration, and visualization.

    File Name: Train_csv.py.csv
    Number of Rows: 10,000 (approx.)
    Number of Columns: 12
    File Format: CSV

  6. ‘E-Commerce Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘E-Commerce Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-e-commerce-data-6745/f0e45dc3/?iid=009-844&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘E-Commerce Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/benroshan/ecommerce-data on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Hello

    Ever been excited to see a sales dataset? Well, this data is perfectly curated to perform sales analysis. We have an e-commerce sales dataset from India with three CSV files: List of Orders, Order Details, and Sales Target.

    What's inside?

    1. List of Orders: This file contains purchase information, including the ID, date of purchase, and customer details.
    2. Order Details: This file contains the order ID along with the order price, quantity, profit, and the category and subcategory of the product.
    3. Sales Target: This file contains the sales target amount and date for each product category.
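    Because List of Orders and Order Details share an order ID, the two files can be joined before aggregating. The sketch below assumes the shared column is named "Order ID" and that Order Details has "Category" and "Amount" columns; these header names are assumptions to verify against the files.

```python
from collections import defaultdict


def join_on_order_id(orders, details, key="Order ID"):
    """Attach order-level fields (date, customer, ...) to each detail line."""
    by_id = {order[key]: order for order in orders}
    merged = []
    for line in details:
        order = by_id.get(line[key])
        if order is None:
            continue  # detail line with no matching order header
        merged.append({**order, **line})
    return merged


def total_by(rows, group_col, value_col):
    """Sum a numeric column per group, e.g. amount per category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)
```

    Joining first means order-level attributes (such as purchase date or customer location) become available for grouping alongside the product category.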

    Acknowledgements

    Dataset received from my university; original author unknown.

    --- Original source retains full ownership of the source dataset ---

  7. Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Dec 16, 2023
    Cite
    Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland (2023). Open e-commerce 1.0: Five years of crowdsourced U.S. Amazon purchase histories with user demographics [Dataset]. http://doi.org/10.7910/DVN/YGLYDY
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Alex Berke; Dan Calacci; Robert Mahari; Takahiro Yabe; Kent Larson; Sandy Pentland
    Description

    This dataset contains longitudinal purchase data from 5,027 Amazon.com users in the US, spanning 2018 through 2022 (amazon-purchases.csv). It also includes demographic data and other consumer-level variables for each user in the dataset. These consumer-level variables were collected through an online survey and are included in survey.csv. The file fields.csv describes the columns of survey.csv, where fields/survey columns correspond to survey questions. The dataset also contains the survey instrument used to collect the data; more details about the survey questions, the possible responses, and the format in which they were presented can be found by viewing the survey instrument.

    A 'Survey ResponseID' column is present in both amazon-purchases.csv and survey.csv. It links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection.

    amazon-purchases.csv: each row in this file corresponds to an Amazon order. Each such row has the following columns:

    • Survey ResponseID
    • Order date
    • Shipping address state
    • Purchase price per unit
    • Quantity
    • ASIN/ISBN (Product Code)
    • Title
    • Category

    The data were exported by the Amazon users from Amazon.com and shared by users with their informed consent. PII and other information not listed above were stripped from the data. This processing occurred on users' machines before sharing with researchers.
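    Since 'Survey ResponseID' links the two files, per-user spending can be paired with survey answers. A minimal sketch, assuming the column names exactly as written in the description (the real CSV header capitalization may differ) and a hypothetical survey field name; real survey column names should be looked up in fields.csv.

```python
from collections import defaultdict

ID_COL = "Survey ResponseID"  # shared key named in the description


def spend_per_user(purchases, price_col="Purchase price per unit",
                   qty_col="Quantity"):
    """Total spend (unit price x quantity) per survey respondent."""
    totals = defaultdict(float)
    for row in purchases:
        try:
            amount = float(row[price_col]) * float(row[qty_col])
        except ValueError:
            continue  # skip rows with a blank or non-numeric price/quantity
        totals[row[ID_COL]] += amount
    return dict(totals)


def attach_survey_field(spend, survey_rows, field):
    """Pair each respondent's total spend with one survey answer (None if absent)."""
    answers = {row[ID_COL]: row.get(field) for row in survey_rows}
    return {uid: (total, answers.get(uid)) for uid, total in spend.items()}
```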

  8. Retail e-commerce sales, inactive

    • open.canada.ca
    • ouvert.canada.ca
    • +2 more
    csv, html, xml
    Updated Mar 24, 2023
    + more versions
    Cite
    Statistics Canada (2023). Retail e-commerce sales, inactive [Dataset]. https://open.canada.ca/data/en/dataset/0ffbe1ee-7fa7-4369-ac78-a01c8175e1a6
    Available download formats: html, csv, xml
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    This table contains 3 series, with data for the years 2016 to 2017 (not all combinations necessarily have data for all years). The table is described by the following dimensions (not all combinations are available): Geography (1 item: Canada); Sales (3 items: Retail trade; Electronic shopping and mail-order houses; Retail E-commerce sales).

  9. E-Commerce Website Logs

    • kaggle.com
    Updated Dec 15, 2023
    Cite
    KZ Data Lover (2023). E-Commerce Website Logs [Dataset]. https://www.kaggle.com/datasets/kzmontage/e-commerce-website-logs
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    Kaggle
    Authors
    KZ Data Lover
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This is an e-commerce website log dataset created to help data analysts practice exploratory data analysis and data visualization. It records when the website was accessed, the source IP address, the country, the language in which the website was accessed, and the amount of sales made by that IP address.

    Included columns:

    • Time and duration of accessing the website
    • Country, language, and platform from which it was accessed
    • Number of bytes used and IP address of the person accessing the website
    • Sales or return amount for that person

  10. E-commerce sales of enterprises by NACE Rev. 2 activity

    • data.europa.eu
    csv, html, tsv, xml
    + more versions
    Cite
    Eurostat, E-commerce sales of enterprises by NACE Rev. 2 activity [Dataset]. https://data.europa.eu/data/datasets/welnica5mmw26o3cisijga?locale=en
    Available download formats: xml (15400), html, tsv (2386201), xml (3485892), csv (4939952)
    Dataset authored and provided by
    Eurostat: https://ec.europa.eu/eurostat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    E-commerce sales of enterprises by NACE Rev. 2 activity

  11. Bitext-retail-ecommerce-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.

  12. kindred-ecommerce-merchant-deals-dataset

    • huggingface.co
    Updated May 1, 2025
    Cite
    Kindred Soul Ltd (2025). kindred-ecommerce-merchant-deals-dataset [Dataset]. https://huggingface.co/datasets/kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset
    Dataset updated
    May 1, 2025
    Dataset authored and provided by
    Kindred Soul Ltd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Kindred E-commerce Merchant Deals Dataset

    AI-ready catalogue of deals and offers for global retail brands. Structured in CSV and JSONL and validated against JSON Schema, it is a train-ready catalogue of promotions, ready for RAG, embeddings, or classic search.

    Dataset Overview

    File | Rows | Description
    data/csv/brands.csv or data/jsonl/brands.jsonl | ~90K | E-Commerce Merchant metadata, Logo URL, and domains… See the full description on the dataset page: https://huggingface.co/datasets/kindred-soul-ltd/kindred-ecommerce-merchant-deals-dataset.
    
  13. ShoppingAppReviews Dataset

    • data.mendeley.com
    Updated Sep 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Noor Mairukh Khan Arnob (2024). ShoppingAppReviews Dataset [Dataset]. http://doi.org/10.17632/chr5b94c6y.2
    Dataset updated
    Sep 16, 2024
    Authors
    Noor Mairukh Khan Arnob
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A dataset of 751,500 English-language app reviews of 12 online shopping apps, scraped from the internet using a Python script. It covers the 12 most popular online shopping Android apps: Alibaba, Aliexpress, Amazon, Daraz, eBay, Flipkart, Lazada, Meesho, Myntra, Shein, Snapdeal and Walmart. Each review entry contains metadata such as the review score, thumbs-up count, review posting time, and reply content. The dataset is organized as a zip file containing 12 JSON files and 12 CSV files, one of each per app. It can be used to obtain valuable information about customers' feedback on their user experience with these financially important apps.
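    Since the archive holds one CSV file per app, a per-app summary can be computed by iterating over the files. The sketch below assumes the review score is stored in a column named "score" and that each CSV file is named after its app; both are assumptions, as the description does not give the exact headers or file names.

```python
import csv
from pathlib import Path
from statistics import mean


def mean_score_by_app(folder, score_col="score"):
    """Average review score per app, using each CSV file's stem as the app name."""
    result = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            # Skip rows whose score cell is missing or empty.
            scores = [float(r[score_col]) for r in csv.DictReader(f) if r.get(score_col)]
        if scores:
            result[path.stem] = mean(scores)
    return result


# Usage (hypothetical path to the unzipped archive):
# print(mean_score_by_app("ShoppingAppReviews/csv"))
```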

  14. Pakistan's Largest E-Commerce Dataset

    • kaggle.com
    Updated Jan 30, 2021
    Cite
    Zeeshan-ul-hassan Usmani (2021). Pakistan's Largest E-Commerce Dataset [Dataset]. https://www.kaggle.com/zusmani/pakistans-largest-ecommerce-dataset/activity
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 30, 2021
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Zeeshan-ul-hassan Usmani
    Area covered
    Pakistan
    Description

    Context

    This is the largest retail e-commerce orders dataset from Pakistan. It contains half a million transaction records from March 2016 to August 2018. The data was collected from various e-commerce merchants as part of a research study. I am releasing this dataset as a capstone project for my data science course at Alnafi (alnafi.com/zusmani).
    There is a dire need for such a dataset to learn about Pakistan’s emerging e-commerce potential, and I hope it will help many startups in many ways.

    Content

    Geography: Pakistan

    Time period: 03/2016 – 08/2018

    Unit of analysis: E-Commerce Orders

    Dataset: The dataset contains detailed information on half a million e-commerce orders in Pakistan from March 2016 to August 2018. It includes item details, shipping method, payment method (such as credit card, Easy-Paisa, Jazz-Cash, and cash-on-delivery), product categories (such as fashion, mobile, electronics, and appliances), date of order, SKU, price, quantity, total, and customer ID. This is the most detailed dataset about e-commerce in Pakistan that you can find in the public domain.

    Variables: The dataset contains Item ID, Order Status (Completed, Cancelled, Refund), Date of Order, SKU, Price, Quantity, Grand Total, Category, Payment Method and Customer ID.

    Size: 101 MB

    File Type: CSV

    Acknowledgements

    I would like to thank all the startups who are trying to make their mark in Pakistan despite the unavailability of research data.

    Inspiration

    I’d like to invite my fellow Kagglers to use Machine Learning and Data Science to help me explore these ideas:

    • What is the best-selling category?
    • Visualize payment method and order status frequency
    • Find a correlation between payment method and order status
    • Find a correlation between order date and item category
    • Find any hidden patterns that are counter-intuitive for a layman
    • Can we predict the number of orders, item category, or number of customers/amount in advance?
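    For the payment-method versus order-status question, a contingency table is a natural starting point before any formal association test (for example, a chi-square test or Cramér's V). The sketch below assumes the column names "Payment Method" and "Order Status" taken from the Variables list; the actual CSV headers may differ.

```python
from collections import Counter


def crosstab(rows, row_col, col_col):
    """Count rows for each (row_col value, col_col value) pair."""
    return Counter((row[row_col], row[col_col]) for row in rows)


def status_share(rows, status, method_col="Payment Method",
                 status_col="Order Status"):
    """Share of orders with the given status, per payment method."""
    pair_counts = crosstab(rows, method_col, status_col)
    method_totals = Counter(row[method_col] for row in rows)
    # Counter returns 0 for unseen (method, status) pairs.
    return {method: pair_counts[(method, status)] / total
            for method, total in method_totals.items()}
```

    Comparing, say, the cancellation share for cash-on-delivery against card payments gives a first descriptive answer; whether a difference is meaningful still needs a statistical test.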

  15. Waitrose Products Information Dataset in CSV Format - Comprehensive Product...

    • crawlfeeds.com
    csv, zip
    Updated Jun 7, 2025
    Cite
    Crawl Feeds (2025). Waitrose Products Information Dataset in CSV Format - Comprehensive Product Data [Dataset]. https://crawlfeeds.com/datasets/waitrose-products-information-dataset-in-csv-format-comprehensive-product-data
    Available download formats: zip, csv
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Waitrose Product Dataset offers a comprehensive and structured collection of grocery items listed on the Waitrose online platform. This dataset includes 25,000+ product records across multiple categories, curated specifically for use in retail analytics, pricing comparison, AI training, and eCommerce integration.

    Each record contains detailed attributes such as:

    • Product title, brand, MPN, and product ID

    • Price and currency

    • Availability status

    • Description, ingredients, and raw nutrition data

    • Review count and average rating

    • Breadcrumbs, image links, and more

    Delivered in CSV format (ZIP archive), this dataset is ideal for professionals in the FMCG, retail, and grocery tech industries who need structured, crawl-ready data for their projects.

  16. Research on E-Commerce as per Scopus Database as at October 2020

    • search.datacite.org
    • data.mendeley.com
    Updated Oct 7, 2020
    Cite
    Aidi Ahmi (2020). Research on E-Commerce as per Scopus Database as at October 2020 [Dataset]. http://doi.org/10.17632/jc6mjmf29s
    Dataset updated
    Oct 7, 2020
    Dataset provided by
    DataCite: https://www.datacite.org/
    Mendeley
    Authors
    Aidi Ahmi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was obtained from the Scopus database on 7 October 2020 using the following search query: TITLE ( "e-commerce" OR "electronic commerce" OR "e commerce" OR "ecommerce" ). It can be used to map research on e-commerce from 1992 to 2020 through bibliometric analysis. The dataset consists of 9 parts and is provided in CSV and RIS formats. The CSV files can be opened and analysed in applications such as Microsoft Excel, or in VOSviewer for constructing and visualizing bibliometric networks. The RIS files can be opened in reference manager software such as EndNote or Mendeley Desktop, or in Harzing's Publish or Perish, for further analysis.

  17. E-commerce Product Dataset from Mercado Libre Perú

    • data.niaid.nih.gov
    Updated Oct 12, 2023
    Cite
    Cotacallapa Mamani, Harold Enrique (2023). E-commerce Product Dataset from Mercado Libre Perú [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8415495
    Dataset updated
    Oct 12, 2023
    Dataset authored and provided by
    Cotacallapa Mamani, Harold Enrique
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We offer a dataset comprising approximately 1,198,398 unique products sourced from Mercado Libre Perú, collected from the platform's public API between February 2022 and May 2023.

    Files description:

    ml_db_raw.db : Raw dataset stored in a SQLite Database

    ml_db_sample.csv : A sample covering only 5 electronics categories

    test.csv* : 20% of data from ml_db_sample.csv

    train.csv* : 80% of data from ml_db_sample.csv

    * The sample was divided into training and testing sets using random stratified sampling.
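    The description does not say which column was used for stratification or which random seed was used, so the following is only an illustration of an 80/20 stratified split. The stratification column (for example, one of the CatX category columns) is passed in as a parameter, and the default seed is arbitrary.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_col, test_frac=0.2, seed=0):
    """Split rows into (train, test), keeping each label's share of rows.

    Within every label group the rows are shuffled and round(n * test_frac)
    of them go to the test set.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_col]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(groups):  # deterministic group order
        members = list(groups[label])
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

    Splitting per group keeps rare categories represented in both sets, which a plain random 80/20 split does not guarantee.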

    Attributes description:

    CatX : Category Name for X level

    CatX_code : Category Code given by Mercado Libre for X level

    id : Unique product identifier

    title : Original product title

    price : Product price

    currency : Product currency (PEN, USD)

    link : Product link

    insert_date : Web scraping date

    mlp_updated_date : Mercado Libre product update date

    text : Cleaned product title

    taxonomy : Category path from general to specific categories

  18. Bitext-customer-support-llm-chatbot-training-dataset

    • huggingface.co
    • opendatalab.com
    Updated Jul 16, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

  19. Scaling Ecommerce Graphs

    • zenodo.org
    zip
    Updated Jan 24, 2025
    Cite
    Francesco Cambria (2025). Scaling Ecommerce Graphs [Dataset]. http://doi.org/10.5281/zenodo.14728774
    Available download formats: zip
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Francesco Cambria
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains all the datasets used for the performance evaluation of the MINE GRAPH RULE operator proposed in the paper "MINE GRAPH RULE: A New GQL Operator for Mining Association Rules in Property Graph Databases".

    Each folder contains the following files used to create a property graph in Neo4j with a fixed schema mimicking an e-commerce site.

    • Item.csv - contains the data for the Item nodes.
    • Person.csv - contains the data for the Person nodes.
    • Category.csv - contains the data for the Category nodes.
    • FOLLOW.csv - contains the data for the FOLLOW relationships from Person to Person nodes.
    • BUY.csv - contains the data for the BUY relationships from Person to Item nodes.
    • RECOMMEND.csv - contains the data for the RECOMMEND relationship from Person to Item nodes.
    • OF.csv - contains the data for the OF relationship from Item to Category nodes.

    The folders contain various graph instances with differing dimensions, and each folder is named to reflect its defining features. The features in the name are given in this order:

    • Total number of nodes within the graph.
    • Ratio of the number of Person nodes over the nodes with other labels.
    • Probability of having a relationship FOLLOW between two Person nodes.
    • Probability of having a relationship BUY between a Person node and an Item node.
    • Probability of having a relationship RECOMMEND between a Person node and an Item node.

    (Example: the folder 10000_0.5_0.0005_0.1_0.0005_dataset contains files of a graph with 10000 nodes, half of which are Person nodes; 0.0005 is the probability of a FOLLOW relationship between two Person nodes, 0.1 is the probability of a BUY relationship between a Person node and an Item node, and 0.0005 is the probability of a RECOMMEND relationship between a Person node and an Item node.)
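    The folder naming scheme above can be decoded programmatically, for instance to tabulate the available graph instances before loading them into Neo4j. A minimal sketch, assuming every folder name follows the five-field pattern of the example and ends in "_dataset"; the field names in the code are our own labels, not names used by the dataset.

```python
from typing import NamedTuple


class GraphConfig(NamedTuple):
    total_nodes: int     # total number of nodes in the graph
    person_ratio: float  # Person-node ratio, as encoded in the name
    p_follow: float      # probability of FOLLOW between two Person nodes
    p_buy: float         # probability of BUY from Person to Item
    p_recommend: float   # probability of RECOMMEND from Person to Item


def parse_folder_name(name: str) -> GraphConfig:
    """Decode a folder name such as '10000_0.5_0.0005_0.1_0.0005_dataset'."""
    parts = name.split("_")
    if len(parts) != 6 or parts[-1] != "dataset":
        raise ValueError(f"unexpected folder name: {name!r}")
    nodes, ratio, follow, buy, recommend = parts[:5]
    return GraphConfig(int(nodes), float(ratio), float(follow),
                       float(buy), float(recommend))


# Usage (hypothetical root directory of the unzipped archive):
# from pathlib import Path
# configs = [parse_folder_name(p.name) for p in Path("graphs").iterdir() if p.is_dir()]
```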

  20. E-Commerce Website Analyze A/B Test Dataset

    • kaggle.com
    Updated Jul 10, 2022
    Cite
    Marwan Diab (2022). E-Commerce Website Analyze A/B Test Dataset [Dataset]. https://www.kaggle.com/datasets/marwandiab/ecommerce-website-analyze-ab-test-project
    Format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 10, 2022
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Marwan Diab
    Description

    Dataset

    This dataset was created by Marwan Diab

    Contents

    Data
