100+ datasets found
  1. Retail Transactions Dataset

    • kaggle.com
    Updated May 18, 2024
    Cite
    Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

    Context:

    Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

    Inspiration:

    The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

    Dataset Information:

    The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

    • Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
    • Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
    • Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
    • Product: A list of products purchased in the transaction. It includes the names of the products bought.
    • Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
    • Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
    • Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
    • City: The city where the purchase took place. It indicates the location of the transaction.
    • Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
    • Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
    • Customer_Category: A category representing the customer's background or age group.
    • Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
    • Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

    Use Cases:

    • Market Basket Analysis: Discover associations between products and uncover buying patterns.
    • Customer Segmentation: Group customers based on purchasing behavior.
    • Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
    • Retail Analytics: Analyze store performance and customer trends.

    Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
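
    As a rough illustration of the market basket use case listed above, the sketch below counts how often two products appear in the same transaction. The sample rows are invented, and it assumes the Product column holds a string that looks like a Python list; the description does not confirm the storage format, so treat that as an assumption.

```python
import ast
from collections import Counter
from itertools import combinations

# Hypothetical rows; only the Product column named in the description is used.
rows = [
    {"Transaction_ID": "1000000001", "Product": "['Milk', 'Bread', 'Eggs']"},
    {"Transaction_ID": "1000000002", "Product": "['Milk', 'Bread']"},
    {"Transaction_ID": "1000000003", "Product": "['Eggs', 'Soap']"},
]


def count_pairs(rows):
    """Count unordered product pairs that occur in the same transaction."""
    pairs = Counter()
    for row in rows:
        # Assumption: the cell is a list literal such as "['Milk', 'Bread']".
        products = sorted(set(ast.literal_eval(row["Product"])))
        pairs.update(combinations(products, 2))
    return pairs


pair_counts = count_pairs(rows)
print(pair_counts.most_common(3))
```

    Pair counts like these are the raw input for support and confidence figures in association-rule mining; a dedicated library would be the usual choice at full scale.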

  2. Global Fashion Retail Sales

    • kaggle.com
    Updated Mar 19, 2025
    Cite
    Ric. G. (2025). Global Fashion Retail Sales [Dataset]. https://www.kaggle.com/datasets/ricgomes/global-fashion-retail-stores-dataset
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Kaggle
    Authors
    Ric. G.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global Fashion Retail Analytics Dataset

    📊 Dataset Overview

    This synthetic dataset simulates two years of transactional data for a multinational fashion retailer, featuring:
    - 📈 4+ million sales records
    - 🏪 35 stores across 7 countries:
    🇺🇸 United States | 🇨🇳 China | 🇩🇪 Germany | 🇬🇧 United Kingdom | 🇫🇷 France | 🇪🇸 Spain | 🇵🇹 Portugal

    Currencies Covered: Each transaction includes detailed currency information, covering multiple currencies:
    💵 USD (United States) | 💶 EUR (Eurozone) | 💴 CNY (China) | 💷 GBP (United Kingdom)

    Designed for Detailed and Multifaceted Analysis

    🌐 Geographic Sales Comparison
    Gain insights into how sales performance varies between regions and countries, and identify trends that drive success in different markets.

    👥 Analyze Staffing and Performance
    Evaluate store staffing ratios and analyze the impact of employee performance on store success.

    🛍️ Customer Behavior and Segmentation
    Understand regional customer preferences, analyze demographic factors such as age and occupation, and segment customers based on their purchasing habits.

    💱 Multi-Currency Analysis
    Explore how transactions in different currencies (USD, EUR, CNY, GBP) are handled, analyze currency exchange effects, and compare sales across regions using multiple currencies.
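
    One way to compare sales across currencies is to convert every amount to a single reference currency first. The sketch below does that with invented exchange rates and illustrative field names (country, currency, amount); neither the rates nor the field names come from the dataset, and a real analysis would need rates dated to each transaction.

```python
# Hypothetical exchange rates to USD, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "CNY": 0.14, "GBP": 1.25}

# Made-up transactions; field names are illustrative, not the dataset's schema.
sales = [
    {"country": "United States", "currency": "USD", "amount": 100.0},
    {"country": "Germany", "currency": "EUR", "amount": 100.0},
    {"country": "China", "currency": "CNY", "amount": 1000.0},
    {"country": "United Kingdom", "currency": "GBP", "amount": 80.0},
]


def total_usd_by_country(sales, rates):
    """Sum sales per country after converting each amount to USD."""
    totals = {}
    for sale in sales:
        usd = sale["amount"] * rates[sale["currency"]]
        totals[sale["country"]] = totals.get(sale["country"], 0.0) + usd
    return totals


totals = total_usd_by_country(sales, RATES_TO_USD)
for country, usd in sorted(totals.items()):
    print(f"{country}: {usd:.2f} USD")
```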

    👗 Product Trends
    Assess how product categories (e.g., Feminine, Masculine, Children) and specific product attributes (size, color) perform across different regions.

    🎯 Pricing and Discount Analysis
    Study how different pricing models and discounts affect sales and customer decisions across diverse geographies.

    📊 Advanced Cross-Country & Currency Analysis
    Conduct complex, multi-dimensional analytics that interconnect countries, currencies, and sales data, identifying hidden correlations between economic factors, regional demand, and financial performance.

    Synthetic Data Advantages

    Generated using algorithms, the dataset simulates real-world retail dynamics while ensuring privacy.

    • Privacy-Safe: All customer and employee data is artificially generated to ensure privacy and compliance with data protection regulations. Personal details, such as emails and phone numbers, are anonymized.
    • Scalable Patterns: The data replicates real-world retail dynamics, ensuring scalability of patterns for testing algorithms and analytics models.
    • Controlled Complexity: The dataset introduces intentional complexities (e.g., missing job titles, inconsistent phone number formats) to offer a more realistic and challenging exploratory data analysis experience.
    • Customizable for Various Use Cases: Whether you're performing sales forecasting, employee performance analysis, or customer segmentation, this dataset offers a flexible foundation for diverse analytical tasks.

    This dataset is an ideal resource for retail analysts, data scientists, and business intelligence professionals aiming to explore multinational retail data, optimize operations, and uncover new insights into customer behavior, sales trends, and employee efficiency.

  3. Warehouse and Retail Sales

    • catalog.data.gov
    • data.montgomerycountymd.gov
    • +4 more
    Updated Jul 5, 2025
    Cite
    data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    This dataset contains a list of sales and movement data by item and department, appended monthly. Update frequency: monthly.

  4. Retail sales dataset

    • kaggle.com
    Updated Jun 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sharon Onyancha (2025). Retail sales dataset [Dataset]. https://www.kaggle.com/datasets/sharononyancha/retail-sales-dataset/code
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sharon Onyancha
    Description

    1. Data source: The retail sales dataset was extracted from Kaggle.

    2. Approach: The retail sales dashboard was built in Excel, using pivot tables and charts to visualise the data.

    3. Problem statement/motivation: The dashboard gives a summarised view of retail sales to support future company sales predictions and provide insights.

    4. Key notes/visuals: Total profit by category; total profit by payment mode; count of order ID by payment mode; total quantity by subcategory.
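
    The dashboard itself was built with Excel pivot tables. For readers who prefer code, the sketch below shows what two of those pivots compute. The rows and the field names (order_id, category, payment_mode, profit) are invented for illustration and may not match the dataset's actual columns.

```python
from collections import Counter, defaultdict

# Made-up orders; field names are hypothetical stand-ins for the real columns.
orders = [
    {"order_id": "A1", "category": "Furniture", "payment_mode": "Card", "profit": 20.0},
    {"order_id": "A2", "category": "Clothing", "payment_mode": "Cash", "profit": 5.0},
    {"order_id": "A3", "category": "Furniture", "payment_mode": "Card", "profit": -4.0},
    {"order_id": "A4", "category": "Electronics", "payment_mode": "UPI", "profit": 30.0},
]

profit_by_category = defaultdict(float)  # "Total profit by category"
orders_by_payment = Counter()            # "Count of order ID by payment mode"

for order in orders:
    profit_by_category[order["category"]] += order["profit"]
    orders_by_payment[order["payment_mode"]] += 1

print(dict(profit_by_category))
print(dict(orders_by_payment))
```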
  5. ‘🏦 US Retail Sales Per Capita by Store Type’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘🏦 US Retail Sales Per Capita by Store Type’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-retail-sales-per-capita-by-store-type-46e1/fadb9a71/?iid=002-694&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘🏦 US Retail Sales Per Capita by Store Type’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/us-retail-sales-per-capita-by-store-type-2000-20e on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    I have added a column on the right that shows the compound annual growth rate (CGR) of per capita spending from 2000 to 2015.
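
    The description does not say how the growth column was calculated. The sketch below uses the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1, with invented per capita figures over the 15 years from 2000 to 2015.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Invented figures: per capita spending doubling between 2000 and 2015.
growth = cagr(1000.0, 2000.0, 2015 - 2000)
print(f"{growth:.2%} per year")
```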

    source:

    This dataset was created by Gary Hoover and contains around 0 samples along with Unnamed: 15, Unnamed: 9, technical information and other features such as: - Unnamed: 18 - Unnamed: 12 - and more.

    How to use this dataset

    • Analyze Unnamed: 4 in relation to Unnamed: 10
    • Study the influence of Unnamed: 14 on Unnamed: 1

    Acknowledgements

    If you use this dataset in your research, please credit Gary Hoover

    --- Original source retains full ownership of the source dataset ---

  6. E-Commerce Retail Sales Series Data Collection

    • kaggle.com
    Updated Dec 7, 2019
    Cite
    US Census Bureau (2019). E-Commerce Retail Sales Series Data Collection [Dataset]. https://www.kaggle.com/datasets/census/e-commerce-retail-sales-series-data-collection
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 7, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    US Census Bureau
    Description

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset from the U.S. Census Bureau hosted by the Federal Reserve Economic Database (FRED). FRED has its own data platform and updates its information according to the amount of data that is brought in. Explore the U.S. Census Bureau using Kaggle and all of the data sources available through the U.S. Census Bureau organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using FRED's API and Kaggle's API.

  7. ‘Retail and Retailers Sales Time Series Collection’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 18, 2018
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Retail and Retailers Sales Time Series Collection’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-retail-and-retailers-sales-time-series-collection-1e34/4cb446ae/?iid=001-748&v=presentation
    Explore at:
    Dataset updated
    Sep 18, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Retail and Retailers Sales Time Series Collection’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/census/retail-and-retailers-sales-time-series-collection on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset from the U.S. Census Bureau hosted by the Federal Reserve Economic Database (FRED). FRED has its own data platform and updates its information according to the amount of data that is brought in. Explore the U.S. Census Bureau using Kaggle and all of the data sources available through the U.S. Census Bureau organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using FRED's API and Kaggle's API.

    Cover photo by Matteo Catanese on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

    --- Original source retains full ownership of the source dataset ---

  8. Grocery Inventory

    • kaggle.com
    Updated Mar 16, 2025
    Cite
    willian oliveira (2025). Grocery Inventory [Dataset]. http://doi.org/10.34740/kaggle/dsv/11053760
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 16, 2025
    Dataset provided by
    Kaggle
    Authors
    willian oliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    These graphs were created in R and Canva:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F1a47e2e6e4836b86b065441359d5c9f0%2Fgraph1.gif?generation=1742159161939732&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F87de025c5703cb69483764c4fc9c58ab%2Fgraph2.gif?generation=1742159169346925&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2Fddf5001438c97c8c030333261685849b%2Fgraph3.png?generation=1742159174793142&alt=media

    The dataset offers a comprehensive view of grocery inventory, covering 990 products across multiple categories such as Grains & Pulses, Beverages, Fruits & Vegetables, and more. It includes crucial details about each product, such as its unique identifier (Product_ID), name, category, and supplier information, including Supplier_ID and Supplier_Name. This dataset is particularly valuable for businesses aiming to optimize inventory management, sales tracking, and supply chain efficiency.

    Key inventory-related fields include Stock_Quantity, which indicates the current stock level, and Reorder_Level, which determines when a product should be reordered. The Reorder_Quantity specifies how much stock to order when inventory falls below the reorder threshold. Additionally, Unit_Price provides insight into pricing, helping businesses analyze cost trends and profitability.
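
    The reorder fields described above suggest a simple rule: when Stock_Quantity drops below Reorder_Level, order Reorder_Quantity units. The sketch below applies that rule to invented rows. It treats "falls below" as a strict comparison, which is an interpretation rather than something the dataset specifies.

```python
# Invented inventory rows using the column names from the description.
inventory = [
    {"Product_ID": "P001", "Stock_Quantity": 12, "Reorder_Level": 20, "Reorder_Quantity": 50},
    {"Product_ID": "P002", "Stock_Quantity": 80, "Reorder_Level": 30, "Reorder_Quantity": 40},
    {"Product_ID": "P003", "Stock_Quantity": 30, "Reorder_Level": 30, "Reorder_Quantity": 25},
]


def reorder_plan(items):
    """Map Product_ID to the quantity to order for items below their reorder level."""
    return {
        item["Product_ID"]: item["Reorder_Quantity"]
        for item in items
        # Assumption: stock exactly at the reorder level does not trigger an order.
        if item["Stock_Quantity"] < item["Reorder_Level"]
    }


plan = reorder_plan(inventory)
print(plan)
```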

    To manage product flow, the dataset includes dates such as Date_Received, which tracks when the product was added to the warehouse, and Last_Order_Date, marking the most recent procurement. For perishable goods, the Expiration_Date column is critical, allowing businesses to minimize waste by monitoring shelf life. The Warehouse_Location specifies where each product is stored, facilitating efficient inventory handling.

    Sales and performance metrics are also included. The Sales_Volume column records the total number of units sold, providing insights into consumer demand. Inventory_Turnover_Rate helps businesses assess how quickly a product sells and is replenished, ensuring better stock management. The dataset also tracks the Status of each product, indicating whether it is Active, Discontinued, or Backordered.

    The dataset serves multiple purposes in inventory management, sales performance evaluation, supplier analysis, and product lifecycle tracking. Businesses can leverage this data to refine reorder strategies, ensuring optimal stock levels and avoiding stockouts or excessive inventory. Sales analysis can help identify high-demand products and slow-moving items, enabling better decision-making in pricing and promotions. Evaluating suppliers based on their performance, pricing, and delivery efficiency helps streamline procurement and improve overall supply chain operations.

    Furthermore, the dataset can support predictive analytics by employing machine learning techniques to estimate reorder quantities, forecast demand, and optimize stock replenishment. Inventory turnover insights can aid in maintaining a balanced supply, preventing unnecessary overstocking or shortages. By tracking trends in sales, businesses can refine their marketing and distribution strategies, ensuring sustained profitability.

    This dataset is designed for educational and demonstration purposes, offering fictional data under the Creative Commons Attribution 4.0 International License. Users are free to analyze, modify, and apply the data while providing proper attribution. Additionally, certain products are marked as discontinued or backordered, reflecting real-world inventory dynamics. Businesses dealing with perishable goods should closely monitor expiration and last order dates to avoid losses due to spoilage.

    Overall, this dataset provides a versatile resource for those interested in inventory management, sales analysis, and supply chain optimization. By leveraging the structured data, businesses can make data-driven decisions to enhance operational efficiency and maximize profitability.

  9. Sales Dataset

    • paperswithcode.com
    Updated Oct 1, 2024
    Cite
    (2024). Sales Dataset [Dataset]. https://paperswithcode.com/dataset/sales
    Explore at:
    Dataset updated
    Oct 1, 2024
    Description

    Forecast Sales using ARIMA and SARIMA

  10. ‘Superstore Sales Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Superstore Sales Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-superstore-sales-dataset-8442/a47909c8/?iid=010-519&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Superstore Sales Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rohitsahoo/sales-forecasting on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Retail dataset of a global superstore for 4 years. Perform EDA and Predict the sales of the next 7 days from the last date of the Training dataset!

    Content

    Time series analysis deals with time series based data to extract patterns for predictions and other characteristics of the data. It uses a model for forecasting future values in a small time frame based on previous observations. It is widely used for non-stationary data, such as economic data, weather data, stock prices, and retail sales forecasting.

    Dataset

    The dataset is easy to understand and is self-explanatory

    Inspiration

    Perform EDA and Predict the sales of the next 7 days from the last date of the Training dataset!

    --- Original source retains full ownership of the source dataset ---

  11. ‘USA Monthly Retail Sales’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jun 19, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘USA Monthly Retail Sales’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-usa-monthly-retail-sales-9780/7785382b/?iid=004-633&v=presentation
    Explore at:
    Dataset updated
    Jun 19, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Analysis of ‘USA Monthly Retail Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/landlord/usa-monthly-retail-trade on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Introduction

    The dataset contains the monthly sales for retail trade and food services in the USA, adjusted and unadjusted for seasonal variations, for various categories. These categories represent the kinds of business operating in the USA and are based on the North American Industry Classification System (NAICS).

    Dataset Description

    • The dataset contains the estimates of Monthly Retail and Food Services Sales by Kind of Business from 1992 to 2020. These estimates are shown in millions of dollars and are based on data from the Monthly Retail Trade Survey, Annual Retail Trade Survey, Service Annual Survey, and administrative records.
    • There are two other files that contain the monthly data for NAICS code 44X72 (Retail Trade and Food Services: U.S. Total), for both seasonally adjusted and non-seasonally adjusted sales, in millions of dollars from 1992 to 2020.
    • A helper file listing NAICS codes for the retail and food industry is also provided for reference.

    Acknowledgements

    The dataset was published on the U.S. Census Bureau website (https://www.census.gov)

    --- Original source retains full ownership of the source dataset ---

  12. Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and...

    • test.researchdata.tuwien.ac.at
    bin, csv, json +1
    Updated Apr 28, 2025
    Cite
    Dilara Çakmak (2025). Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and Performance Analysis [Dataset]. http://doi.org/10.70124/f5t2d-xt904
    Explore at:
    Available download formats: csv, text/markdown, json, bin
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Dilara Çakmak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 2025
    Description

    Context and Methodology

    Research Domain:
    The dataset is part of a project focused on retail sales forecasting. Specifically, it is designed to predict daily sales for Rossmann, a chain of over 3,000 drug stores operating across seven European countries. The project falls under the broader domain of time series analysis and machine learning applications for business optimization. The goal is to apply machine learning techniques to forecast future sales based on historical data, which includes factors like promotions, competition, holidays, and seasonal trends.

    Purpose:
    The primary purpose of this dataset is to help Rossmann store managers predict daily sales for up to six weeks in advance. By making accurate sales predictions, Rossmann can improve inventory management, staffing decisions, and promotional strategies. This dataset serves as a training set for machine learning models aimed at reducing forecasting errors and supporting decision-making processes across the company’s large network of stores.

    How the Dataset Was Created:
    The dataset was compiled from several sources, including historical sales data from Rossmann stores, promotional calendars, holiday schedules, and external factors such as competition. The data is split into multiple features, such as the store's location, promotion details, whether the store was open or closed, and weather information. The dataset is publicly available on platforms like Kaggle and was initially created for the Kaggle Rossmann Store Sales competition. The data is made accessible via an API for further analysis and modeling, and it is structured to help machine learning models predict future sales based on various input variables.

    Technical Details

    Dataset Structure:

    The dataset consists of three main files, each with its specific role:

    1. Train:
      This file contains the historical sales data, which is used to train machine learning models. It includes daily sales information for each store, as well as various features that could influence the sales (e.g., promotions, holidays, store type, etc.).

      https://handle.test.datacite.org/10.82556/yb6j-jw41
      PID: b1c59499-9c6e-42c2-af8f-840181e809db
    2. Test2:
      The test dataset mirrors the structure of train.csv but does not include the actual sales values (i.e., the target variable). This file is used for making predictions using the trained machine learning models. It is used to evaluate the accuracy of predictions when the true sales data is unknown.

      https://handle.test.datacite.org/10.82556/jerg-4b84
      PID: 7cbb845c-21dd-4b60-b990-afa8754a0dd9
    3. Store:
      This file provides metadata about each store, including information such as the store’s location, type, and assortment level. This data is essential for understanding the context in which the sales data is gathered.

      https://handle.test.datacite.org/10.82556/nqeg-gy34
      PID: 9627ec46-4ee6-4969-b14a-bda555fe34db
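
    Because the store metadata lives in a separate file, a typical first step is to attach it to each training row via the shared Store identifier. The sketch below shows such a join using only the standard library and tiny inline CSV snippets. The sample values are invented, and the project itself may well use pandas for this step.

```python
import csv
import io

# Invented sample content; column names follow the field list in this entry.
train_csv = """Store,Date,Sales,Open
1,2015-07-31,5263,1
2,2015-07-31,6064,1
"""
store_csv = """Store,StoreType,Assortment
1,c,a
2,a,a
"""


def join_on_store(train_text, store_text):
    """Attach store metadata to each training row, matching on the Store column."""
    stores = {row["Store"]: row for row in csv.DictReader(io.StringIO(store_text))}
    joined = []
    for row in csv.DictReader(io.StringIO(train_text)):
        meta = stores.get(row["Store"], {})  # rows without metadata stay as they are
        joined.append({**row, **meta})
    return joined


joined = join_on_store(train_csv, store_csv)
print(joined[0])
```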

    Data Fields Description:

    • Id: A unique identifier for each (Store, Date) combination within the test set.

    • Store: A unique identifier for each store.

    • Sales: The daily turnover (target variable) for each store on a specific day (this is what you are predicting).

    • Customers: The number of customers visiting the store on a given day.

    • Open: An indicator of whether the store was open (1 = open, 0 = closed).

    • StateHoliday: Indicates if the day is a state holiday, with values like:

      • 'a' = public holiday,

      • 'b' = Easter holiday,

      • 'c' = Christmas,

      • '0' = no holiday.

    • SchoolHoliday: Indicates whether the store is affected by school closures (1 = yes, 0 = no).

    • StoreType: Differentiates between four types of stores: 'a', 'b', 'c', 'd'.

    • Assortment: Describes the level of product assortment in the store:

      • 'a' = basic,

      • 'b' = extra,

      • 'c' = extended.

    • CompetitionDistance: Distance (in meters) to the nearest competitor store.

    • CompetitionOpenSince[Month/Year]: The month and year when the nearest competitor store opened.

    • Promo: Indicates whether the store is running a promotion on a particular day (1 = yes, 0 = no).

    • Promo2: Indicates whether the store is participating in Promo2, a continuing promotion for some stores (1 = participating, 0 = not participating).

    • Promo2Since[Year/Week]: The year and calendar week when the store started participating in Promo2.

    • PromoInterval: Describes the months when Promo2 is active, e.g., "Feb,May,Aug,Nov" means the promotion starts in February, May, August, and November.
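
    Several of the fields above are coded values that need decoding before modeling. The sketch below maps the StateHoliday codes to their labels and checks whether a PromoInterval string lists a given month. It compares only the first three letters of each month name so that longer spellings would still match; that tolerance is an assumption, not something stated in the field description.

```python
# Labels taken from the StateHoliday description above.
STATE_HOLIDAY = {
    "a": "public holiday",
    "b": "Easter holiday",
    "c": "Christmas",
    "0": "no holiday",
}

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def promo2_starts_in_month(promo_interval, month):
    """Return True if month (1-12) is listed in a string like "Feb,May,Aug,Nov"."""
    if not promo_interval:
        return False  # empty value: the store has no Promo2 schedule
    listed = {name.strip()[:3].lower() for name in promo_interval.split(",")}
    return MONTHS[month - 1] in listed


print(promo2_starts_in_month("Feb,May,Aug,Nov", 5))
print(STATE_HOLIDAY["b"])
```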

    Software Requirements

    To work with this dataset, you will need to have specific software installed, including:

    • DBRepo Authorization: This is required to access the datasets via the DBRepo API. You may need to authenticate with an API key or login credentials to retrieve the datasets.

    • Python Libraries: Key libraries for working with the dataset include:

      • pandas for data manipulation,

      • numpy for numerical operations,

      • matplotlib and seaborn for data visualization,

      • scikit-learn for machine learning algorithms.

    Additional Resources

    Several additional resources are available for working with the dataset:

    1. Presentation:
      A presentation summarizing the exploratory data analysis (EDA), feature engineering process, and key insights from the analysis is provided. This presentation also includes visualizations that help in understanding the dataset’s trends and relationships.

    2. Jupyter Notebook:
      A Jupyter notebook, titled Retail_Sales_Prediction_Capstone_Project.ipynb, is provided, which details the entire machine learning pipeline, from data loading and cleaning to model training and evaluation.

    3. Model Evaluation Results:
      The project includes a detailed evaluation of various machine learning models, including their performance metrics like training and testing scores, Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE). This allows for a comparison of model effectiveness in forecasting sales.

    4. Trained Models (.pkl files):
      The models trained during the project are saved as .pkl files. These files contain the trained machine learning models (e.g., Random Forest, Linear Regression, etc.) that can be loaded and used to make predictions without retraining the models from scratch.

    5. sample_submission.csv:
      This file is a sample submission file that demonstrates the format of predictions expected when using the trained model. The sample_submission.csv contains predictions made on the test dataset using the trained Random Forest model. It provides an example of how the output should be structured for submission.
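
    The two error metrics named in item 3 can be computed as in the sketch below, using invented sales figures. Skipping days with zero actual sales in MAPE is a choice made here to avoid division by zero (for example on days when a store is closed); the project may handle those rows differently.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, ignoring zero actual values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must be the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Invented daily sales and predictions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
print(round(rmse_value, 2), round(mape_value, 2))
```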

    These resources provide a comprehensive guide to implementing and analyzing the sales forecasting model, helping you understand the data, methods, and results in greater detail.
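
    As a rough illustration of the evaluation metrics and .pkl workflow described above, the sketch below computes MAPE and RMSE and round-trips a model through pickle. The sales figures and the `mean_baseline` dictionary are hypothetical stand-ins, not the project's data or its Random Forest model; an in-memory buffer stands in for a .pkl file.

```python
import io
import math
import pickle


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; zero actuals are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical stand-in for a trained model: it only stores the training mean.
train_sales = [100.0, 200.0, 300.0]
model = {"kind": "mean_baseline", "mean": sum(train_sales) / len(train_sales)}

# Save and reload the model; a real project would write to a .pkl file instead.
buffer = io.BytesIO()
pickle.dump(model, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# Predict on hypothetical test data without retraining.
test_sales = [150.0, 250.0]
predictions = [restored["mean"]] * len(test_sales)

print(f"MAPE: {mape(test_sales, predictions):.2f}%")
print(f"RMSE: {rmse(test_sales, predictions):.2f}")
```

    Pickle files can execute arbitrary code when loaded, so .pkl files should only be loaded from sources you trust.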

  13. Retail Sales Data

    • kaggle.com
    Updated Jan 17, 2021
    Cite
    Ashirr (2021). Retail Sales Data [Dataset]. https://www.kaggle.com/datasets/ashkash247/retail-sales-data/suggestions?status=pending
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 17, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ashirr
    Description

    Dataset

    This dataset was created by Ashirr

    Contents

  14. ‘Retail Sales Forecasting’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Apr 22, 2019
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘Retail Sales Forecasting’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-retail-sales-forecasting-77b7/943748cc/?iid=002-106&v=presentation
    Explore at:
    Dataset updated
    Apr 22, 2019
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Retail Sales Forecasting’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tevecsystems/retail-sales-forecasting on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains a lot of historical sales data. It was extracted from a top Brazilian retailer and covers many SKUs and many stores. The data was transformed to protect the identity of the retailer.

    Content

    [TBD]

    Acknowledgements

    This data would not be available without the full collaboration of our customers, who understand that sharing their core and strategic information has more advantages than possible hazards. They also support our continuous development of innovative ML systems across their value chain.

    Inspiration

    Every retail business in the world faces a fundamental question: how much inventory should I carry? On one hand, too much inventory means working capital costs, operational costs and a complex operation. On the other hand, a lack of inventory leads to lost sales, unhappy customers and a damaged brand.

    Current inventory management models offer many solutions for placing the correct order, but they all depend on a single unknown factor: the demand for the next periods.

    This is why short-term forecasting is so important in the retail and consumer goods industry.

    We encourage you to seek the best demand forecasting model for the next 2-3 weeks. This valuable insight can help many supply chain practitioners to correctly manage their inventory levels.

    --- Original source retains full ownership of the source dataset ---

  15. ‘Walmart Dataset (Retail)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Apr 18, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Walmart Dataset (Retail)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-walmart-dataset-retail-0283/e07567d8/?iid=003-947&v=presentation
    Explore at:
    Dataset updated
    Apr 18, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Walmart Dataset (Retail)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rutuspatel/walmart-dataset-retail on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Dataset Description :

    This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:

    Store - the store number

    Date - the week of sales

    Weekly_Sales - sales for the given store

    Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)

    Temperature - Temperature on the day of sale

    Fuel_Price - Cost of fuel in the region

    CPI – Prevailing consumer price index

    Unemployment - Prevailing unemployment rate

    Holiday Events

    Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13

    Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13

    Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13

    Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
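
    The holiday dates listed above can be turned into a lookup for labelling weeks. This is only a sketch: the `holiday_name` helper and the `%d-%b-%y` date format are assumptions based on how the dates are written here, and month-name parsing assumes an English locale.

```python
from datetime import date, datetime

# Holiday dates copied from the dataset description.
HOLIDAY_EVENTS = {
    "Super Bowl": ["12-Feb-10", "11-Feb-11", "10-Feb-12", "8-Feb-13"],
    "Labour Day": ["10-Sep-10", "9-Sep-11", "7-Sep-12", "6-Sep-13"],
    "Thanksgiving": ["26-Nov-10", "25-Nov-11", "23-Nov-12", "29-Nov-13"],
    "Christmas": ["31-Dec-10", "30-Dec-11", "28-Dec-12", "27-Dec-13"],
}


def parse_holiday(text):
    """Parse a date written like '12-Feb-10' (day, month abbreviation, 2-digit year)."""
    return datetime.strptime(text, "%d-%b-%y").date()


# Map each holiday week date to the name of its event.
HOLIDAY_BY_DATE = {
    parse_holiday(text): name
    for name, dates in HOLIDAY_EVENTS.items()
    for text in dates
}


def holiday_name(week_date):
    """Return the holiday event for a week date, or None for a non-holiday week."""
    return HOLIDAY_BY_DATE.get(week_date)


print(holiday_name(date(2012, 2, 10)))
print(holiday_name(date(2010, 2, 5)))
```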

    Analysis Tasks

    Basic Statistics tasks

    1) Which store has the maximum sales?

    2) Which store has the maximum standard deviation, i.e., where do sales vary the most? Also, find the coefficient of mean to standard deviation.

    3) Which store(s) had a good quarterly growth rate in Q3’2012?

    4) Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together

    5) Provide a monthly and semester view of sales in units and give insights
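
    The first two tasks above can be sketched with the standard library alone. The rows below are hypothetical (Store, Weekly_Sales) pairs, not values from the Walmart file, and the "coefficient" is read here as the coefficient of variation (standard deviation divided by mean), which is an interpretation of the task wording rather than something the source states.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (Store, Weekly_Sales) rows; real values come from Walmart_Store_sales.
rows = [(1, 100.0), (1, 120.0), (1, 110.0), (2, 300.0), (2, 100.0), (2, 200.0)]

# Group weekly sales by store.
sales_by_store = defaultdict(list)
for store, weekly_sales in rows:
    sales_by_store[store].append(weekly_sales)

# Per-store total, sample standard deviation, and coefficient of variation.
stats = {
    store: {"total": sum(v), "std": stdev(v), "cv": stdev(v) / mean(v)}
    for store, v in sales_by_store.items()
}

top_sales_store = max(stats, key=lambda s: stats[s]["total"])
most_variable_store = max(stats, key=lambda s: stats[s]["std"])

print("Store with maximum sales:", top_sales_store)
print("Store with maximum standard deviation:", most_variable_store)
```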

    Statistical Model

    For Store 1 – Build prediction models to forecast demand

    Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.

    Change dates into days by creating a new variable.

    Select the model that gives the best accuracy.
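
    One possible reading of the date restructuring is a day index that equals 1 on 5 Feb 2010 and grows by one per calendar day. The sketch below builds that variable and fits a one-variable least-squares line to it. The sales values are hypothetical, and `day_index` and `fit_line` are illustrative helpers, not part of the original task.

```python
from datetime import date

START = date(2010, 2, 5)  # earliest date in the data, mapped to index 1


def day_index(d):
    """Days elapsed since the start date, counting the start date as 1."""
    return (d - START).days + 1


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical weekly sales for Store 1.
weeks = [date(2010, 2, 5), date(2010, 2, 12), date(2010, 2, 19)]
sales = [1000.0, 1070.0, 1140.0]

days = [day_index(d) for d in weeks]
slope, intercept = fit_line(days, sales)
print(f"Weekly_Sales ~ {slope:.2f} * day + {intercept:.2f}")
```

    The same day index can sit alongside CPI, unemployment, and fuel price as inputs to a multiple regression when testing whether those variables affect sales.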

    --- Original source retains full ownership of the source dataset ---

  16. Retail Sales Data

    • kaggle.com
    Updated Nov 25, 2023
    Cite
    Menna Essam (2023). Retail Sales Data [Dataset]. https://www.kaggle.com/anoshessam/retail-sales-data/discussion
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Menna Essam
    Description

    Dataset

    This dataset was created by Menna Essam

    Contents

  17. ‘Sample Sales Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Sample Sales Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sample-sales-data-1dc8/1310507b/?iid=023-689&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Sample Sales Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kyanyoga/sample-sales-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Sample sales data (order info, sales, customer, shipping, etc.) used for segmentation, customer analytics, clustering and more. Inspired by retail analytics. This was originally used for Pentaho DI Kettle, but I found the set could be useful for sales simulation training.

    Originally Written by María Carina Roldán, Pentaho Community Member, BI consultant (Assert Solutions), Argentina. This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Modified by Gus Segura June 2014.

    --- Original source retains full ownership of the source dataset ---

  18. ‘Grocery Products Purchase Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Grocery Products Purchase Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-grocery-products-purchase-data-2535/4e42dd10/?iid=010-364&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Grocery Products Purchase Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/alexmiles/grocery-products-purchase-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The dataset was collected mainly by one of Kroger's retail stores in the USA. The data was collected during a super-saver weekend to understand more about customers' buying behavior.

    Content

    The data consists of more than 9,000 records gathered over the 3 days of a weekend Supersaver deal in one of Kroger's retail grocery stores.

    Inspiration

    This dataset may help retail grocery stores with upselling and cross-selling their products.

    --- Original source retains full ownership of the source dataset ---

  19. retail-sales-p

    • kaggle.com
    Updated Sep 24, 2024
    Cite
    Vijayendra D (2024). retail-sales-p [Dataset]. https://www.kaggle.com/datasets/vijayendrad/retail-sales-p/suggestions
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vijayendra D
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Vijayendra D

    Released under MIT

    Contents

  20. ‘Retail Case Study Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Retail Case Study Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-retail-case-study-data-529d/30064658/?iid=008-653&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Retail Case Study Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/darpan25bajaj/retail-case-study-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Analytics in Retail:

    With the retail market getting more competitive by the day, nothing has ever been more important than the ability to optimize service business processes when trying to satisfy customer expectations. Channeling and managing data so that it works in favor of the customer while also generating profits is essential for survival.

    Ideally, a retailer’s customer data reflects the company’s success in reaching and nurturing its customers. Retailers built reports summarizing customer behavior using metrics such as conversion rate, average order value, recency of purchase and total amount spent in recent transactions. These measurements provided general insight into the behavioral tendencies of customers.

    Customer intelligence is the practice of determining and delivering data-driven insights into past and predicted future customer behavior. To be effective, customer intelligence must combine raw transactional and behavioral data to generate derived measures. In a nutshell, big retail players all over the world now apply data analytics at all stages of the retail process: keeping track of emerging popular products, forecasting sales and future demand via predictive simulation, optimizing product placements and offers through heat-mapping of customers, and many others.

    About the Data

    A retail store needs to analyze its day-to-day transactions and keep track of its customers across various locations, along with their purchases and returns across various categories.

    What can be done with the data?

    Create a report and display the calculated metrics, reports and inferences.

    Data Schema

    The workbook has three sheets (Customer, Transaction, Product Hierarchy):

    • Customer: Customer information including demographics
    • Transaction: Transaction of customers
    • Product Hierarchy: Product information

    --- Original source retains full ownership of the source dataset ---

Cite
Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset

Retail Transactions Dataset

For market basket analysis, customer segmentation & other retail analytics tasks

Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasad Patil
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

  • Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
  • Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
  • Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
  • Product: A list of products purchased in the transaction. It includes the names of the products bought.
  • Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
  • Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
  • Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
  • City: The city where the purchase took place. It indicates the location of the transaction.
  • Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
  • Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
  • Customer_Category: A category representing the customer's background or age group.
  • Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
  • Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."
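
The column descriptions above suggest how a row might be parsed into typed values. The sketch below is hedged on several assumptions that the description does not confirm: that Product is stored as a Python-style list literal, that Date uses the `YYYY-MM-DD HH:MM:SS` format, and that Discount_Applied is written as `True` or `False`. The sample row itself is invented for illustration.

```python
import ast
import csv
import io
from datetime import datetime

# Invented sample row using the documented column names.
SAMPLE_CSV = """Transaction_ID,Date,Customer_Name,Product,Total_Items,Total_Cost,Payment_Method,City,Store_Type,Discount_Applied,Customer_Category,Season,Promotion
1000000000,2023-01-15 10:30:00,Jane Doe,"['Milk', 'Bread']",2,7.50,Cash,Chicago,Supermarket,True,Student,Winter,None
"""


def parse_row(row):
    """Convert the string fields of one CSV row into typed Python values."""
    parsed = dict(row)
    # Assumed timestamp format; adjust if the real file differs.
    parsed["Date"] = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
    # Assumed list-literal encoding of the purchased products.
    parsed["Product"] = ast.literal_eval(row["Product"])
    parsed["Total_Items"] = int(row["Total_Items"])
    parsed["Total_Cost"] = float(row["Total_Cost"])
    parsed["Discount_Applied"] = row["Discount_Applied"] == "True"
    return parsed


records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(records[0]["Product"], records[0]["Total_Cost"])
```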

Use Cases:

  • Market Basket Analysis: Discover associations between products and uncover buying patterns.
  • Customer Segmentation: Group customers based on purchasing behavior.
  • Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
  • Retail Analytics: Analyze store performance and customer trends.
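
As a small illustration of the market basket use case, the sketch below counts item pairs and derives support and confidence, two common measures of product association. The baskets are invented, and `support` and `confidence` are illustrative helpers rather than functions from any library.

```python
from collections import Counter
from itertools import combinations

# Invented baskets; in the dataset these would come from the Product column.
baskets = [
    ["Milk", "Bread"],
    ["Milk", "Bread", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    items = sorted(set(basket))  # de-duplicate and fix the order within a pair
    item_counts.update(items)
    pair_counts.update(combinations(items, 2))


def support(a, b):
    """Fraction of baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Among baskets containing a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(f"support(Milk, Bread) = {support('Milk', 'Bread'):.2f}")
print(f"confidence(Milk -> Bread) = {confidence('Milk', 'Bread'):.2f}")
```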

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
