100+ datasets found
  1. Sample Sales Data (5 million transactions)

    • kaggle.com
    zip
    Updated Jul 8, 2021
    Cite
    Chris Chua (2021). Sample Sales Data (5 million transactions) [Dataset]. https://www.kaggle.com/datasets/weitat/sample-sales
    Explore at:
    Available download formats: zip (201186399 bytes)
    Dataset updated
    Jul 8, 2021
    Authors
    Chris Chua
    Description

    Dataset

    This dataset was created by Chris Chua

    Contents

  2. Warehouse and Retail Sales

    • catalog.data.gov
    • data.montgomerycountymd.gov
    • +4more
    Updated Nov 8, 2025
    + more versions
    Cite
    data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
    Explore at:
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    This dataset contains sales and movement data by item and department, appended monthly. Update frequency: monthly.

  3. Retail sales quality tables

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Nov 21, 2025
    Cite
    Office for National Statistics (2025). Retail sales quality tables [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/retailindustry/datasets/retailsalesqualitytables
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Office for National Statistics
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Standard error reference tables for the Retail Sales Index in Great Britain.

  4. Sample Sales Dataset

    • cubig.ai
    zip
    Updated Jun 15, 2025
    Cite
    CUBIG (2025). Sample Sales Dataset [Dataset]. https://cubig.ai/store/products/477/sample-sales-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction

      • The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, customer and delivery information.

    2) Data Utilization

    (1) Sample Sales Data has characteristics that:

      • This dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).

    (2) Sample Sales Data can be used to:

      • Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
      • Segmentation and marketing strategies: Segmentation of customer groups based on customer information, transaction size, and regional data, and use them to design targeted marketing and customized promotion strategies.

  5. Retail Store Sales: Dirty for Data Cleaning

    • kaggle.com
    zip
    Updated Jan 18, 2025
    Cite
    Ahmed Mohamed (2025). Retail Store Sales: Dirty for Data Cleaning [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/retail-store-sales-dirty-for-data-cleaning
    Explore at:
    Available download formats: zip (226740 bytes)
    Dataset updated
    Jan 18, 2025
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dirty Retail Store Sales Dataset

    Overview

    The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.

    File Information

    • File Name: retail_store_sales.csv
    • Number of Rows: 12,575
    • Number of Columns: 11

    Columns Description

    Column Name | Description | Example Values
    Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567
    Customer ID | A unique identifier for each customer. 25 unique customers. | CUST_01
    Category | The category of the purchased item. | Food, Furniture
    Item | The name of the purchased item. May contain missing values or None. | Item_1_FOOD, None
    Price Per Unit | The static price of a single unit of the item. May contain missing or None values. | 4.00, None
    Quantity | The quantity of the item purchased. May contain missing or None values. | 1, None
    Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, None
    Payment Method | The method of payment used. May contain missing or invalid values. | Cash, Credit Card
    Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Online
    Transaction Date | The date of the transaction. Always present and valid. | 2023-01-15
    Discount Applied | Indicates if a discount was applied to the transaction. May contain missing values. | True, False, None
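
Because Total Spent is defined as Quantity * Price Per Unit, any one of the three fields can in principle be recovered when the other two are present. The following is a minimal sketch of that idea, not the dataset author's method; the function name `fill_transaction` and the sample values are hypothetical.

```python
from typing import Optional, Tuple

Number = Optional[float]


def fill_transaction(price: Number, quantity: Number, total: Number) -> Tuple[Number, Number, Number]:
    """Recover one missing field using Total Spent = Quantity * Price Per Unit.

    Returns the triple unchanged when fewer than two fields are known.
    """
    if total is None and price is not None and quantity is not None:
        total = price * quantity
    elif price is None and total is not None and quantity:
        # `quantity` must be non-zero to divide by it.
        price = total / quantity
    elif quantity is None and total is not None and price:
        # `price` must be non-zero to divide by it.
        quantity = total / price
    return price, quantity, total


# Hypothetical rows, not taken from the dataset.
print(fill_transaction(4.0, 2, None))
print(fill_transaction(None, 2, 8.0))
print(fill_transaction(None, None, 8.0))
```

Rows with two or more of the three fields missing are left untouched here, since the identity alone cannot resolve them.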

    Categories and Items

    The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:

    Electric Household Essentials

    Item Code | Item Name | Price
    Item_1_EHE | Blender | 5.0
    Item_2_EHE | Microwave | 6.5
    Item_3_EHE | Toaster | 8.0
    Item_4_EHE | Vacuum Cleaner | 9.5
    Item_5_EHE | Air Purifier | 11.0
    Item_6_EHE | Electric Kettle | 12.5
    Item_7_EHE | Rice Cooker | 14.0
    Item_8_EHE | Iron | 15.5
    Item_9_EHE | Ceiling Fan | 17.0
    Item_10_EHE | Table Fan | 18.5
    Item_11_EHE | Hair Dryer | 20.0
    Item_12_EHE | Heater | 21.5
    Item_13_EHE | Humidifier | 23.0
    Item_14_EHE | Dehumidifier | 24.5
    Item_15_EHE | Coffee Maker | 26.0
    Item_16_EHE | Portable AC | 27.5
    Item_17_EHE | Electric Stove | 29.0
    Item_18_EHE | Pressure Cooker | 30.5
    Item_19_EHE | Induction Cooktop | 32.0
    Item_20_EHE | Water Dispenser | 33.5
    Item_21_EHE | Hand Blender | 35.0
    Item_22_EHE | Mixer Grinder | 36.5
    Item_23_EHE | Sandwich Maker | 38.0
    Item_24_EHE | Air Fryer | 39.5
    Item_25_EHE | Juicer | 41.0
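
The prices listed above start at 5.0 for item 1 and rise by 1.5 per item number, up to 41.0 for item 25. If that pattern holds for the other categories (an assumption; only the rows shown here confirm it), a missing Price Per Unit could be derived from the item code. The helper name `price_from_item_code` is hypothetical.

```python
import re

# Item codes look like "Item_<number>_<CATEGORY>", e.g. "Item_4_EHE".
_ITEM_CODE = re.compile(r"Item_(\d+)_([A-Z]+)")


def price_from_item_code(code: str) -> float:
    """Return the static price implied by the item number.

    Assumes price = 5.0 + 1.5 * (number - 1), which matches the listed rows.
    """
    match = _ITEM_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognised item code: {code!r}")
    number = int(match.group(1))
    return 5.0 + 1.5 * (number - 1)


print(price_from_item_code("Item_1_EHE"))
print(price_from_item_code("Item_25_EHE"))
```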

    Furniture

    Item Code | Item Name | Price
    Item_1_FUR | Office Chair | 5.0
    Item_2_FUR | Sofa | 6.5
    Item_3_FUR | Coffee Table | 8.0
    Item_4_FUR | Dining Table | 9.5
    Item_5_FUR | Bookshelf | 11.0
    Item_6_FUR | Bed F...
  6. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Explore at:
    Available download formats: zip (49305 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
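
The added columns named above can be derived from an order's dates, price, and cost. The sketch below is one plausible way to do so, not the author's actual logic: it assumes Unit margin means unit price minus unit cost and that Order_Ship_Days is the number of days between order and shipment. The input parameter names are hypothetical.

```python
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def derive_order_fields(order_date: date, ship_date: date, unit_price: float, unit_cost: float) -> dict:
    """Compute the five derived columns for a single order."""
    return {
        "Unit margin": unit_price - unit_cost,               # assumed definition
        "Order year": order_date.year,
        "Order month": order_date.month,
        "Order weekday": WEEKDAYS[order_date.weekday()],
        "Order_Ship_Days": (ship_date - order_date).days,    # assumed definition
    }


# Hypothetical order, not taken from the dataset.
print(derive_order_fields(date(2023, 1, 12), date(2023, 1, 20), 10.0, 6.5))
```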

  7. Dairy Supply Chain Sales Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    + more versions
    Cite
    Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis (2024). Dairy Supply Chain Sales Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853252
    Explore at:
    Dataset updated
    Jul 12, 2024
    Authors
    Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Introduction

    Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

    One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain or improve existing products to meet the changing needs of customers.

    This dataset includes information about 7 of MEVGAL’s products [1]. According to the above information the data published will help researchers to understand the dynamics of the dairy market and its consumption patterns, which is creating the fertile ground for synergies between academia and industry and eventually help the industry in making informed decisions regarding product development, pricing and market strategies in the IoT playground. The use of this dataset could also aim to understand the impact of various external factors on the dairy market such as the economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.

    2. Citation

    Please cite the following papers when using this dataset:

    I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

    3. Dataset Modalities

    The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each file includes the daily logistics for its product; together the files cover three years, from 2020 to 2022.

    3.1 Data Collection

    The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

    The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.

    Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.

    It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

    The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).

    File | Period | Number of Samples (days)
    product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363
    product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365
    product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363
    product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365
    product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363
    product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365
    product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363
    product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364
    product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363
    product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365
    product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362
    product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365
    product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362
    product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364
    product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365
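
Several sample counts are lower than the number of calendar days in the stated period (2020 was a leap year with 366 days), which suggests that some days have no record. The sketch below shows how the gap per file could be computed; the helper names are hypothetical, and the counts are copied from the product 1 rows.

```python
from datetime import date


def days_in_period(start: date, end: date) -> int:
    """Number of calendar days from start to end, both inclusive."""
    return (end - start).days + 1


def missing_days(year: int, samples: int) -> int:
    """Calendar days in a full year minus the number of recorded samples."""
    return days_in_period(date(year, 1, 1), date(year, 12, 31)) - samples


# Sample counts for product 1, as listed for the three yearly files.
product_1_samples = {2020: 363, 2021: 364, 2022: 365}

for year, samples in product_1_samples.items():
    print(year, missing_days(year, samples))
```

A time-series model that assumes evenly spaced days may need these gaps reindexed or imputed first.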

    3.2 Dataset Overview

    The following table enumerates and explains the features included across all of the included files.

    Feature | Description | Unit
    Day | day of the month | -
    Month | Month | -
    Year | Year | -
    daily_unit_sales | Daily sales - the amount of products, measured in units, that during that specific day were sold | units
    previous_year_daily_unit_sales | Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year | units
    percentage_difference_daily_unit_sales | The percentage difference between the two above values | %
    daily_unit_sales_kg | The amount of products, measured in kilograms, that during that specific day were sold | kg
    previous_year_daily_unit_sales_kg | Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year | kg
    percentage_difference_daily_unit_sales_kg | The percentage difference between the two above values | kg
    daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | %
    previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned the previous year | %
    points_of_distribution | The amount of sales representatives through which the product was sold to the market for this year |
    previous_year_points_of_distribution | The amount of sales representatives through which the product was sold to the market for the same day for the previous year |

    Table 1 – Dataset Feature Description
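
The feature table describes the percentage-difference columns only as "the percentage difference between the two above values" and does not give a formula. One common reading is the change relative to the previous year's value, sketched below. Treat the formula as an assumption and check it against the published values before relying on it.

```python
from typing import Optional


def percentage_difference(current: float, previous: float) -> Optional[float]:
    """Change of `current` relative to `previous`, in percent.

    Assumed formula: (current - previous) / previous * 100.
    Returns None when `previous` is zero, where the ratio is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


# Hypothetical values for daily_unit_sales and previous_year_daily_unit_sales.
print(percentage_difference(50.0, 100.0))
print(percentage_difference(120.0, 0.0))
```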

    4. Structure and Format

    4.1 Dataset Structure

    The provided dataset has the following structure:

    Where:

    Name | Type | Property
    Readme.docx | Report | A File that contains the documentation of the Dataset.
    product X | Folder | A folder containing the data of a product X.
    product X YYYY.xlsx | Data file | An excel file containing the sales data of product X for year YYYY.

    Table 2 - Dataset File Description
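
The naming scheme in Table 2 makes it straightforward to enumerate or parse the data files. The sketch below builds the expected file names for the 7 products and the years 2020 to 2022 and parses a name back into its parts. The helper names are hypothetical, and no folder layout beyond what the table states is assumed.

```python
import re
from typing import Optional, Tuple

PRODUCTS = range(1, 8)     # products 1..7
YEARS = range(2020, 2023)  # 2020, 2021, 2022

_FILE_PATTERN = re.compile(r"product (\d+) (\d{4})\.xlsx")


def data_file_name(product: int, year: int) -> str:
    """Build a file name following the 'product X YYYY.xlsx' scheme."""
    return f"product {product} {year}.xlsx"


def parse_data_file_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (product, year) for a data file name, or None for other files."""
    match = _FILE_PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


ALL_FILES = [data_file_name(p, y) for p in PRODUCTS for y in YEARS]
print(len(ALL_FILES))
print(parse_data_file_name("product 3 2021.xlsx"))
```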

    5. Acknowledgement

    This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

    References

    [1] MEVGAL is a Greek dairy production company

  8. sales dataset

    • kaggle.com
    zip
    Updated Feb 18, 2025
    Cite
    VINOTH KANNA S (2025). sales dataset [Dataset]. https://www.kaggle.com/datasets/vinothkannaece/sales-dataset
    Explore at:
    Available download formats: zip (27634 bytes)
    Dataset updated
    Feb 18, 2025
    Authors
    VINOTH KANNA S
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Sales Data Description

    This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.

    Columns in the Dataset

    1. Product_ID: Unique identifier for each product sold. Randomly generated for practice purposes.
    2. Sale_Date: The date when the sale occurred. Randomly selected from the year 2023.
    3. Sales_Rep: The sales representative responsible for the transaction. The dataset includes five random sales representatives (Alice, Bob, Charlie, David, Eve).
    4. Region: The region where the sale took place. The possible regions are North, South, East, and West.
    5. Sales_Amount: The total sales amount for the transaction, including discounts if any. Values range from 100 to 10,000 (in currency units).
    6. Quantity_Sold: The number of units sold in that transaction, randomly generated between 1 and 50.
    7. Product_Category: The category of the product sold. Categories include Electronics, Furniture, Clothing, and Food.
    8. Unit_Cost: The cost per unit of the product sold, randomly generated between 50 and 5000 currency units.
    9. Unit_Price: The selling price per unit of the product, calculated to be higher than the unit cost.
    10. Customer_Type: Indicates whether the customer is a New or Returning customer.
    11. Discount: The discount applied to the sale, randomly chosen between 0% and 30%.
    12. Payment_Method: The method of payment used by the customer (e.g., Credit Card, Cash, Bank Transfer).
    13. Sales_Channel: The channel through which the sale occurred. Either Online or Retail.
    14. Region_and_Sales_Rep: A combined column that pairs the region and sales representative for easier tracking.
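
The column descriptions above state value ranges and relationships that can be turned into simple row checks. The sketch below is illustrative only: it assumes Discount is stored as a fraction (0.0 to 0.30) rather than a percentage, and the "-" separator in the combined column is a guess, since the listing does not show the stored format.

```python
REGIONS = {"North", "South", "East", "West"}


def validate_sale(row: dict) -> list:
    """Return the names of columns that violate the documented constraints."""
    problems = []
    if row["Region"] not in REGIONS:
        problems.append("Region")
    if not 1 <= row["Quantity_Sold"] <= 50:
        problems.append("Quantity_Sold")
    if not 50 <= row["Unit_Cost"] <= 5000:
        problems.append("Unit_Cost")
    if row["Unit_Price"] <= row["Unit_Cost"]:
        problems.append("Unit_Price")  # price should exceed cost
    if not 0.0 <= row["Discount"] <= 0.30:
        problems.append("Discount")    # assumes a fractional discount
    if not 100 <= row["Sales_Amount"] <= 10000:
        problems.append("Sales_Amount")
    return problems


def region_and_sales_rep(row: dict, sep: str = "-") -> str:
    """Combine region and sales representative; the separator is assumed."""
    return f"{row['Region']}{sep}{row['Sales_Rep']}"


# Hypothetical row, not taken from the dataset.
sale = {
    "Region": "North", "Sales_Rep": "Alice", "Quantity_Sold": 10,
    "Unit_Cost": 100.0, "Unit_Price": 120.0, "Discount": 0.1,
    "Sales_Amount": 1080.0,
}
print(validate_sale(sale))
print(region_and_sales_rep(sale))
```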

    Disclaimer

    Please note: This data was randomly generated and is intended solely for practice, learning, or testing. It does not reflect real-world sales, customers, or businesses, and should not be considered reliable for any real-time analysis or decision-making.

  9. Annual Retail Store Data, 2000 [Canada] [Excel]

    • dataverse.scholarsportal.info
    • borealisdata.ca
    pdf, xls
    Updated Nov 17, 2021
    + more versions
    Cite
    Scholars Portal Dataverse (2021). Annual Retail Store Data, 2000 [Canada] [Excel] [Dataset]. https://dataverse.scholarsportal.info/dataset.xhtml;jsessionid=1283d69ee2dd528c9011fe4a2fe3?persistentId=hdl%3A10864%2F11351&version=&q=&fileTypeGroupFacet=&fileAccess=&fileTag=%22Tables%22&fileSortField=&fileSortOrder=
    Explore at:
    Available download formats: xls (2165760), xls (29696), xls (2920448), pdf (76787), pdf (158404), xls (34816), xls (2754048), pdf (81084), pdf (71183), xls (34304), xls (625664), xls (2707968), xls (695808), pdf (70673), pdf (72585), xls (576512), xls (609792), xls (28672), pdf (60236), pdf (30338), pdf (87181), pdf (84140), pdf (92012), xls (610304), pdf (74439), xls (2471424), pdf (73788), xls (30208), pdf (74478), pdf (53645)
    Dataset updated
    Nov 17, 2021
    Dataset provided by
    Scholars Portal Dataverse
    Area covered
    Canada
    Description

    The annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.

  10. BigMart Retail Sales

    • data.niaid.nih.gov
    Updated May 2, 2022
    Cite
    Dataman (2022). BigMart Retail Sales [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6509954
    Explore at:
    Dataset updated
    May 2, 2022
    Authors
    Dataman
    License

    Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Nothing ever becomes real till it is experienced.

    -John Keats

    While we don't know the context in which John Keats said this, we are sure about its implication for data science. While you will have enjoyed and gained exposure to real-world problems in this challenge, here is another opportunity to get your hands dirty with this practice problem.

    Problem Statement :

    The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and find out the sales of each product at a particular store.

    Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.

    Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.

    Data :

    We have 14204 samples in data set.

    Variable Description

    Item Identifier: A code provided for the item of sale

    Item Weight: Weight of item

    Item Fat Content: A categorical column of how much fat is present in the item: ‘Low Fat’, ‘Regular’, ‘low fat’, ‘LF’, ‘reg’

    Item Visibility: Numeric value for how visible the item is

    Item Type: What category does the item belong to: ‘Dairy’, ‘Soft Drinks’, ‘Meat’, ‘Fruits and Vegetables’, ‘Household’, ‘Baking Goods’, ‘Snack Foods’, ‘Frozen Foods’, ‘Breakfast’, ‘Health and Hygiene’, ‘Hard Drinks’, ‘Canned’, ‘Breads’, ‘Starchy Foods’, ‘Others’, ‘Seafood’.

    Item MRP: The MRP price of item

    Outlet Identifier: The outlet at which the item was sold. This is a categorical column

    Outlet Establishment Year: Which year was the outlet established

    Outlet Size: A categorical column to explain size of outlet: ‘Medium’, ‘High’, ‘Small’.

    Outlet Location Type: A categorical column to describe the location of the outlet: ‘Tier 1’, ‘Tier 2’, ‘Tier 3’

    Outlet Type: Categorical column for type of outlet: ‘Supermarket Type1’, ‘Supermarket Type2’, ‘Supermarket Type3’, ‘Grocery Store’

    Item Outlet Sales: The number of sales for an item.
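
The Item Fat Content column lists five spellings that appear to describe two categories. A normalisation step like the one below is a common first cleaning pass. The mapping of ‘LF’ to ‘Low Fat’ and ‘reg’ to ‘Regular’ is an assumption based on the abbreviations, not something the description states.

```python
# Lower-cased spelling -> canonical label (assumed equivalences).
FAT_CONTENT_ALIASES = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "regular": "Regular",
    "reg": "Regular",
}


def normalize_fat_content(raw: str) -> str:
    """Map a raw Item Fat Content value to a canonical label.

    Unknown values are returned unchanged so they stay visible for review.
    """
    return FAT_CONTENT_ALIASES.get(raw.strip().lower(), raw)


for value in ["Low Fat", "Regular", "low fat", "LF", "reg"]:
    print(value, "->", normalize_fat_content(value))
```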

    Evaluation Metric:

    We will use the Root Mean Square Error value to judge your response
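
For reference, Root Mean Square Error is the square root of the mean squared difference between actual and predicted values. A minimal standard-library sketch follows; the function name and sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sales figures and predictions.
print(rmse([100.0, 200.0], [103.0, 204.0]))
```

Lower values indicate predictions closer to the actual Item Outlet Sales, and large errors are penalised more heavily than small ones because the differences are squared.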

  11. Electrical Product Sample Sales Data

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Cite
    Murat Mutlu (2022). Electrical Product Sample Sales Data [Dataset]. https://www.kaggle.com/datasets/muratmutlubi/electrical-product-sample-sales-data
    Explore at:
    Available download formats: zip (43201106 bytes)
    Dataset updated
    Jan 18, 2022
    Authors
    Murat Mutlu
    Description

    Dataset

    This dataset was created by Murat Mutlu

    Released under Data files © Original Authors

    Contents

  12. Online Sales Dataset

    • gts.ai
    json
    Updated Jun 25, 2024
    Cite
    GTS (2024). Online Sales Dataset [Dataset]. https://gts.ai/dataset-download/online-sales-dataset/
    Explore at:
    Available download formats: json
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Online Sales Dataset provides a detailed overview of global online sales transactions across various product categories. It includes transaction details such as order ID, date, product category, product name, quantity, unit price, total price, region, and payment method.

  13. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  14. Cafe Sales - Dirty Data for Cleaning Training

    • kaggle.com
    zip
    Updated Jan 17, 2025
    Cite
    Ahmed Mohamed (2025). Cafe Sales - Dirty Data for Cleaning Training [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/cafe-sales-dirty-data-for-cleaning-training
    Explore at:
    Available download formats: zip (113510 bytes)
    Dataset updated
    Jan 17, 2025
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dirty Cafe Sales Dataset

    Overview

    The Dirty Cafe Sales dataset contains 10,000 rows of synthetic data representing sales transactions in a cafe. This dataset is intentionally "dirty," with missing values, inconsistent data, and errors introduced to provide a realistic scenario for data cleaning and exploratory data analysis (EDA). It can be used to practice cleaning techniques, data wrangling, and feature engineering.

    File Information

    • File Name: dirty_cafe_sales.csv
    • Number of Rows: 10,000
    • Number of Columns: 8

    Columns Description

    Column Name | Description | Example Values
    Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567
    Item | The name of the item purchased. May contain missing or invalid values (e.g., "ERROR"). | Coffee, Sandwich
    Quantity | The quantity of the item purchased. May contain missing or invalid values. | 1, 3, UNKNOWN
    Price Per Unit | The price of a single unit of the item. May contain missing or invalid values. | 2.00, 4.00
    Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, 12.00
    Payment Method | The method of payment used. May contain missing or invalid values (e.g., None, "UNKNOWN"). | Cash, Credit Card
    Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Takeaway
    Transaction Date | The date of the transaction. May contain missing or incorrect values. | 2023-01-01

    Data Characteristics

    1. Missing Values:

      • Some columns (e.g., Item, Payment Method, Location) may contain missing values represented as None or empty cells.
    2. Invalid Values:

      • Some rows contain invalid entries like "ERROR" or "UNKNOWN" to simulate real-world data issues.
    3. Price Consistency:

      • Prices for menu items are consistent but may have missing or incorrect values introduced.

    Menu Items

    The dataset includes the following menu items with their respective price ranges:

    Item | Price ($)
    Coffee | 2
    Tea | 1.5
    Sandwich | 4
    Salad | 5
    Cake | 3
    Cookie | 1
    Smoothie | 4
    Juice | 3
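
Since menu prices are fixed, the table above can serve as a lookup for repairing a missing Price Per Unit. The reverse direction is only partly possible: Sandwich and Smoothie share a price, as do Cake and Juice, so a price alone does not always identify the item. The sketch below illustrates both directions; the helper name `items_for_price` is hypothetical.

```python
# Prices copied from the menu table above.
MENU_PRICES = {
    "Coffee": 2.0, "Tea": 1.5, "Sandwich": 4.0, "Salad": 5.0,
    "Cake": 3.0, "Cookie": 1.0, "Smoothie": 4.0, "Juice": 3.0,
}


def items_for_price(price: float) -> list:
    """Return all menu items sold at the given price, sorted by name."""
    return sorted(item for item, p in MENU_PRICES.items() if p == price)


print(MENU_PRICES["Tea"])    # price lookup for a known item
print(items_for_price(1.5))  # a single candidate: the item can be inferred
print(items_for_price(4.0))  # two candidates: the item stays ambiguous
```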

    Use Cases

    This dataset is suitable for:

      • Practicing data cleaning techniques such as handling missing values, removing duplicates, and correcting invalid entries.
      • Exploring EDA techniques like visualizations and summary statistics.
      • Performing feature engineering for machine learning workflows.

    Cleaning Steps Suggestions

    To clean this dataset, consider the following steps:

    1. Handle Missing Values:

      • Fill missing numeric values with the median or mean.
      • Replace missing categorical values with the mode or "Unknown."
    2. Handle Invalid Values:

      • Replace invalid entries like "ERROR" and "UNKNOWN" with NaN or appropriate values.
    3. Date Consistency:

      • Ensure all dates are in a consistent format.
      • Fill missing dates with plausible values based on nearby records.
    4. Feature Engineering:

      • Create new columns, such as Day of the Week or Transaction Month, for further analysis.
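
The suggested steps can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not a reference pipeline: it treats "ERROR", "UNKNOWN", "None", and empty strings as missing, uses the median for numeric gaps, and assumes dates are written as YYYY-MM-DD as in the example value. All function names are hypothetical.

```python
from datetime import datetime
from statistics import median

INVALID = {"", "ERROR", "UNKNOWN", "None"}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def clean_value(raw):
    """Return the stripped text, or None for missing and invalid markers."""
    if raw is None:
        return None
    text = str(raw).strip()
    return None if text in INVALID else text


def to_number(raw):
    """Parse a numeric cell; invalid or unparsable cells become None."""
    text = clean_value(raw)
    if text is None:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def fill_with_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)
    mid = median(known)
    return [mid if v is None else v for v in values]


def date_features(raw):
    """Derive Day of the Week and Transaction Month from a YYYY-MM-DD date."""
    text = clean_value(raw)
    if text is None:
        return None
    try:
        day = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None
    return {"Day of the Week": WEEKDAYS[day.weekday()], "Transaction Month": day.month}


# Hypothetical raw Quantity cells, not taken from the dataset.
quantities = [to_number(q) for q in ["1", "UNKNOWN", "3", "5"]]
print(fill_with_median(quantities))
print(date_features("2023-01-01"))
```

Median filling is only one option; for Quantity and Price Per Unit, recomputing from Total Spent may preserve more information when two of the three fields are present.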

    License

    This dataset is released under the CC BY-SA 4.0 License. You are free to use, share, and adapt it, provided you give appropriate credit.

    Feedback

    If you have any questions or feedback, feel free to reach out through the dataset's discussion board on Kaggle.

  15. Power BI Sample Dataset

    • cubig.ai
    zip
    Updated May 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Power BI Sample Dataset [Dataset]. https://cubig.ai/store/products/389/power-bi-sample-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction

      • The Power BI Sample Data is a financial sample dataset for Power BI practice and data visualization exercises. It includes a variety of financial metrics and transaction information, such as sales, profits, and expenses.

    2) Data Utilization

    (1) Characteristics of the Power BI Sample Data:

      • The dataset consists of numerical and categorical variables such as transaction date, region, product category, sales, profit, and cost, and is structured for aggregation, analysis, and visualization.

    (2) The Power BI Sample Data can be used for:

      • Revenue and Profit Analysis: Analyze sales and profit data by region, product, and period to understand business performance and trends.
      • Power BI Dashboard Practice: Use the financial metrics and transaction data to design and practice dashboards, reports, and visualization charts directly in Power BI.

  16. Sample Leads Dataset

    • kaggle.com
    zip
    Updated Jun 24, 2022
    Cite
    ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
    Available download formats: zip (22640 bytes)
    Dataset updated
    Jun 24, 2022
    Authors
    ThatSean
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I modified this dataset to support an upcoming Towards Data Science article walking through the process. A link will be shared once the article is published.

  17. sales-transcripts

    • huggingface.co
    Updated Sep 24, 2024
    Cite
    Gwen Shapira (2024). sales-transcripts [Dataset]. https://huggingface.co/datasets/gwenshap/sales-transcripts
    Available formats: Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 24, 2024
    Authors
    Gwen Shapira
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset was generated for use with Nile's Sales Assistant example: https://github.com/niledatabase/niledatabase/tree/main/examples/ai/sales_insight

    It includes:

      • Simulated sales conversations for 5 different fictional companies.
      • Chunked and embedded versions of these conversations (embeddings use OpenAI's text-embedding-3-small model).

    The chunks and embeddings can be loaded directly into a vector database and searched using vector similarity methods. The example's ./ingest directory… See the full description on the dataset page: https://huggingface.co/datasets/gwenshap/sales-transcripts.
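
    As a rough illustration of what searching by vector similarity means, the sketch below ranks chunks by cosine similarity to a query vector. The `text` and `embedding` keys and the function names are hypothetical and do not reflect this dataset's schema or Nile's API. A vector database would normally perform this ranking with an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]
```

    In practice the query vector would come from the same embedding model as the stored chunks, since similarities are only comparable within one embedding space.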

  18. US Retail Sales

    • ycharts.com
    html
    Updated Sep 16, 2025
    + more versions
    Cite
    Census Bureau (2025). US Retail Sales [Dataset]. https://ycharts.com/indicators/us_retail_sales
    Available download formats: html
    Dataset updated
    Sep 16, 2025
    Dataset provided by
    YCharts
    Authors
    Census Bureau
    License

    https://www.ycharts.com/terms

    Time period covered
    Jan 31, 1992 - Aug 31, 2025
    Area covered
    United States
    Variables measured
    US Retail Sales
    Description

    View monthly updates and historical trends for US Retail Sales in the United States. Source: Census Bureau. Track economic data with YCharts analytics.

  19. United States Existing Home Sales

    • tradingeconomics.com
    • ru.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Updated Nov 20, 2025
    Cite
    TRADING ECONOMICS (2025). United States Existing Home Sales [Dataset]. https://tradingeconomics.com/united-states/existing-home-sales
    Available download formats: csv, json, xml, excel
    Dataset updated
    Nov 20, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 31, 1968 - Oct 31, 2025
    Area covered
    United States
    Description

    Existing Home Sales in the United States increased to 4100 thousand in October 2025 from 4050 thousand in September 2025. This dataset provides the latest reported value for United States Existing Home Sales, plus previous releases, historical highs and lows, short-term forecasts and long-term predictions, the economic calendar, survey consensus, and news.

  20. Real manufacturing sales, orders, inventory owned and inventory to sales...

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Nov 14, 2025
    + more versions
    Cite
    Statistics Canada (2025). Real manufacturing sales, orders, inventory owned and inventory to sales ratio, 2017 dollars, seasonally adjusted [Dataset]. https://open.canada.ca/data/dataset/7bf43dd1-af41-4c6f-871e-4c653aad27d0
    Available download formats: csv, html, xml
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    Canadian sales of goods manufactured (shipments), new orders, unfilled orders, inventories, raw materials, goods or work in process, finished goods, and inventory-to-sales ratios for durable and non-durable goods by North American Industry Classification System (NAICS), for reference periods from January 2002 to the current reference month. Not all combinations are available. Values are in constant dollars.
