There's a story behind every dataset and here's your opportunity to share yours.
What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.
We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.
Your data will be in front of the world's largest data science community. What questions do you want to see answered?
MySQL Classicmodels sample database
The MySQL sample database schema consists of the following tables:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F8652778%2Fefc56365be54c0e2591a1aefa5041f36%2FMySQL-Sample-Database-Schema.png?generation=1670498341027618&alt=media
The dataset includes the information gathered and reported in the DVS Services Report as specified by Local Law 44 of 2018. The data were derived from two separate databases. They cover assistance requests for services, care, or resources, supported via phone, in person, postal mail, or electronic mail. Assistance and support involve connecting City veterans and their families to a coordinated network of public, private, and non-profit organizations.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 21,321 food order records from various restaurants, capturing crucial details about customer preferences, order trends, pricing, and delivery performance. It includes 6 unique imaginary restaurants, such as Swaad, Aura Pizzas, Dilli Burger Adda, Tandoori Junction, The Chicken Junction, and Masala Junction. The dataset provides a comprehensive view of food delivery operations, making it highly valuable for data analysis, predictive modeling, and machine learning applications.
Key attributes in this dataset include restaurant details (restaurant name, subzone, city), order information (order ID, timestamps, order status, delivery time, distance, number of items), pricing breakdown (bill subtotal, packaging charges, total cost, discounts), and customer feedback (ratings, reviews, order cancellations). It also tracks key delivery insights such as rider wait time, preparation duration, and distance traveled, which can be useful for logistics optimization and demand forecasting.
This dataset can be leveraged for predicting delivery times, analyzing customer behavior, identifying top-performing restaurants, and optimizing pricing strategies. It is particularly useful for food delivery platforms, restaurant managers, and data scientists looking to improve delivery efficiency and customer satisfaction. With rich historical data, this dataset can also be used for building recommendation systems, identifying peak ordering times, and enhancing user experience in food delivery applications.
The HSE Books Customer Database, held by HSE’s appointed storage and distribution services provider, holds the following information. There are 19,228 customer records (256 active and 18,972 closed) documenting orders placed from 1998 to date (31 July 2013). The orders relate to requests for printed copies of HSE’s guidance portfolio. These records include data on: customer address details; Standard Industry Classification (SIC) code, where applicable; type of business; number of employees; order history; payment history; payment type; and credit limit. This information is provided to HSE for management information purposes. Sensitive information in relation to payments or bank account details is not shared with HSE and is dealt with under the appropriate financial controls operated by our service provider and the Data Protection Act.
In the first half of 2024, Google received a total of 930 requests for disclosure of Enterprise Cloud customer information from federal agencies and governments worldwide, while the number of Enterprise Cloud customers named in the requests during the period amounted to 1,007.
This data set contains selected information for service requests received in the current year to date and the previous four calendar years, for service request types that are or were available to the public via the Customer Service Requests web portal and/or Find It, Fix It mobile app.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Manufacturers' New Orders: Consumer Goods (ACOGNO) from Feb 1992 to May 2025 about new orders, orders, new, consumer, goods, manufacturing, industry, and USA.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description
This dataset is a collection of customer, product, sales, and location data extracted from a CRM and ERP system for a retail company. It has been cleaned and transformed through various ETL (Extract, Transform, Load) processes to ensure data consistency, accuracy, and completeness. Below is a breakdown of the dataset components:
1. Customer Information (s_crm_cust_info)
This table contains information about customers, including their unique identifiers and demographic details.
Columns:
cst_id: Customer ID (Primary Key)
cst_gndr: Gender
cst_marital_status: Marital status
cst_create_date: Customer account creation date
Cleaning Steps:
Removed duplicates and handled missing or null cst_id values.
Trimmed leading and trailing spaces in cst_gndr and cst_marital_status.
Standardized gender values and identified inconsistencies in marital status.
This table contains information about products, including product identifiers, names, costs, and lifecycle dates.
Columns:
prd_id: Product ID
prd_key: Product key
prd_nm: Product name
prd_cost: Product cost
prd_start_dt: Product start date
prd_end_dt: Product end date
Cleaning Steps:
Checked for duplicates and null values in the prd_key column.
Validated product dates to ensure prd_start_dt is earlier than prd_end_dt.
Corrected product costs to remove invalid entries (e.g., negative values).
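The date and cost checks listed above could be sketched as follows. This is a minimal illustration in plain Python, not the author's actual ETL code: the sample rows are invented, and treating a negative cost as unknown (`None`) is an assumption about what "corrected" means here.

```python
from datetime import date

# Hypothetical sample rows using the column names listed above.
products = [
    {"prd_key": "A-1", "prd_cost": 12,
     "prd_start_dt": date(2011, 7, 1), "prd_end_dt": date(2012, 6, 30)},
    {"prd_key": "A-2", "prd_cost": -5,
     "prd_start_dt": date(2012, 7, 1), "prd_end_dt": None},
    {"prd_key": "A-3", "prd_cost": 30,
     "prd_start_dt": date(2013, 7, 1), "prd_end_dt": date(2012, 1, 1)},
]

def has_valid_dates(row):
    """A row passes if it has no end date yet or starts before it ends."""
    end = row["prd_end_dt"]
    return end is None or row["prd_start_dt"] < end

def clean_cost(cost):
    """Assumption: a negative or missing cost is treated as unknown (None)."""
    return cost if cost is not None and cost >= 0 else None

# Keys of products whose start date is not earlier than their end date.
invalid_date_keys = [r["prd_key"] for r in products if not has_valid_dates(r)]

# Cost per product key after removing invalid entries.
cleaned_costs = {r["prd_key"]: clean_cost(r["prd_cost"]) for r in products}
```

Here `invalid_date_keys` flags rows for review rather than dropping them, since the description does not say how invalid date ranges were resolved.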
This table contains information about sales transactions, including order dates, quantities, prices, and sales amounts.
Columns:
sls_order_dt: Sales order date
sls_due_dt: Sales due date
sls_sales: Total sales amount
sls_quantity: Number of products sold
sls_price: Product unit price
Cleaning Steps:
Validated sales order dates and corrected invalid entries.
Checked for discrepancies where sls_sales did not match sls_price * sls_quantity and corrected them.
Removed null and negative values from sls_sales, sls_quantity, and sls_price.
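A sales integrity check of this kind might look like the sketch below. It is illustrative only: the rows are invented, and the repair policy (treat `sls_price * sls_quantity` as the source of truth, and derive a missing or negative price from the total) is an assumption, since the description does not say which field was trusted.

```python
# Hypothetical sample rows using the column names listed above.
sales = [
    {"sls_sales": 100, "sls_quantity": 2, "sls_price": 50},    # consistent
    {"sls_sales": 90, "sls_quantity": 3, "sls_price": 25},     # total mismatch
    {"sls_sales": None, "sls_quantity": 4, "sls_price": 10},   # missing total
    {"sls_sales": 40, "sls_quantity": 2, "sls_price": -20},    # negative price
]

def repair_sale(row):
    """Return a corrected copy of the row, or None if it cannot be repaired."""
    qty, price, total = row["sls_quantity"], row["sls_price"], row["sls_sales"]
    if qty is None or qty <= 0:
        return None  # without a usable quantity nothing can be derived
    if price is None or price <= 0:
        # Assumption: derive the unit price from the total when it is usable.
        if total is None or total <= 0:
            return None
        price = total / qty
    # Assumption: price * quantity is the source of truth for the total.
    return {"sls_sales": price * qty, "sls_quantity": qty, "sls_price": price}

repaired = [fixed for fixed in map(repair_sale, sales) if fixed is not None]
```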
This table contains additional customer demographic data, including gender and birthdate.
Columns:
cid: Customer ID
gen: Gender
bdate: Birthdate
Cleaning Steps:
Checked for missing or null gender values and standardized inconsistent entries.
Removed leading/trailing spaces from gen and bdate.
Validated birthdates to ensure they were within a realistic range.
This table contains country information related to the customers' locations.
Columns:
cntry: Country
Cleaning Steps:
Standardized country names (e.g., "US" and "USA" were mapped to "United States").
Removed special characters (e.g., carriage returns) and trimmed whitespace.
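As a rough sketch of the country cleanup, the function below strips carriage returns and whitespace and then maps the two aliases named above. Matching aliases case-insensitively and returning `None` for empty values are assumptions, not documented behavior of the original pipeline.

```python
# Mapping taken from the examples in the text; any further aliases would be assumptions.
COUNTRY_ALIASES = {"US": "United States", "USA": "United States"}

def clean_country(raw):
    """Strip line-break characters and whitespace, then map known aliases."""
    if raw is None:
        return None
    value = raw.replace("\r", "").replace("\n", "").strip()
    if not value:
        return None  # assumption: blank values are treated as missing
    # Assumption: aliases are matched case-insensitively.
    return COUNTRY_ALIASES.get(value.upper(), value)
```

For example, `clean_country(" USA\r")` and `clean_country("us")` both yield `"United States"`, while unmapped names such as `"Germany"` pass through unchanged.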
This table contains product category information.
Columns:
Product category data (no significant cleaning required).
Key Features:
Customer demographics, including gender and marital status
Product details such as cost, start date, and end date
Sales data with order dates, quantities, and sales amounts
ERP-specific customer and location data
Data Cleaning Process:
This dataset underwent extensive cleaning and validation, including:
Null and Duplicate Removal: Ensuring no duplicate or missing critical data (e.g., customer IDs, product keys).
Date Validations: Ensuring correct date ranges and chronological consistency.
Data Standardization: Standardizing categorical fields (e.g., gender, country names) and fixing inconsistent values.
Sales Integrity Checks: Ensuring sales amounts match the expected product of price and quantity.
This dataset is now ready for analysis and modeling, with clean, consistent, and validated data for retail analytics, customer segmentation, product analysis, and sales forecasting.
This dataset contains all work orders submitted to the city from 2021 to the present. Work orders are submitted by calling 311 or using the SeeClickFix application.
Date fields: Date fields are displayed in the table with data type string. The string data type is typically used to represent text. All date information is accurate but will sort as text in the online table. Use the download feature if you would like to sort by date.
More information: View other CSRS datasets on Informing Worcester. Visit the Department of Public Works & Parks webpage to learn more about their services, programs, and initiatives.
Informing Worcester is the City of Worcester's open data portal where interested parties can obtain public information at no cost.
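The date-field note above matters because text sorting and chronological sorting can disagree. The sketch below shows the difference on invented values; the `%m/%d/%Y` format is an assumption and should be adjusted to whatever the downloaded file actually contains.

```python
from datetime import datetime

# Hypothetical values; the portal's actual date format may differ.
raw_dates = ["12/01/2021", "01/15/2022", "03/02/2021"]
DATE_FORMAT = "%m/%d/%Y"  # assumption, adjust to match the downloaded file

# Sorting the strings directly compares characters, not calendar dates.
text_sorted = sorted(raw_dates)

# Parsing each string first gives a true chronological order.
date_sorted = sorted(raw_dates, key=lambda s: datetime.strptime(s, DATE_FORMAT))
```

With these values, the text sort places the 2022 entry first, while the parsed sort places it last.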
Records from operating a customer call center or service center providing services to the public. Services may address a wide variety of topics such as understanding agency mission-specific functions or how to resolve technical difficulties with external-facing systems or programs. Includes:
- incoming requests and responses
- trouble tickets and tracking logs
- recordings of call center phone conversations with customers used for quality control and customer service training
- system data, including customer ticket numbers and visit tracking
- evaluations and feedback about customer services
- information about customer services, such as “Frequently Asked Questions” (FAQs) and user guides
- reports generated from customer management data
- complaints and commendation records; customer feedback and satisfaction surveys, including survey instruments, data, background materials, and reports.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States Mfg Ind: New Orders: Consumer Goods data was reported at 216.135 USD bn in May 2018. This records an increase from the previous number of 207.342 USD bn for Apr 2018. United States Mfg Ind: New Orders: Consumer Goods data is updated monthly, averaging 147.811 USD bn from Feb 1992 (Median) to May 2018, with 316 observations. The data reached an all-time high of 219.318 USD bn in May 2014 and a record low of 84.827 USD bn in Jul 1993. United States Mfg Ind: New Orders: Consumer Goods data remains in active status in CEIC and is reported by US Census Bureau. The data is categorized under Global Database’s USA – Table US.C005: Manufacturing Industries: By NAIC System: New Orders.
The statistic shows the extent to which French companies stored and used their client data in 2019. The study compared data-driven companies, which already store client information and use it as a means of growing transactions, with non-data-driven companies, which do not yet orient themselves around client data. None of the non-data-driven companies tracked their users' responsiveness to e-mail campaigns or other forms of advertisement, or their webpage visits. All of the data-driven companies tracked their client contact information, compared with 50 percent of the non-data-driven companies. Client orders were tracked by 83 percent of the data-driven companies, compared with 67 percent of the non-data-driven ones. Details of the purchased products played an important role for 92 percent of the data-driven companies, all of which also tracked their website visits.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Attribute information:
Row ID: A unique identifier for each row in the table
Order ID: The identifier for each sales order
Order Date: The date the order was placed
Ship Date: The date the order was shipped
Delivery Duration: The amount of time it took to deliver the order
Ship Mode: The shipping method used for the order
Customer ID: The identifier for the customer who placed the order
Customer Name: The name of the customer who placed the order
Country: The customer's country
City: The customer's city
State: The customer's state
Postal Code: The customer's postal code
Region: The customer's region
Product ID: The identifier for the product that was ordered
Category: The category of the product that was ordered (e.g., furniture, office supplies, technology)
Sub-Category: Likely refers to a subcategory within a larger product category (e.g., Tables within Furniture). Values: Bookcases, Chairs, Labels, Tables, Storage, Furnishings, Art, Phones, Binders, Appliances, Paper, and others.
Product Name: The name of the product sold (e.g., Bush Somerset Collection Bookcase; Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back; Self-Adhesive Address Labels for Typewriters by Universal; Bretford CP4500 Series Slim Rectangular Table; and others).
Sales: The total sales amount for each product, listed in currency format
Quantity: The number of units sold for each product (integer values)
Discount: The discount offered on the product
Discount Value: The total discount amount applied to the product
Profit: The profit earned on the sale of each product
COGS: Likely refers to each product's Cost of Goods Sold. COGS = Sales - Profit
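The one formula given in the attribute description, COGS = Sales - Profit, can be checked on a single row as in this small sketch. The row values are invented, and computing Delivery Duration as the number of days between Order Date and Ship Date is an assumption, since the description does not define it precisely.

```python
from datetime import date

# Hypothetical order row using the attribute names described above.
order = {
    "Order Date": date(2017, 11, 8),
    "Ship Date": date(2017, 11, 11),
    "Sales": 250.0,
    "Profit": 40.5,
}

# COGS = Sales - Profit, as stated in the attribute description.
cogs = order["Sales"] - order["Profit"]

# Assumption: Delivery Duration is the day count from order to shipment.
delivery_days = (order["Ship Date"] - order["Order Date"]).days
```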
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Welcome! This is a Brazilian ecommerce public dataset of orders made at Olist Store. The dataset has information on 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and finally reviews written by customers. We also released a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.
This is real commercial data. It has been anonymised, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.
We have also released a Marketing Funnel Dataset. You may join both datasets and see an order from a marketing perspective.
Instructions on joining are available on this Kernel.
This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners. See more on our website: www.olist.com
After a customer purchases a product from Olist Store, a seller is notified to fulfill that order. Once the customer receives the product, or the estimated delivery date passes, the customer gets a satisfaction survey by email where they can rate the purchase experience and write some comments.
Example of a product listing on a marketplace: https://i.imgur.com/JuJMns1.png
The data is divided in multiple datasets for better understanding and organization. Please refer to the following data schema when working with it:
Data Schema: https://i.imgur.com/HRhd2Y0.png
We had previously released a classified dataset, but we removed it at Version 6. We intend to release it again as a new dataset with a new data schema. Until then, you may use the classified dataset available in Version 5 or earlier.
Here is some inspiration for possible outcomes from this dataset.
NLP:
This dataset offers a rich environment for parsing the review text across its multiple dimensions.
Clustering:
Some customers didn't write a review. But why are they happy or mad?
Sales Prediction:
With purchase date information you'll be able to predict future sales.
Delivery Performance:
You will also be able to work through delivery performance and find ways to optimize delivery times.
Product Quality:
Enjoy discovering the product categories that are more prone to customer dissatisfaction.
Feature Engineering:
Create features from this rich dataset or attach some external public information to it.
Thanks to Olist for releasing this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Chicago Fed National Activity : Sales, Orders and Inventories was -0.04000 Index in April of 2025, according to the United States Federal Reserve. Historically, United States - Chicago Fed National Activity : Sales, Orders and Inventories reached a record high of 1.37000 in June of 2020 and a record low of -2.25000 in April of 2020. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Chicago Fed National Activity : Sales, Orders and Inventories, last updated from the United States Federal Reserve in July of 2025.
https://creativecommons.org/publicdomain/zero/1.0/
Download: SQL Query
This SQL project is focused on analyzing sales data from a relational database to gain insights into customer behavior, store performance, product sales, and the effectiveness of sales representatives. By executing a series of complex SQL queries across multiple tables, the project aggregates key metrics, such as total units sold and total revenue, and links them with customer, store, product, and staff details.
Key Objectives:
Customer Analysis: Understand customer purchasing patterns by analyzing the total number of units and revenue generated per customer.
Product and Category Insights: Evaluate product performance and its category’s impact on overall sales.
Store Performance: Identify which stores generate the most revenue and handle the highest sales volume.
Sales Representative Effectiveness: Assess the performance of sales representatives by linking sales data with each representative’s handled orders.
Techniques Used:
SQL Joins: The project integrates data from multiple tables, including orders, customers, order_items, products, categories, stores, and staffs, using INNER JOIN to merge information from related tables.
Aggregation: SUM functions are used to compute total units sold and revenue generated by each order, providing valuable insights into sales performance.
Grouping: Data is grouped by order ID, customer, product, store, and sales representative, ensuring accurate and summarized sales metrics.
Use Cases:
Business Decision-Making: The analysis can help businesses identify high-performing products and stores, optimize inventory, and evaluate the impact of sales teams.
Market Segmentation: Segment customers based on geographic location (city/state) and identify patterns in purchasing behavior.
Sales Strategy Optimization: Provide recommendations to improve sales strategies by analyzing product categories and sales rep performance.
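The join, aggregation, and grouping techniques described above could look roughly like the following sketch, run against an in-memory SQLite database from Python. The table names come from the description; every column name, the revenue formula (quantity times list price), and all sample rows are assumptions made for illustration, and this is not the project's actual query.

```python
import sqlite3

# In-memory toy database. Table names follow the text; every column name
# and all sample rows are assumptions made for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE staffs (staff_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     store_id INTEGER, staff_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER,
                          quantity INTEGER, list_price REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO stores VALUES (1, 'North');
INSERT INTO staffs VALUES (1, 'Kim');
INSERT INTO orders VALUES (10, 1, 1, 1), (11, 2, 1, 1);
INSERT INTO order_items VALUES (10, 100, 2, 50.0), (10, 101, 1, 20.0),
                               (11, 100, 3, 50.0);
""")

# INNER JOIN the related tables, then SUM units and revenue per order.
query = """
SELECT o.order_id, c.name AS customer, s.store_name, st.name AS sales_rep,
       SUM(oi.quantity) AS total_units,
       SUM(oi.quantity * oi.list_price) AS total_revenue
FROM orders AS o
INNER JOIN customers AS c ON c.customer_id = o.customer_id
INNER JOIN order_items AS oi ON oi.order_id = o.order_id
INNER JOIN stores AS s ON s.store_id = o.store_id
INNER JOIN staffs AS st ON st.staff_id = o.staff_id
GROUP BY o.order_id, c.name, s.store_name, st.name
ORDER BY o.order_id
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Each tuple in `rows` holds one order with its customer, store, sales representative, total units, and total revenue. Joining products and categories would follow the same pattern.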
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan Information Service Sales: SDP: Orders data was reported at 857,742.000 JPY mn in Sep 2018. This records an increase from the previous number of 440,240.000 JPY mn for Aug 2018. Japan Information Service Sales: SDP: Orders data is updated monthly, averaging 428,739.000 JPY mn from Feb 2007 (Median) to Sep 2018, with 140 observations. The data reached an all-time high of 1,467,866.000 JPY mn in Mar 2008 and a record low of 294,947.000 JPY mn in Apr 2007. Japan Information Service Sales: SDP: Orders data remains in active status in CEIC and is reported by Ministry of Economy, Trade and Industry. The data is categorized under Global Database’s Japan – Table JP.H016: Information Services Sales.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Customer Service Work Orders
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General Description
This process describes the management of customer orders within a company, comprising both the registration and payment of incoming orders, as well as the process of packing and shipping these orders. For these tasks, our company deploys staff in their sales, warehousing, and shipment departments.
This is an artificial event log according to the OCEL 2.0 Standard simulated using CPN-Tools. Both the CPN and the SQLite can be downloaded. The simulation is an extension of the order management log in the former OCEL standard.
Process Overview
At our company, customers place orders (place order) for different products in varying amounts. Each product type has a price and a weight. In the current market situation, there is an inflation that irregularly leads to an increase of prices. These price rises have a negative impact on customers’ purchasing power, i.e., on order volumes.
When a customer places an order, this order is assigned to an employee of our company’s sales department. To foster customer satisfaction, our company has a single-face-to-customer policy. This means that per customer there is one primary sales representative who ought to render all services related to that customer. If that first representative is unavailable, a second sales representative should take care of the order. Should this employee be also unavailable, the order has to be managed by another employee. The tasks of sales employees comprise the registration (confirm order) as well as payment processing (payment reminder, pay order).
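The fallback rule in this policy (primary representative, then secondary, then any other employee) can be sketched as a small function. This is an illustration only, not the logic of the CPN simulation: the names are invented, and picking the alphabetically first remaining employee is an assumption, since the text does not say how "another employee" is chosen.

```python
def assign_sales_rep(customer, available, reps_by_customer):
    """Pick the primary rep, else the secondary, else any other available employee.

    reps_by_customer maps a customer to a (primary, secondary) pair;
    available is the set of sales employees currently free.
    """
    primary, secondary = reps_by_customer[customer]
    for rep in (primary, secondary):
        if rep in available:
            return rep
    # Assumption: among the remaining employees, take the alphabetically first.
    others = sorted(available - {primary, secondary})
    return others[0] if others else None

# Hypothetical customer-to-representative mapping.
reps = {"customer_1": ("alice", "bob")}
```

For example, `assign_sales_rep("customer_1", {"bob", "carol"}, reps)` returns `"bob"`, because the primary representative is unavailable.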
In parallel to this, the shipment of goods is prepared. For this, the stock of our company is checked by an employee of the warehousing department for the availability of the ordered items. If necessary, the warehouser reorders the item (item out of stock, reorder item). Items ready for shipment are collected (pick item) for the placement into packages that are addressed to single customers. Here, it may happen that a package content relates to multiple orders, and order volumes are distributed over multiple packages.
After all items allocated to a package have been picked, the package is compiled by a warehousing employee (create package). Later on, this package is picked up by a shipment employee for transport (send package). According to another policy, a warehousing employee should provide assistance to the shipment employee in loading the package. However, oftentimes shippers act contrary to that policy and load packages alone or together with a second shipment employee.
Finally, the package is shipped. Deliveries may fail repeatedly (failed delivery) until successful delivery (package delivered).
The figure below depicts the process in a simplified manner, using an informal process notation to describe the control-flow and the involved object types. A formal description is given along with the artifacts in the next section.
Further information can be found at: https://www.ocel-standard.org/event-logs/simulations/order-management/
General Properties
An overview of log properties is given below.
| Property | Value |
| --- | --- |
| Event Types | 11 |
| Object Types | 6 |
| Events | 21008 |
| Objects | 10840 |
Control-Flow Behavior
The behavior of the log is described by a respective object-centric Petri net. Also, individual object types exhibit behavior that can be described by simpler Petri nets. See below.
orders
customers
items
employees
packages
products
Full object-centric Petri net
Object Relationships
The company pursues the "one-face-to-the-customer" policy, in which every customer has a dedicated sales representative as well as a deputy (secondary representative). These relationships are described in the log.
| Source Object Type | Target Object Type | Qualifier |
| --- | --- | --- |
| employees | customers | primarySalesRep |
| employees | customers | secondarySalesRep |
Additionally, object-to-object relations can emerge at executions of specific activities:
| Activity | Source Object Type | Target Object Type | Qualifier |
| --- | --- | --- | --- |
| create package | package | employee | packed by |
| send package | package | employee | forwarded by |
| send package | package | employee | shipped by |
Simulation Model
The CPN used to create this event log can also be downloaded. To obtain simulated data, extract the linked ZIP file and play out the CPN therein, e.g., by using CPN Tools.
The play-out produces CSV files according to the schema of OCEL 2.0. The provided Jupyter notebook can be used to convert these files to an SQLite dump.
For a technical documentation of the simulation model, please open the attached CPN with CPN Tools and see the annotations therein.
Acknowledgements
Funded under the Excellence Strategy of the Federal Government and the Länder. We also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.