https://creativecommons.org/publicdomain/zero/1.0/
Overview:
This dataset contains 1000 rows of synthetic online retail sales data, mimicking transactions from an e-commerce platform. It includes information about customer demographics, product details, purchase history, and (optional) reviews. This dataset is suitable for a variety of data analysis, data visualization and machine learning tasks, including but not limited to: customer segmentation, product recommendation, sales forecasting, market basket analysis, and exploring general e-commerce trends. The data was generated using the Python Faker library, ensuring realistic values and distributions, while maintaining no privacy concerns as it contains no real customer information.
Data Source:
This dataset is entirely synthetic. It was generated using the Python Faker library and does not represent any real individuals or transactions.
Data Content:
Column Name | Data Type | Description |
---|---|---|
customer_id | Integer | Unique customer identifier (ranging from 10000 to 99999) |
order_date | Date | Order date (a random date within the last year) |
product_id | Integer | Product identifier (ranging from 100 to 999) |
category_id | Integer | Product category identifier (10, 20, 30, 40, or 50) |
category_name | String | Product category name (Electronics, Fashion, Home & Living, Books & Stationery, Sports & Outdoors) |
product_name | String | Product name (randomly selected from a list of products within the corresponding category) |
quantity | Integer | Quantity of the product ordered (ranging from 1 to 5) |
price | Float | Unit price of the product (ranging from 10.00 to 500.00, with two decimal places) |
payment_method | String | Payment method used (Credit Card, Bank Transfer, Cash on Delivery) |
city | String | Customer's city (generated using Faker's city() method, so the locations will depend on the Faker locale you used) |
review_score | Integer | Customer's product rating (ranging from 1 to 5, or None with a 20% probability) |
gender | String | Customer's gender (M/F, or None with a 10% probability) |
age | Integer | Customer's age (ranging from 18 to 75) |
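The column ranges in the table above can be reproduced with a short generator. The following is a minimal sketch using only the Python standard library, not the original generation script: the `PLACEHOLDER_CITIES` list is a hypothetical stand-in for Faker's city() method, the `make_row` helper and the fixed reference date are assumptions, and product_name is omitted because the per-category product lists are not given here.

```python
import random
from datetime import date, timedelta

# Category ids and names as listed in the table above.
CATEGORIES = {
    10: "Electronics",
    20: "Fashion",
    30: "Home & Living",
    40: "Books & Stationery",
    50: "Sports & Outdoors",
}
PAYMENT_METHODS = ["Credit Card", "Bank Transfer", "Cash on Delivery"]
# Hypothetical stand-in for Faker's city() output.
PLACEHOLDER_CITIES = ["City A", "City B", "City C"]


def make_row(rng: random.Random, today: date) -> dict:
    """Build one synthetic order row that follows the documented ranges."""
    category_id = rng.choice(list(CATEGORIES))
    return {
        "customer_id": rng.randint(10000, 99999),
        # A random date within the year ending at `today`.
        "order_date": today - timedelta(days=rng.randint(0, 364)),
        "product_id": rng.randint(100, 999),
        "category_id": category_id,
        "category_name": CATEGORIES[category_id],
        "quantity": rng.randint(1, 5),
        "price": round(rng.uniform(10.0, 500.0), 2),
        "payment_method": rng.choice(PAYMENT_METHODS),
        "city": rng.choice(PLACEHOLDER_CITIES),
        # None with roughly 20% probability, otherwise a 1-5 score.
        "review_score": None if rng.random() < 0.2 else rng.randint(1, 5),
        # None with roughly 10% probability, otherwise M or F.
        "gender": None if rng.random() < 0.1 else rng.choice(["M", "F"]),
        "age": rng.randint(18, 75),
    }


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = [make_row(rng, date(2024, 1, 1)) for _ in range(1000)]
```

Seeding the generator keeps the output repeatable, which is convenient when practicing data cleaning on the missing review_score and gender values.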
Potential Use Cases (Inspiration):
Customer Segmentation: Group customers based on demographics, purchasing behavior, and preferences.
Product Recommendation: Build a recommendation system to suggest products to customers based on their past purchases and browsing history.
Sales Forecasting: Predict future sales based on historical trends.
Market Basket Analysis: Identify products that are frequently purchased together.
Price Optimization: Analyze the relationship between price and demand.
Geographic Analysis: Explore sales patterns across different cities.
Time Series Analysis: Investigate sales trends over time.
Educational Purposes: Great for practicing data cleaning, EDA, feature engineering, and modeling.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains 3,400 records of fashion retail sales, capturing various details about customer purchases, including item details, purchase amounts, ratings, and payment methods. It is useful for analyzing customer buying behavior, product popularity, and payment preferences.
Column Name | Data Type | Non-Null Count | Description |
---|---|---|---|
Customer Reference ID | Integer | 3,400 | A unique identifier for each customer. |
Item Purchased | String | 3,400 | The name of the fashion item purchased. |
Purchase Amount (USD) | Float | 2,750 | The purchase price of the item in USD (650 missing values). |
Date Purchase | String | 3,400 | The date on which the purchase was made (format: DD-MM-YYYY). |
Review Rating | Float | 3,076 | The customer review rating (scale: 1 to 5, 324 missing values). |
Payment Method | String | 3,400 | The payment method used (e.g., Credit Card, Cash). |
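Because Date Purchase is stored as a DD-MM-YYYY string and two numeric columns have gaps, a loading step usually needs to parse dates and tolerate missing cells. The sketch below illustrates one way to do that with the standard library. The sample rows are hypothetical, and it assumes missing values arrive as empty strings, which may not match how the actual file encodes them.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample rows using the column names from the table above.
rows = [
    {"Item Purchased": "Tunic", "Purchase Amount (USD)": "120.0",
     "Date Purchase": "05-02-2023", "Review Rating": "4.5"},
    {"Item Purchased": "Jeans", "Purchase Amount (USD)": "",
     "Date Purchase": "17-11-2022", "Review Rating": ""},
    {"Item Purchased": "Scarf", "Purchase Amount (USD)": "35.5",
     "Date Purchase": "01-03-2023", "Review Rating": "3.0"},
]


def to_float(text):
    """Return a float, or None when the cell is empty (assumed missing marker)."""
    return float(text) if text.strip() else None


def parse_row(row):
    """Convert one raw row into typed values."""
    return {
        "item": row["Item Purchased"],
        "amount": to_float(row["Purchase Amount (USD)"]),
        # %d-%m-%Y matches the documented DD-MM-YYYY layout.
        "date": datetime.strptime(row["Date Purchase"], "%d-%m-%Y").date(),
        "rating": to_float(row["Review Rating"]),
    }


parsed = [parse_row(r) for r in rows]
known_amounts = [p["amount"] for p in parsed if p["amount"] is not None]
mean_amount = mean(known_amounts)  # average over non-missing amounts only
missing_amounts = sum(p["amount"] is None for p in parsed)
```

Keeping missing cells as None, rather than filling them immediately, leaves the choice of imputation strategy open for later analysis.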
Purchase Amount (USD): 650 missing values
Review Rating: 324 missing values
Payment Method includes multiple categories, allowing analysis of payment trends.
Date Purchase is in DD-MM-YYYY format, which can be useful for time-series analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Retail Transaction Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/michalfr/retail-transaction-data on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains transactions and the products they contain, which were obtained by scanning receipts from retail establishments by numerous users. Products were categorized by our proprietary NLP model.
Data was collected over a one-year period and contains product information from purchases made within that period, the product category inferred from the product name, information about the organization, the transaction to which each product belongs, and the user who uploaded the receipt.
The total user count is 22. The total retail organization count is 179. The total transaction count is 805. The total product count is 7477.
@kserno
Product categorization, User Behaviour Analysis, Product Analysis, Product Price Comparison between Various Retail Stores, Prediction of Next Transaction
--- Original source retains full ownership of the source dataset ---
This dataset contains a list of sales and movement data by item and department, appended monthly.
Update Frequency: Monthly
MIT License https://opensource.org/licenses/MIT
This dataset provides detailed insights into retail sales, featuring a range of factors that influence sales performance. It includes records on sales revenue, units sold, discount percentages, marketing spend, and the impact of seasonal trends and holidays.
This dataset is synthetic and generated for analysis purposes. It reflects typical retail sales patterns and is designed to support a wide range of data science and business analytics projects.
By UCI [source]
Comprehensive Dataset on Online Retail Sales and Customer Data
Welcome to this comprehensive dataset offering a wide array of information related to online retail sales. This data set provides an in-depth look at transactions, product details, and customer information documented by an online retail company based in the UK. The scope of the data spans vastly, from granular details about each product sold to extensive customer data sets from different countries.
This transnational data set is a treasure trove of vital business insights as it meticulously catalogues all the transactions that happened during its span. It houses rich transactional records curated by a renowned non-store online retail company based in the UK known for selling unique all-occasion gifts. A considerable portion of its clientele includes wholesalers; ergo, this dataset can prove instrumental for companies looking for patterns or studying purchasing trends among such businesses.
The available attributes within this dataset offer valuable pieces of information:
InvoiceNo: This attribute refers to invoice numbers that are six-digit integral numbers uniquely assigned to every transaction logged in this system. Transactions marked with 'c' at the beginning signify cancellations - adding yet another dimension for purchase pattern analysis.
StockCode: Stock Code corresponds with specific items as they're represented within the inventory system via 5-digit integral numbers; these allow easy identification and distinction between products.
Description: This refers to product names, giving users qualitative knowledge about what kind of items are being bought and sold frequently.
Quantity: These figures ascertain the volume of each product per transaction – important figures that can help understand buying trends better.
InvoiceDate: Invoice Dates detail when each transaction was generated down to precise timestamps – invaluable when conducting time-based trend analysis or segmentation studies.
UnitPrice: Unit prices represent how much each unit retails at — crucial for revenue calculations or cost-related analyses.
Country: This locational attribute shows where each customer hails from, adding geographical segmentation to your data investigation toolkit.
This dataset was originally collated by Dr Daqing Chen, Director of the Public Analytics group based at the School of Engineering, London South Bank University. His research studies and business cases with this dataset have been published in various papers contributing to establishing a solid theoretical basis for direct, data and digital marketing strategies.
Access to such records supports rich exploration and the formulation of hypotheses about consumer behavior patterns among wholesalers. Whether it's managing inventory, studying transactional trends over time, or spotting cancellation patterns, this dataset is apt for multiple forms of retail analysis.
1. Sales Analysis:
Sales data forms the backbone of this dataset, and it allows users to delve into various aspects of sales performance. You can use the Quantity and UnitPrice fields to calculate metrics like revenue, and further combine it with InvoiceNo information to understand sales over individual transactions.
2. Product Analysis:
Each product in this dataset comes with its unique identifier (StockCode) and its name (Description). You could analyse which products are most popular based on Quantity sold or look at popularity per transaction by considering both Quantity and InvoiceNo.
3. Customer Segmentation:
If you apply business logic to the transactions (such as calculating total amounts), you can use standard machine learning methods or RFM (Recency, Frequency, Monetary) segmentation, combined with 'CustomerID', to better understand customer behavior. Grouping invoice numbers (which stand for separate transactions) per client will also give insights about your clients.
4. Geographical Analysis:
The Country column enables analysts to study purchase patterns across different geographical locations.
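The sales, product, and geographical analyses above all start from the same step: multiplying Quantity by UnitPrice and grouping the result. A minimal sketch follows, assuming records shaped like the attributes described earlier. The sample values are hypothetical, and the cancellation check is done case-insensitively as a precaution.

```python
from collections import defaultdict

# Hypothetical sample records using the attribute names described above.
records = [
    {"InvoiceNo": "536365", "StockCode": "85123", "Quantity": 6,
     "UnitPrice": 2.55, "Country": "United Kingdom"},
    {"InvoiceNo": "536365", "StockCode": "71053", "Quantity": 6,
     "UnitPrice": 3.39, "Country": "United Kingdom"},
    {"InvoiceNo": "536370", "StockCode": "22728", "Quantity": 24,
     "UnitPrice": 3.75, "Country": "France"},
    {"InvoiceNo": "C536379", "StockCode": "22728", "Quantity": -1,
     "UnitPrice": 3.75, "Country": "France"},
]


def is_cancellation(invoice_no: str) -> bool:
    """Cancellations carry a leading 'c' (checked case-insensitively)."""
    return invoice_no.lower().startswith("c")


revenue_by_invoice = defaultdict(float)
revenue_by_country = defaultdict(float)
units_by_product = defaultdict(int)

for rec in records:
    if is_cancellation(rec["InvoiceNo"]):
        continue  # leave cancelled transactions out of the sales totals
    line_revenue = rec["Quantity"] * rec["UnitPrice"]
    revenue_by_invoice[rec["InvoiceNo"]] += line_revenue
    revenue_by_country[rec["Country"]] += line_revenue
    units_by_product[rec["StockCode"]] += rec["Quantity"]

# Most popular product by units sold.
top_product = max(units_by_product, key=units_by_product.get)
```

Whether cancellations should be dropped or netted against the original sale depends on the question being asked; this sketch simply excludes them.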
Practical applications
Understand what products sell best where - It can help drive tailored marketing strategies. Anomalies detection – Identify unusual behaviors that might lead frau...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Retail and Retailers Sales Time Series Collection’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/census/retail-and-retailers-sales-time-series-collection on 28 January 2022.
--- Dataset description provided by original source is as follows ---
More details about each file are in the individual file descriptions.
This is a dataset from the U.S. Census Bureau hosted by the Federal Reserve Economic Database (FRED). FRED has a data platform and updates its information according to the amount of data that is brought in. Explore the U.S. Census Bureau using Kaggle and all of the data sources available through the U.S. Census Bureau organization page!
This dataset is maintained using FRED's API and Kaggle's API.
Cover photo by Matteo Catanese on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
--- Original source retains full ownership of the source dataset ---
BestPlace is an innovative retail data and analytics tool created explicitly for medium and enterprise-level CPG/FMCG companies. It's designed to revolutionize your retail data analysis approach by adding a strategic location-based perspective to your existing database. This perspective enriches your data landscape and allows your business to better understand and cater to shopping behavior.
An In-Depth Approach to Retail Analytics
Unlike conventional analytics tools, BestPlace delves deep into each store location's details, providing a comprehensive analysis of your retail database. We leverage unique tools and methodologies to extract, analyze, and compile data. Our processes have been designed to provide a holistic view of your business, equipping you with the information you need to make data-driven decisions.
Amplifying Your Database with BestPlace
At BestPlace, we understand the importance of a robust and informative retail database design. We don't just add new stores to your database; we enrich each store with vital characteristics and factors. These enhancements come from open cartographic sources such as Google Maps and our proprietary GIS database, all carefully collected and curated by our experienced data analysts.
Store Features
We enrich your retail database with an array of store features, which include but are not limited to:
Number of reviews
Average ratings
Operational hours
Categories relevant to each point
Our attention to detail ensures your retail database becomes a powerful tool for understanding customer interactions and preferences.
Extensive Use Cases
BestPlace's capabilities stretch across various applications, offering value in areas such as:
Competition Analysis: Identify your competitors, analyze their performance, and understand your standing in the market with our extensive POI database and retail data analytics capabilities.
New Location Search: Use our rich retail store database to identify ideal locations for store expansions based on foot traffic data, proximity to key points, and potential customer demographics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Retail Transaction Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/regivm/retailtransactiondata on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The data provides customer- and date-level transactions for a few years. It can be used to demonstrate any analysis that requires transaction information, such as RFM. The data also provides information on customers' responses to a promotion campaign.
A highlight of this dataset is that you can evaluate the effectiveness of RFM groups by checking one of the business metrics: the response of customers.
The transaction data provides customer_id, transaction date, and amount of purchase. The response data provides the response of each customer as a binary variable indicating whether the customer responded to a campaign or not.
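As an illustration of how transaction rows and response flags combine, the sketch below computes recency, frequency, and monetary values per customer and then checks the response rate of one group. The sample data, the snapshot date, the 90-day cutoff, and the helper names `rfm_table` and `response_rate` are all assumptions made for the example.

```python
from datetime import date

# Hypothetical transactions: (customer_id, transaction date, amount).
transactions = [
    ("CS1", date(2014, 1, 10), 40.0),
    ("CS1", date(2014, 3, 5), 60.0),
    ("CS2", date(2013, 6, 1), 25.0),
]
# Hypothetical response flags: 1 = responded to the campaign, 0 = did not.
responses = {"CS1": 1, "CS2": 0}

snapshot = date(2014, 4, 1)  # assumed reference date for measuring recency


def rfm_table(transactions, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    table = {}
    for customer_id, tx_date, amount in transactions:
        recency, frequency, monetary = table.get(customer_id, (None, 0, 0.0))
        days = (snapshot - tx_date).days
        # Recency is the gap to the most recent purchase, so keep the minimum.
        recency = days if recency is None else min(recency, days)
        table[customer_id] = (recency, frequency + 1, monetary + amount)
    return table


def response_rate(customer_ids):
    """Share of the given customers who responded to the campaign."""
    ids = list(customer_ids)
    return sum(responses[c] for c in ids) / len(ids)


rfm = rfm_table(transactions, snapshot)
# Example group: customers who purchased within the last 90 days.
recent = [c for c, (recency, _, _) in rfm.items() if recency <= 90]
recent_rate = response_rate(recent)
```

Comparing `response_rate` across groups defined by different recency, frequency, or monetary cutoffs is one way to judge whether a segmentation separates responders from non-responders.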
Extremely thankful to the numerous kernel and data publishers of Kaggle and GitHub. Learnt a lot from these communities.
More innovative approaches for handling RFM Analysis.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This is a retail sales dataset containing information about store sales for various products over time.
The specific variables include:
Store: Unique identifier for the store location
Date: Calendar date of the sales data
Product: Name of the product being sold
Weekly Sales: Total number of units sold for the product in a week
Inventory Level: Number of units of the product currently in stock at the store
Temperature: Average temperature for the week at the store location
Past Promotion of Product (in lac): Total value (in lakhs) of any past promotions for the product during the week (1 lac = 100,000)
Demand Forecast: Predicted number of units to be sold for the product in the next week (provided for baseline model comparison)
This dataset can be used for various analytical purposes related to retail sales and inventory management, including:
Demand forecasting: By analyzing historical sales data, temperature, past promotions, and other relevant factors, you can build models to predict future demand for products. This information can be used to optimize inventory levels and prevent stock outs or overstocking.
Promotion analysis: You can compare sales data during promotional periods with non-promotional periods to assess the effectiveness of different promotions and identify products that respond well to promotions.
Product analysis: By analyzing sales data across different stores and time periods, you can identify which products are most popular and in which locations. This information can be used to inform product placement, marketing strategies, and assortment planning.
Store performance analysis: You can compare sales performance across different stores to identify top-performing stores and understand factors contributing to their success. This information can be used to identify areas for improvement in underperforming stores.
By utilizing this dataset for these analytical purposes, retail organizations can gain valuable insights into their sales patterns, customer behavior, and inventory management practices. This information can be used to make data-driven decisions that improve sales performance, profitability, and customer satisfaction.
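Since the Demand Forecast column is provided for baseline comparison, a first step in demand forecasting is to measure how well it predicts the following week's Weekly Sales and compare that against a naive forecast. The sketch below uses hypothetical rows for a single store and product; the dictionary keys and the choice of mean absolute error as the metric are assumptions.

```python
# Hypothetical weekly rows for one store and product, in date order.
weeks = [
    {"weekly_sales": 120, "demand_forecast": 110},
    {"weekly_sales": 135, "demand_forecast": 125},
    {"weekly_sales": 128, "demand_forecast": 140},
    {"weekly_sales": 150, "demand_forecast": 130},
]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between paired actual and predicted values."""
    pairs = list(zip(actual, predicted))
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


# Each row's forecast targets the NEXT week, so shift the actuals by one.
actual_next = [w["weekly_sales"] for w in weeks[1:]]
provided = [w["demand_forecast"] for w in weeks[:-1]]
# Naive baseline: next week's sales equal this week's sales.
naive = [w["weekly_sales"] for w in weeks[:-1]]

provided_mae = mean_absolute_error(actual_next, provided)
naive_mae = mean_absolute_error(actual_next, naive)
```

A custom model is worth keeping only if its error on held-out weeks is lower than both of these baselines.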
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘🏦 US Retail Sales Per Capita by Store Type’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/us-retail-sales-per-capita-by-store-type-2000-20e on 13 February 2022.
--- Dataset description provided by original source is as follows ---
I have added a column on the right that shows the compound annual growth rate (CGR) of per capita spending from 2000 to 2015.
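For reference, a compound annual growth rate is the constant yearly rate that would carry a starting value to an ending value over a given number of years. The sketch below shows the standard formula; the function name and the sample spending figures are hypothetical and are not taken from the dataset.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate linking start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical per capita spending figures for 2000 and 2015 (15 years apart).
rate = compound_annual_growth_rate(1000.0, 1500.0, 2015 - 2000)
```

Applying the returned rate for 15 consecutive years to the starting value reproduces the ending value, which is a quick way to sanity-check the column.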
source:
This dataset was created by Gary Hoover and contains around 0 samples along with Unnamed: 15, Unnamed: 9, technical information and other features such as: - Unnamed: 18 - Unnamed: 12 - and more.
- Analyze Unnamed: 4 in relation to Unnamed: 10
- Study the influence of Unnamed: 14 on Unnamed: 1
- More datasets
If you use this dataset in your research, please credit Gary Hoover
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘USA Monthly Retail Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/landlord/usa-monthly-retail-trade on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The dataset contains the monthly sales for retail trade and food services in the USA, adjusted and unadjusted for seasonal variations, for various categories. These categories represent the kinds of businesses operating in the USA and are based on the North American Industry Classification System (NAICS).
The Dataset was published on U.S. Census Bureau website (https://www.census.gov)
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Transactional Retail Dataset of Electronics Store’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/muhammadshahrayar/transactional-retail-dataset-of-electronics-store on 14 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains information about an online electronic store. The store has three warehouses from which goods are delivered to customers.
Use this dataset to perform graphical and/or non-graphical EDA to understand the data first, and then find and fix the data problems:
- Detect and fix errors in dirty_data.csv
- Impute the missing values in missing_data.csv
- Detect and remove anomalies
- Check whether a customer is happy with their last order
All the Best
--- Original source retains full ownership of the source dataset ---
You are provided with historical sales data from 2009 to 2012. The data contains three product categories: office supplies, technology, and furniture. Each category has several sub-categories. The company also runs promotions in the form of discounts.
Two CSV files are provided in the dataset. raw_data.csv is the unformatted file, with 5499 rows and 1 column, while clean_data.csv is a formatted file with 5499 rows and 10 columns.
Attribute Information:
- order_id : unique order number
- order_status : status of the order, whether it is finished or returned
- customer : customer name
- order_date : date of the order
- order_quantity : the quantity on a particular order
- sales : sales generated on a particular order, in IDR (Indonesian Rupiah)
- discount : a discount percentage
- discount_value : sales multiplied by discount, in IDR (Indonesian Rupiah)
- product_category : the category of the product
- product_sub_category : a subcategory of the product category
DQLab is an Online Data Science Learning Center to produce data practitioners who can make an impact. This dataset is part of a project in order to build analytical skills and apply knowledge to industry problems.
Project Data Analysis for Retail: Sales Performance Report: https://academy.dqlab.id/main/package/project/182?pf=0
https://crawlfeeds.com/privacy_policy
Explore the Meijer Grocery Store Dataset, a comprehensive collection of data on products available at Meijer, a leading American grocery store chain. This dataset includes detailed information on a wide variety of grocery items such as fresh produce, dairy, meat, beverages, household essentials, and more. Each product entry provides essential details, including product names, categories, prices, brands, descriptions, and availability, offering valuable insights for researchers, data analysts, and retail professionals.
Key Features:
Whether you're analyzing market trends in the grocery sector, researching consumer behavior, or developing new retail strategies, the Meijer Grocery Store Dataset is an invaluable resource that provides detailed insights and extensive coverage of products available at Meijer.
Success.ai’s Retail Data for Retail Professionals in APAC offers a comprehensive and accurate dataset tailored for businesses and organizations aiming to connect with key players in the retail industry across the Asia-Pacific region. Covering roles such as retail managers, merchandisers, supply chain specialists, and executives, this dataset provides verified LinkedIn profiles, work emails, and professional histories.
With access to over 700 million verified global profiles, Success.ai ensures your outreach, marketing, and collaboration strategies are powered by continuously updated, AI-validated data. Backed by our Best Price Guarantee, this solution empowers you to excel in the dynamic and competitive APAC retail market.
Why Choose Success.ai’s Retail Data?
Verified Contact Data for Precision Outreach
Comprehensive Coverage of APAC’s Retail Sector
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Comprehensive Retail Professional Profiles
Advanced Filters for Precision Campaigns
Regional and Industry-specific Insights
AI-Driven Enrichment
Strategic Use Cases:
Marketing Campaigns and Outreach
Partnership Development and Collaboration
Market Research and Competitive Analysis
Recruitment and Talent Acquisition
Why Choose Success.ai?
Best Price Guarantee
Seamless Integration
Data Accuracy with AI Validation
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Retail Sales Forecasting’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tevecsystems/retail-sales-forecasting on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains a lot of historical sales data. It was extracted from a top Brazilian retailer and has many SKUs and many stores. The data was transformed to protect the identity of the retailer.
[TBD]
This data would not be available without the full collaboration of our customers, who understand that sharing their core and strategic information has more advantages than possible hazards. They also support our continuous development of innovative ML systems across their value chain.
Every retail business in the world faces a fundamental question: how much inventory should I carry? On the one hand, too much inventory means working capital costs, operational costs, and a complex operation. On the other hand, a lack of inventory leads to lost sales, unhappy customers, and a damaged brand.
Current inventory management models have many solutions for placing the correct order, but they are all based on a single unknown factor: the demand for the next periods.
This is why short-term forecasting is so important in the retail and consumer goods industry.
We encourage you to seek the best demand forecasting model for the next 2-3 weeks. This valuable insight can help many supply chain practitioners to correctly manage their inventory levels.
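A simple reference point for any short-term forecasting model is a recursive moving average over recent weeks. The sketch below is one such baseline, not a recommended final model; the function name, window size, and sample sales figures are assumptions.

```python
def moving_average_forecast(history, window=4, horizon=3):
    """Forecast `horizon` future periods, each as the mean of the last `window` values."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(values[-window:]) / window
        forecasts.append(next_value)
        values.append(next_value)  # feed the forecast back in for later steps
    return forecasts


weekly_units = [100, 120, 110, 130]  # hypothetical weekly sales for one SKU
forecast = moving_average_forecast(weekly_units, window=4, horizon=3)
```

Because later steps reuse earlier forecasts, errors compound over the horizon, which is one reason a 2-3 week target is more tractable than a longer one.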
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Walmart Dataset (Retail)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rutuspatel/walmart-dataset-retail on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Dataset Description :
This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
Store - the store number
Date - the week of sales
Weekly_Sales - sales for the given store
Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
Temperature - Temperature on the day of sale
Fuel_Price - Cost of fuel in the region
CPI – Prevailing consumer price index
Unemployment - Prevailing unemployment rate
Holiday Events
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Analysis Tasks
Basic Statistics tasks
1) Which store has maximum sales
2) Which store has maximum standard deviation i.e., the sales vary a lot. Also, find out the coefficient of mean to standard deviation
3) Which store/s has good quarterly growth rate in Q3’2012
4) Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together
5) Provide a monthly and semester view of sales in units and give insights
Statistical Model
For Store 1 – Build prediction models to forecast demand
Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.
Change dates into days by creating new variable.
Select the model which gives best accuracy.
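Two of the steps above, converting dates into a day index and relating sales variability to the mean, can be sketched with the standard library. This is a rough illustration under several assumptions: the sample rows are hypothetical, the day-month-year date layout is assumed rather than confirmed, and the ratio computed here is the coefficient of variation (standard deviation divided by mean), which is one reading of the task wording.

```python
from datetime import datetime
from statistics import mean, stdev

# Hypothetical weekly rows for one store: (date text, weekly sales).
rows = [
    ("05-02-2010", 1500.0),
    ("12-02-2010", 1600.0),
    ("19-02-2010", 1550.0),
    ("26-02-2010", 1450.0),
]

START = datetime(2010, 2, 5)  # the earliest date maps to day index 1


def day_index(date_text):
    """Number of days since the earliest date, starting at 1."""
    return (datetime.strptime(date_text, "%d-%m-%Y") - START).days + 1


days = [day_index(d) for d, _ in rows]
sales = [s for _, s in rows]

# Coefficient of variation: sample standard deviation relative to the mean.
cv = stdev(sales) / mean(sales)

# Ordinary least squares for sales = intercept + slope * day.
x_bar, y_bar = mean(days), mean(sales)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, sales))
s_xx = sum((x - x_bar) ** 2 for x in days)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
```

Extending the regression to CPI, unemployment, and fuel price would need a multiple regression, which is easier with a numerical library than with hand-written sums.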
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Store Transaction data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamprateek/store-transaction-data on 14 February 2022.
--- Dataset description provided by original source is as follows ---
Nielsen receives transaction level scanning data (POS Data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets, hypermarkets as well as smaller traditional trade grocery stores (Kirana stores), medical stores etc. using a POS machine.
While in a bigger format store, all items for all transactions are scanned using a POS machine, smaller and more localized shops do not have a 100% compliance rate in terms of scanning and inputting information into the POS machine for all transactions.
A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded to spare customer the inconvenience or during rush hours when the store is crowded with customers.
Thus, the data received from such stores is often incomplete and lacks complete information of all transactions completed within a day.
Additionally, apart from incomplete transaction data in a day, it is observed that certain stores do not share data for all active days. Stores share data ranging from 2 to 28 days in a month. While it is possible to impute/extrapolate data for 2 days of a month using 28 days of actual historical data, the vice versa is not recommended.
Nielsen encourages you to create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.
You are provided with the dataset that contains store level data by brands and categories for select stores:
Hackathon_ Ideal_Data - The file contains brand level data for 10 stores for the last 3 months. This can be referred to as the ideal data.
Hackathon_Working_Data - This contains data for selected stores which are missing and/or incomplete.
Hackathon_Mapping_File - This file is provided to help understand the column names in the data set.
Hackathon_Validation_Data - This file contains the data stores and product groups for which you have to predict the Total_VALUE.
Sample Submission - This file represents what needs to be uploaded as output by candidate in the same format. The sample data is provided in the file to help understand the columns and values required.
Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.
Know more: https://www.nielsen.com/us/en/
Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determine which factors/variables/features can help best predict the store sales.
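As a starting point before building a feature-based model, a store's monthly total can be extrapolated by scaling the mean of its observed days up to the full month. The sketch below shows that naive baseline only; the function name and the sample values are hypothetical, and it ignores weekday effects and the under-scanning issue described above.

```python
import calendar
from statistics import mean


def extrapolate_month_total(daily_values, year, month):
    """Scale the mean of the observed days up to the full calendar month."""
    if not daily_values:
        raise ValueError("no observed days to extrapolate from")
    days_in_month = calendar.monthrange(year, month)[1]
    return mean(daily_values.values()) * days_in_month


# Hypothetical store that shared sales value for only 4 days of June 2020,
# keyed by day of month.
observed = {1: 200.0, 2: 180.0, 15: 220.0, 16: 200.0}
estimate = extrapolate_month_total(observed, 2020, 6)
```

The fewer days a store reports, the less reliable this estimate becomes, which matches the caution above about extrapolating a month from only 2 days of data.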
--- Original source retains full ownership of the source dataset ---
The Customer Shopping Preferences Dataset offers insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. Note that this is a synthetic dataset created for beginners to learn more about data analysis and machine learning.
The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Data on the type of items purchased, preferred shopping seasons, and interactions with promotional offers is also included. With 3,900 records, this dataset serves as a foundation for practicing data-driven decision-making and customer-centric analysis.
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon
https://creativecommons.org/publicdomain/zero/1.0/
Overview:
This dataset contains 1000 rows of synthetic online retail sales data, mimicking transactions from an e-commerce platform. It includes information about customer demographics, product details, purchase history, and (optional) reviews. This dataset is suitable for a variety of data analysis, data visualization, and machine learning tasks, including but not limited to: customer segmentation, product recommendation, sales forecasting, market basket analysis, and exploring general e-commerce trends. The data was generated using the Python Faker library, so the values are plausible but raise no privacy concerns, since the dataset contains no real customer information.
Data Source:
This dataset is entirely synthetic. It was generated using the Python Faker library and does not represent any real individuals or transactions.
Data Content:
Column Name | Data Type | Description |
---|---|---|
customer_id | Integer | Unique customer identifier (ranging from 10000 to 99999) |
order_date | Date | Order date (a random date within the last year) |
product_id | Integer | Product identifier (ranging from 100 to 999) |
category_id | Integer | Product category identifier (10, 20, 30, 40, or 50) |
category_name | String | Product category name (Electronics, Fashion, Home & Living, Books & Stationery, Sports & Outdoors) |
product_name | String | Product name (randomly selected from a list of products within the corresponding category) |
quantity | Integer | Quantity of the product ordered (ranging from 1 to 5) |
price | Float | Unit price of the product (ranging from 10.00 to 500.00, with two decimal places) |
payment_method | String | Payment method used (Credit Card, Bank Transfer, Cash on Delivery) |
city | String | Customer's city (generated using Faker's city() method, so the locations depend on the Faker locale used during generation) |
review_score | Integer | Customer's product rating (ranging from 1 to 5, or None with a 20% probability) |
gender | String | Customer's gender (M/F, or None with a 10% probability) |
age | Integer | Customer's age (ranging from 18 to 75) |
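The ranges in the table above can be turned into a quick sanity check when loading the data. The following is a minimal sketch and not part of the original dataset: the helper name `validate_row` is hypothetical, and it assumes each row has already been parsed into a dict with Python-typed values, with missing values stored as `None`. The `order_date` check only confirms the type, because the "last year" window depends on when the data was generated.

```python
from datetime import date

# Allowed values copied from the column table above.
CATEGORY_IDS = {10, 20, 30, 40, 50}
PAYMENT_METHODS = {"Credit Card", "Bank Transfer", "Cash on Delivery"}


def validate_row(row):
    """Return a list of problems found in one parsed row (empty if none).

    Hypothetical helper: assumes `row` is a dict keyed by column name whose
    values are already converted to int/float/date, with None for missing.
    """
    problems = []

    def check(name, ok):
        if not ok:
            problems.append(f"{name}: unexpected value {row[name]!r}")

    check("customer_id", 10000 <= row["customer_id"] <= 99999)
    check("order_date", isinstance(row["order_date"], date))
    check("product_id", 100 <= row["product_id"] <= 999)
    check("category_id", row["category_id"] in CATEGORY_IDS)
    check("quantity", 1 <= row["quantity"] <= 5)
    check("price", 10.00 <= row["price"] <= 500.00)
    check("payment_method", row["payment_method"] in PAYMENT_METHODS)
    # review_score and gender are documented as optional (None allowed).
    check("review_score",
          row["review_score"] is None or 1 <= row["review_score"] <= 5)
    check("gender", row["gender"] in {"M", "F", None})
    check("age", 18 <= row["age"] <= 75)
    return problems
```

Calling `validate_row` on every row and collecting the non-empty results gives a short report of any values that fall outside the documented ranges.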
Potential Use Cases (Inspiration):
Customer Segmentation: Group customers based on demographics, purchasing behavior, and preferences.
Product Recommendation: Build a recommendation system to suggest products to customers based on their past purchases (the dataset does not include browsing history).
Sales Forecasting: Predict future sales based on historical trends.
Market Basket Analysis: Identify products that are frequently purchased together.
Price Optimization: Analyze the relationship between price and demand.
Geographic Analysis: Explore sales patterns across different cities.
Time Series Analysis: Investigate sales trends over time.
Educational Purposes: Great for practicing data cleaning, EDA, feature engineering, and modeling.
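As a starting point for the use cases above, here is a minimal sketch of two simple aggregations: revenue per category (computed as `quantity * price`, since `price` is the unit price) and the mean `review_score` with missing ratings skipped. The function names and the sample rows are made up for illustration; rows are assumed to be dicts keyed by the column names in the table, with `None` for a missing review.

```python
from collections import defaultdict


def revenue_by_category(rows):
    """Sum quantity * unit price for each category_name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category_name"]] += row["quantity"] * row["price"]
    # Round to cents to avoid showing floating-point noise.
    return {name: round(total, 2) for name, total in totals.items()}


def mean_review_score(rows):
    """Average review_score over rows that have one; None if none do."""
    scores = [r["review_score"] for r in rows if r["review_score"] is not None]
    return sum(scores) / len(scores) if scores else None


# Illustrative rows only, not taken from the dataset.
sample_rows = [
    {"category_name": "Electronics", "quantity": 2, "price": 100.00, "review_score": 5},
    {"category_name": "Electronics", "quantity": 1, "price": 50.50, "review_score": None},
    {"category_name": "Fashion", "quantity": 3, "price": 20.00, "review_score": 4},
]

print(revenue_by_category(sample_rows))
print(mean_review_score(sample_rows))
```

Skipping `None` rather than treating it as zero matters here: roughly one in five rows is documented as having no review, so counting them as zero would pull the average down.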