22 datasets found

Retail Transactions Dataset
kaggle.com
Updated May 18, 2024
Cite
Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
May 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasad Patil
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.

Date: The date and time when the transaction occurred. It records the timestamp of each purchase.

Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.

Product: A list of products purchased in the transaction. It includes the names of the products bought.

Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.

Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.

Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.

City: The city where the purchase took place. It indicates the location of the transaction.

Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.

Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.

Customer_Category: A category representing the customer's background or age group.

Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.

Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

Use Cases:

Market Basket Analysis: Discover associations between products and uncover buying patterns.

Customer Segmentation: Group customers based on purchasing behavior.

Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.

Retail Analytics: Analyze store performance and customer trends.

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
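As a rough illustration of the market basket use case, the sketch below counts how often pairs of products appear in the same transaction. It assumes the Product column stores each basket as a stringified Python list (for example "['Milk', 'Bread']"), which is a guess about the file's encoding, and it runs on made-up rows rather than the real data.

```python
import ast
from collections import Counter
from itertools import combinations

# Hypothetical rows; the Product strings mimic an assumed list-as-text encoding.
rows = [
    {"Transaction_ID": "1000000001", "Product": "['Milk', 'Bread', 'Eggs']"},
    {"Transaction_ID": "1000000002", "Product": "['Bread', 'Milk']"},
    {"Transaction_ID": "1000000003", "Product": "['Eggs', 'Soap']"},
]

def count_pairs(rows):
    """Count unordered product pairs that co-occur in a transaction."""
    pairs = Counter()
    for row in rows:
        # Parse the text into a list, drop duplicates, and sort so that
        # ('Bread', 'Milk') and ('Milk', 'Bread') map to the same key.
        products = sorted(set(ast.literal_eval(row["Product"])))
        pairs.update(combinations(products, 2))
    return pairs

pair_counts = count_pairs(rows)
print(pair_counts.most_common(3))
```

Pair counts like these are only a starting point; association-rule measures such as support and confidence can be derived from them.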
Retail Sales Index (RSI) - Datasets - Government of the Republic of Trinidad...
data.gov.tt
Updated Sep 28, 2023
Cite
data.gov.tt (2023). Retail Sales Index (RSI) - Datasets - Government of the Republic of Trinidad and Tobago Open Data Platform [Dataset]. https://data.gov.tt/dataset/retail-sales-index
Explore at:
Dataset updated
Sep 28, 2023
Dataset provided by
Data.gov (https://data.gov/)
Area covered
Trinidad and Tobago
Description
The Retail Sales Index (RSI) is like a health check-up for the shopping world, done every three (3) months. Imagine visiting many different stores, from big to small, and noting how much they are selling. That is what the RSI does. It adds up the sales from these stores to get a feel for how well retail businesses are doing. This index helps us understand if people spend more or less at shops, which is a big deal for the economy. Think of it as a way to gauge our shopping habits. Plus, by comparing it with the Retail Price Index (RPI), which tracks price changes, we can see not only how much we are spending but also how much stuff we are actually buying once price changes are taken into account.
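One common way to make the comparison between a sales index and a price index is to deflate the former by the latter. The sketch below uses that textbook formula with invented index values on a base of 100; it is an assumption for illustration and not necessarily the method used by the data publisher.

```python
def real_index(nominal_index, price_index, base=100.0):
    """Deflate a nominal (value) index by a price index on the same base."""
    return nominal_index / price_index * base

# Invented values: sales value up 10 percent, prices up 4 percent.
volume = real_index(110.0, 104.0)
print(round(volume, 2))
```

A result above 100 suggests that the quantity of goods bought rose, not just the amount spent.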
US Retail Sales
tradingeconomics.com
zh.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Jun 17, 2025
Cite
TRADING ECONOMICS (2025). US Retail Sales [Dataset]. https://tradingeconomics.com/united-states/retail-sales
Explore at:
Available download formats: csv, xml, excel, json
Dataset updated
Jun 17, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 29, 1992 - May 31, 2025
Area covered
United States
Description
Retail Sales in the United States decreased 0.90 percent in May of 2025 over the previous month. This dataset provides - U.S. December Retail Sales Increased More Than Forecast - actual values, historical data, forecast, chart, statistics, economic calendar and news.
sales data
kaggle.com
Updated Aug 2, 2023
Cite
Ronny Kimathi kaimenyi (2023). sales data [Dataset]. https://www.kaggle.com/datasets/ronnykym/online-store-sales-data
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 2, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ronny Kimathi kaimenyi
License
https://ec.europa.eu/info/legal-notice_en
Description
Deluxe is an online retailer based in the UK that deals in a wide range of products in the following categories:

1. Clothing
2. Games
3. Appliances
4. Electronics
5. Books
6. Beauty products
7. Smartphones
8. Outdoors products
9. Accessories
10. Other

Basic household products are classified as 'Other' in the category column since they have small value to the business.

Data Description:

dates: sale date
order_value_EUR: sale price in EUR
cost: cost of goods sold in EUR
category: item category
country: customers' country at the time of purchase
customer_name: name of customer
device_type: the gadget used by the customer to access our online store (PC, mobile, tablet)
sales_manager: name of the sales manager for each sale
sales_representative: name of the sales rep for each sale
order_id: unique identifier of an order

The data was recorded between 1/2/2019 and 12/30/2020 with the aim of generating business insights to guide business direction. We would like to see what interesting insights the Kaggle community members can produce from this data.
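As one example of the kind of insight the columns allow, the sketch below computes gross margin per category from order_value_EUR and cost. It is a minimal illustration using only the standard library and a few made-up rows in the documented column layout; the real file name and delimiter are not stated here, so the in-memory sample stands in for the actual CSV.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the documented column layout (not real Deluxe data).
sample = io.StringIO(
    "dates,order_value_EUR,cost,category\n"
    "1/2/2019,100.0,60.0,Books\n"
    "1/3/2019,50.0,40.0,Books\n"
    "1/3/2019,200.0,150.0,Games\n"
)

def margin_by_category(handle):
    """Return {category: gross margin as a fraction of revenue}."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for row in csv.DictReader(handle):
        revenue[row["category"]] += float(row["order_value_EUR"])
        cost[row["category"]] += float(row["cost"])
    # Assumes every category has non-zero revenue.
    return {c: (revenue[c] - cost[c]) / revenue[c] for c in revenue}

margins = margin_by_category(sample)
print(margins)
```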
Retail Price Index (RPI) - Datasets - Government of the Republic of Trinidad...
data.gov.tt
Updated Nov 21, 2023
Cite
(2023). Retail Price Index (RPI) - Datasets - Government of the Republic of Trinidad and Tobago Open Data Platform [Dataset]. https://data.gov.tt/dataset/retail-price-index
Explore at:
Dataset updated
Nov 21, 2023
Description
The Retail Price Index (RPI) is a tool that helps us understand how the prices of everyday items change over time in Trinidad and Tobago. Imagine you have a shopping basket filled with various items people commonly buy, like food, gas, and other services. The RPI keeps track of how the prices of these items in the basket change each month. To do this, experts regularly check the prices of these items in fifteen (15) different areas across Trinidad and Tobago. They visit local stores, markets, and gas stations to note the current prices of food and gas, which tend to change often. For items whose prices do not change as quickly, they check the prices every three (3) months. This way, the RPI gives a clear picture of how much more or less it costs to buy the same set of items over time.
Person Counter Dataset
universe.roboflow.com
zip
Updated Jun 15, 2023
Cite
Tkbees (2023). Person Counter Dataset [Dataset]. https://universe.roboflow.com/tkbees-ogrtd/person-counter-tq0wf
Explore at:
Available download formats: zip
Dataset updated
Jun 15, 2023
Dataset authored and provided by
Tkbees
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Person Bounding Boxes
Description
Here are a few use cases for this project:

Retail Analytics: Store owners can use the model to track the number of customers visiting their stores during different times of the day or seasons, which can help in workforce and resource allocation.

Crowd Management: Event organizers or public authorities can utilize the model to monitor crowd sizes at concerts, festivals, public gatherings or protests, aiding in security and emergency planning.

Smart Transportation: The model can be integrated into public transit systems to count the number of passengers in buses or trains, providing real-time occupancy information and assisting in transportation planning.

Health and Safety Compliance: During times of pandemics or emergencies, the model can be used to count the number of people in a location, ensuring compliance with restrictions on gathering sizes.

Building Security: The model can be adopted in security systems to track how many people enter and leave a building or a particular area, providing useful data for access control.
Data from: Retail Sales Index
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Jun 20, 2025
+ more versions
Cite
Office for National Statistics (2025). Retail Sales Index [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/retailindustry/datasets/retailsalesindexreferencetables
Explore at:
Available download formats: xlsx
Dataset updated
Jun 20, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A series of retail sales data for Great Britain in value and volume terms, seasonally and non-seasonally adjusted.
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
zenodo.org
explore.openaire.eu
zip
Updated Oct 20, 2022
+ more versions
Cite
Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6832242
Dataset updated
Oct 20, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB, replacing the bracketed placeholder with the path to the corresponding BSON file. Ensure you have the MongoDB Database Tools installed.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit <path_to_fitbit.bson>

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema <path_to_sema.bson>

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys <path_to_surveys.bson>

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

{ _id:
United Kingdom Retail Sales MoM
tradingeconomics.com
de.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Jun 20, 2025
Cite
TRADING ECONOMICS (2025). United Kingdom Retail Sales MoM [Dataset]. https://tradingeconomics.com/united-kingdom/retail-sales
Explore at:
Available download formats: json, xml, excel, csv
Dataset updated
Jun 20, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 29, 1996 - May 31, 2025
Area covered
United Kingdom
Description
Retail Sales in the United Kingdom decreased 2.70 percent in May of 2025 over the previous month. This dataset provides the latest reported value for - United Kingdom Retail Sales MoM - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
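For readers unfamiliar with the "MoM" figure, a month-over-month change is the percent difference between one month's level and the previous month's. The sketch below shows that arithmetic with hypothetical index levels chosen to reproduce a change of about -2.7 percent; the actual underlying levels are not given in this listing.

```python
def mom_change(previous, current):
    """Percent change of `current` relative to `previous`."""
    return (current - previous) / previous * 100.0

# Hypothetical index levels, picked only to illustrate the calculation.
april, may = 100.0, 97.3
print(round(mom_change(april, may), 2))
```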
Walmart Retail Data
kaggle.com
Updated May 6, 2024
Cite
Saad Abdur Razzaq (2024). Walmart Retail Data [Dataset]. https://www.kaggle.com/datasets/saadabdurrazzaq/walmart-retail-data/data
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
May 6, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Saad Abdur Razzaq
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset comprises transactional information from the previous 5 years from Walmart retail stores, with diverse details such as customer demographics, order specifics, product attributes, and sales logistics. It includes data on the city where purchases were made, customer age, names, and segments, along with any applied discounts and the quantity of products ordered. Each transaction is uniquely identified by an order ID, accompanied by order date, priority, and shipping details like mode, cost, and dates. Product-related information encompasses base margins, categories, containers, names, and sub-categories, enabling insights into profitability, sales, and regional performance. The dataset also provides granular details such as profit margins, unit prices, and ZIP codes, facilitating analysis at multiple levels like customer behavior, product performance, and operational efficiencies within Walmart's retail ecosystem.

The columns in the dataset are:

City: The city where the purchase was made.

Customer Age: Age of the customer making the purchase.

Customer Name: Name of the customer.

Customer Segment: Segment to which the customer belongs (like retail, wholesale, etc.).

Discount: Any discount applied to the purchase.

Number of Records: The count of records for each transaction.

Order Date: Date when the order was placed.

Order ID: Unique identifier for each order.

Order Priority: Priority level of the order (like high, medium, low).

Order Quantity: Quantity of products ordered.

Product Base Margin: Base margin percentage for the product.

Product Category: Category to which the product belongs (like electronics, groceries, etc.).

Product Container: Container type of the product.

Product Name: Name of the product.

Product Sub-Category: Sub-category to which the product belongs.

Profit: Profit earned from the transaction.

Region: Region where the purchase was made.

Row ID: Unique identifier for each row.

Sales: Total sales amount.

Ship Date: Date when the order was shipped.

Ship Mode: Mode of shipping (like standard, express, etc.).

Shipping Cost: Cost associated with shipping.

State: State where the purchase was made.

Unit Price: Price per unit of the product.

Zip Code: ZIP code of the customer or store location.
Terrace Fusion Dataset
universe.roboflow.com
zip
Updated Jun 3, 2022
Cite
datasets connection (2022). Terrace Fusion Dataset [Dataset]. https://universe.roboflow.com/datasets-connection/terrace-fusion/dataset/1
Explore at:
Available download formats: zip
Dataset updated
Jun 3, 2022
Dataset authored and provided by
datasets connection
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Social Distancing Bounding Boxes
Description
Here are a few use cases for this project:

"Public Safety Compliance": Use this model to monitor public spaces like parks, beaches, or shopping areas to ensure compliance with social distancing protocols. The nature of the images in the dataset could help identify instances where people are or aren't practicing safe distances and provide data on public adherence to guidelines.

"Event Management": Event organizers can integrate this model into their security system to enforce social distancing norms during festivals, concerts, games, or any other mass gathering. This will enable efficient crowd control without requiring extensive human effort.

"Retail Analytics": Retail stores could use this model to monitor customers' adherence to social distancing norms inside their stores. Understanding customer behavior with respect to these norms may provide insights for strategic decisions and operational efficiency.

"Urban Planning and Research": Researchers or urban planners can utilize this model to study the effectiveness of current social distancing policies and norms in different environments. This could help guide future policies or planning of city spaces.

"Education Sector": Schools, colleges, and universities can input live feeds or recorded footage to monitor student behavior regarding social-distancing norms. Providing real-time feedback or periodic reports might help educational institutions in ensuring an appropriate level of safety on their campuses.
Data from: Retail Theft
data.cityofchicago.org
Updated Jul 12, 2025
Cite
Chicago Police Department (2025). Retail Theft [Dataset]. https://data.cityofchicago.org/Public-Safety/Retail-Theft/skfh-cpun
Explore at:
Available download formats: tsv, csv, application/rssxml, xml, application/rdfxml, application/geo+json, kml, kmz
Dataset updated
Jul 12, 2025
Authors
Chicago Police Department
Description
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. 
The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
Monthly average retail prices for selected products
www150.statcan.gc.ca
datasets.ai
+2 more
Updated Jul 2, 2025
+ more versions
Cite
Government of Canada, Statistics Canada (2025). Monthly average retail prices for selected products [Dataset]. http://doi.org/10.25318/1810024501-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1810024501-eng
Dataset updated
Jul 2, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Monthly average retail prices for selected products, for Canada and provinces. Prices are presented for the current month and the previous four months. Prices are based on transaction data from Canadian retailers, and are presented in Canadian current dollars.
sd.SD.HLTH FOODASSIST P
opendata-geospatialdenver.hub.arcgis.com
Updated Jun 17, 2025
Cite
geospatialDENVER: Putting Denver on the map. (2025). sd.SD.HLTH FOODASSIST P [Dataset]. https://opendata-geospatialdenver.hub.arcgis.com/items/26c4b54a14cb434493ed6257d6cde718
Explore at:
Dataset updated
Jun 17, 2025
Dataset authored and provided by
geospatialDENVER: Putting Denver on the map.
Area covered

Description
This feature layer is maintained by Denver Department of Public Health and Environment and is a combination of data available through Hunger Free Colorado and ground truthing by the DDPHE CBH Food Systems Team. The data stewards are Paola Babb and Jessika Brenin. It displays food retail locations in Denver. The dataset displays the locations of food pantries classified as traditional or non-traditional, food banks, and food distributors.

The data set comes from Hunger Free Colorado, a non-profit that supports connecting people to food resources. They review food access points across the state on a yearly basis. The data set was filtered by Denver County, specifically looking at what FIC classified as a Traditional Food Pantry, Non-traditional Food Pantries, Food Bank and Food Distributor.

Traditional Food Pantry: Brick and mortar location that distributes food on a regular basis (daily, weekly, monthly). Typically distributed via in-store shopping and/or food boxes.

Non-traditional Food Pantries: Food assistance program that occurs without a brick and mortar structure and/or on an ad hoc basis. Often targets a specific neighborhood or population and may not be open to the public.

Food Bank: A place from which food pantries purchase food.

Food Distributor: A place that supports pantries in the delivery of food, procurement and access to local fresh produce.
Clickstream Data for Online Shopping
kaggle.com
Updated Apr 13, 2021
Cite
Bojan Tunguz (2021). Clickstream Data for Online Shopping [Dataset]. https://www.kaggle.com/datasets/tunguz/clickstream-data-for-online-shopping/discussion
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Apr 13, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bojan Tunguz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Source:

Mariusz Łapczyński, Cracow University of Economics, Poland, lapczynm '@' uek.krakow.pl
Sylwester Białowąs, Poznan University of Economics and Business, Poland, sylwester.bialowas '@' ue.poznan.pl

Data Set Information:

The dataset contains information on clickstream from online store offering clothing for pregnant women. Data are from five months of 2008 and include, among others, product category, location of the photo on the page, country of origin of the IP address and product price in US dollars.

Attribute Information:

The dataset contains 14 variables described in a separate file (See 'Data set description')

Relevant Papers:

N/A

Citation Request:

If you use this dataset, please cite:

Łapczyński M., Białowąs S. (2013) Discovering Patterns of Users' Behaviour in an E-shop - Comparison of Consumer Buying Behaviours in Poland and Other European Countries, "Studia Ekonomiczne", nr 151, "La société de l'information : perspective européenne et globale : les usages et les risques d'Internet pour les citoyens et les consommateurs", p. 144-153

Data description "e-shop clothing 2008"

Variables:

1. YEAR (2008)

========================================================

2. MONTH -> from April (4) to August (8)

========================================================

3. DAY -> day number of the month

========================================================

4. ORDER -> sequence of clicks during one session

========================================================

5. COUNTRY -> variable indicating the country of origin of the IP address with the

following categories:

1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (*.org)

========================================================

6. SESSION ID -> variable indicating session id (short record)

========================================================

7. PAGE 1 (MAIN CATEGORY) -> concerns the main product category:

1-trousers 2-skirts 3-blouses 4-sale

========================================================

8. PAGE 2 (CLOTHING MODEL) -> contains information about the code for each product

(217 products)

========================================================

9. COLOUR -> colour of product

1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white

========================================================

10. LOCATION -> photo location on the page, the screen has been divided into six parts:

1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right

========================================================

11. MODEL PHOTOGRAPHY -> variable with two categories:

1-en face 2-profile

========================================================

12. PRICE -> price in US dollars

========================================================

13. PRICE 2 -> variable informing whether the price of a particular product is higher than

the average price for the entire product category

1-yes 2-no

========================================================

14. PAGE -> page number within the e-store website (from 1 to 5)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
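Because most variables are stored as integer codes, a first step in working with the file is usually to map them back to labels. The sketch below does this for three of the variables using the code tables listed above; the record keys (page1, location, price2) are hypothetical names chosen for the example and may not match the actual column headers.

```python
# Code tables copied from the variable list above.
MAIN_CATEGORY = {1: "trousers", 2: "skirts", 3: "blouses", 4: "sale"}
LOCATION = {
    1: "top left", 2: "top in the middle", 3: "top right",
    4: "bottom left", 5: "bottom in the middle", 6: "bottom right",
}
PRICE_ABOVE_AVERAGE = {1: True, 2: False}  # PRICE 2: 1-yes, 2-no

def decode_click(record):
    """Translate coded fields of one click record into readable values.

    The keys 'page1', 'location' and 'price2' are assumed names.
    """
    return {
        "main_category": MAIN_CATEGORY[record["page1"]],
        "location": LOCATION[record["location"]],
        "price_above_category_average": PRICE_ABOVE_AVERAGE[record["price2"]],
    }

decoded = decode_click({"page1": 2, "location": 5, "price2": 1})
print(decoded)
```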
Online Retail E-Commerce Data
kaggle.com
Updated Mar 12, 2025
Shravan Kanamadi (2025). Online Retail E-Commerce Data [Dataset]. https://www.kaggle.com/datasets/shravankanamadi/online-retail-e-commerce-data/data
Dataset updated
Mar 12, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shravan Kanamadi
Description
Online Retail E-Commerce Data Hey everyone! 👋

This dataset contains real e-commerce transaction data from 2009 to 2011. It comes from a UK-based online store that sells a variety of products. The data includes details like invoices, product codes, descriptions, prices, and even customer IDs.

What’s Inside? Each row represents a transaction, and the dataset has the following key columns:
🛒 Invoice – Unique order ID
📦 StockCode – Product code
📝 Description – Name of the product
📊 Quantity – Number of units sold
⏳ InvoiceDate – When the purchase happened
💰 Price – Unit price of the product
👤 Customer ID – Unique identifier for each customer
🌍 Country – Where the customer is from

Why is this dataset useful? This dataset is great for exploring:
Customer Segmentation (Find high-value customers)
Customer Lifetime Value (LTV) Analysis
Sales & Revenue Trends
Market Basket Analysis (Which products are bought together?)
Predicting Churn & Retention Strategies

How Can You Use It? If you're into data science, machine learning, or business analytics, this dataset is perfect for hands-on projects. You can analyze customer behavior, predict sales, or even build recommendation systems.

Hope this dataset helps with your projects! Let me know if you find something interesting.
Data from: Online retail
kaggle.com
Updated Mar 5, 2020
Hicham IKNE (2020). Online retail [Dataset]. https://www.kaggle.com/hikne707/online-retail/tasks
Dataset updated
Mar 5, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hicham IKNE
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Data Set Information:

This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion gift-ware. Many customers of the company are wholesalers.

Attribute Information:

InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantities of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal. The name of the country where a customer resides.
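
As an illustration of the cancellation rule in the attribute list, the sketch below flags cancelled invoices and sums revenue over the remaining rows. It assumes each row is a dictionary keyed by the attribute names above, and it checks the leading letter case-insensitively, which is an assumption beyond what the description states. The sample rows are made up.

```python
def is_cancellation(invoice_no):
    """True when the invoice code starts with 'c' (checked case-insensitively)."""
    return str(invoice_no).strip().lower().startswith("c")


def net_revenue(rows):
    """Sum Quantity * UnitPrice over rows that are not cancellations."""
    return sum(
        row["Quantity"] * row["UnitPrice"]
        for row in rows
        if not is_cancellation(row["InvoiceNo"])
    )


# Made-up rows for illustration only; they are not taken from the data set.
rows = [
    {"InvoiceNo": "489434", "Quantity": 12, "UnitPrice": 2.5},
    {"InvoiceNo": "C489449", "Quantity": -1, "UnitPrice": 4.0},
    {"InvoiceNo": "489435", "Quantity": 2, "UnitPrice": 10.0},
]
print(net_revenue(rows))
```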
Online Retail List for RFM
kaggle.com
Updated Sep 23, 2021
İlker Yıldız (2021). Online Retail List for RFM [Dataset]. https://www.kaggle.com/ilkeryildiz/online-retail-listing/tasks
Dataset updated
Sep 23, 2021
Dataset provided by
Kaggle
Authors
İlker Yıldız
Description
Context

This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion gift-ware. Many customers of the company are wholesalers.

Details

An e-commerce company wants to segment its customers and determine marketing strategies according to these segments. To this end, we will define the behavior of customers and create groups according to clusters in these behaviors. In other words, we will include those who exhibit common behaviors in the same groups and we will try to develop special sales and marketing techniques for these groups.

InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.

StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.

Description: Product (item) name. Nominal.

Quantity: The quantities of each product (item) per transaction. Numeric.

InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.

UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).

CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.

Country: Country name. Nominal. The name of the country where a customer resides.
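
Since this listing targets RFM (recency, frequency, monetary) segmentation, a minimal sketch of the three measures may help. It is one common way to compute them, not necessarily the dataset author's: rows are assumed to be dictionaries keyed by the column names above, with InvoiceDate already parsed to `datetime`. The `snapshot` reference date and the sample rows are made up. Cancellations are not filtered out here.

```python
from collections import defaultdict
from datetime import datetime


def rfm_table(rows, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    recency   = days between the snapshot date and the last purchase
    frequency = number of distinct invoices
    monetary  = sum of Quantity * UnitPrice
    """
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for row in rows:
        cid = row["CustomerID"]
        when = row["InvoiceDate"]
        if cid not in last_seen or when > last_seen[cid]:
            last_seen[cid] = when
        invoices[cid].add(row["InvoiceNo"])
        spend[cid] += row["Quantity"] * row["UnitPrice"]
    return {
        cid: ((snapshot - last_seen[cid]).days, len(invoices[cid]), spend[cid])
        for cid in last_seen
    }


# Made-up rows for illustration only.
sample_rows = [
    {"CustomerID": 1, "InvoiceNo": "A", "InvoiceDate": datetime(2011, 12, 1),
     "Quantity": 2, "UnitPrice": 5.0},
    {"CustomerID": 1, "InvoiceNo": "B", "InvoiceDate": datetime(2011, 12, 5),
     "Quantity": 1, "UnitPrice": 3.0},
    {"CustomerID": 2, "InvoiceNo": "C", "InvoiceDate": datetime(2011, 11, 10),
     "Quantity": 4, "UnitPrice": 2.5},
]
rfm = rfm_table(sample_rows, snapshot=datetime(2011, 12, 10))
print(rfm)
```

Customers can then be grouped by ranking or binning each of the three values; the binning scheme is left open here because the description does not specify one.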

Predicting Coupon Redemption

kaggle.com

Updated Nov 17, 2019

vasudeva (2019). Predicting Coupon Redemption [Dataset]. https://www.kaggle.com/vasudeva009/predicting-coupon-redemption/discussion

Dataset updated

Nov 17, 2019

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

vasudeva

Description

Problem Statement

Predicting Coupon Redemption

XYZ Credit Card company regularly helps its merchants understand their data better and take key business decisions accurately by providing machine learning and analytics consulting. ABC is an established Brick & Mortar retailer that frequently conducts marketing campaigns for its diverse product range. As a merchant of XYZ, they have sought XYZ to assist them in their discount marketing process using the power of machine learning.

Discount marketing and coupon usage are very widely used promotional techniques to attract new customers and to retain & reinforce loyalty of existing customers. The measurement of a consumer’s propensity towards coupon usage and the prediction of the redemption behaviour are crucial parameters in assessing the effectiveness of a marketing campaign.

ABC promotions are shared across various channels including email, notifications, etc. A number of these campaigns include coupon discounts that are offered for a specific product/range of products. The retailer would like the ability to predict whether customers redeem the coupons received across channels, which will enable the retailer’s marketing team to accurately design coupon construct, and develop more precise and targeted marketing strategies.

The data available in this problem contains the following information, including the details of a sample of campaigns and coupons used in previous campaigns -

User Demographic Details

Campaign and coupon Details

Product details

Previous transactions

Based on previous transaction and performance data from the last 18 campaigns, predict, for each coupon and customer combination in the test set covering the next 10 campaigns, the probability that the customer will redeem the coupon.

Dataset Description

Here is the schema for the different data tables available. The detailed data dictionary is provided next.

You are provided with the following files:

train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns

Variable	Definition
id	Unique id for coupon customer impression
campaign_id	Unique id for a discount campaign
coupon_id	Unique id for a discount coupon
customer_id	Unique id for a customer
redemption_status	(target) (0 - Coupon not redeemed, 1 - Coupon redeemed)

campaign_data.csv: Campaign information for each of the 28 campaigns

Variable	Definition
campaign_id	Unique id for a discount campaign
campaign_type	Anonymised Campaign Type (X/Y)
start_date	Campaign Start Date
end_date	Campaign End Date

coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon

Variable	Definition
coupon_id	Unique id for a discount coupon (no order)
item_id	Unique id for items for which given coupon is valid (no order)

customer_demographics.csv: Customer demographic information for some customers

Variable	Definition
customer_id	Unique id for a customer
age_range	Age range of customer family in years
marital_status	Married/Single
rented	0 - not rented accommodation, 1 - rented accommodation
family_size	Number of family members
no_of_children	Number of children in the family
income_bracket	Label Encoded Income Bracket (Higher income corresponds to higher number)

customer_transaction_data.csv: Transaction data for all customers for duration of campaigns in the train data

Variable	Definition
date	Date of Transaction
customer_id	Unique id for a customer
item_id	Unique id for item
quantity	quantity of item bought
selling_price	Sales value of the transaction
other_discount	Discount from other sources such as manufacturer coupon/loyalty card
coupon_discount	Discount availed from retailer coupon

item_data.csv: Item information for each item sold by the retailer

Variable	Definition
item_id	Unique id for item
brand	Unique id for item brand
brand_type	Brand Type (local/Established)
category	Item Category

test.csv: Contains the coupon customer combination for which redemption status is to be predicted

Variable	Definition
id	Unique id for coupon customer impression
campaign_id	Unique id for a discount campaign
coupon_id	Unique id for a discount coupon
customer_id	Unique id for a customer

To summarise the entire process:

Customers receive coupons under various campaigns and may choose to redeem them.
They can redeem a given coupon for any product that is valid for that coupon, as per the coupon item mapping, between the campaign start date and end date.
Next, the customer will redeem the coupon for an item at the retailer store and that will reflect in the transaction table in the column co...
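
The process summarised above can be sketched as a lookup across the tables. This is a hedged illustration rather than the competition's labelling logic: it assumes that a non-zero `coupon_discount` on a transaction marks a coupon being applied, that `coupon_items` is a hypothetical dictionary built from coupon_item_mapping.csv, and that dates are already parsed. All sample values are made up.

```python
from datetime import date


def coupon_redeemed(customer_id, coupon_id, campaign, coupon_items, transactions):
    """Return True if the customer used a retailer coupon on an eligible item
    between the campaign start and end dates (inclusive)."""
    eligible = coupon_items.get(coupon_id, set())
    for t in transactions:
        if (
            t["customer_id"] == customer_id
            and t["item_id"] in eligible
            and campaign["start_date"] <= t["date"] <= campaign["end_date"]
            and t["coupon_discount"] != 0  # assumption: non-zero means a coupon was applied
        ):
            return True
    return False


# Made-up data for illustration only.
campaign = {"start_date": date(2013, 1, 1), "end_date": date(2013, 1, 31)}
coupon_items = {7: {101, 102}}  # coupon_id -> set of valid item_id values
transactions = [
    {"date": date(2013, 1, 15), "customer_id": 1, "item_id": 101, "coupon_discount": -5.0},
    {"date": date(2013, 2, 3), "customer_id": 2, "item_id": 102, "coupon_discount": -2.0},
    {"date": date(2013, 1, 20), "customer_id": 3, "item_id": 101, "coupon_discount": 0.0},
]
print(coupon_redeemed(1, 7, campaign, coupon_items, transactions))
```

Note that this check cannot tell which coupon produced a discount when two coupons cover the same item, so it is only an approximation of the target column.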

Supplement Sales Prediction
kaggle.com
Updated Sep 17, 2021
A SURESH (2021). Supplement Sales Prediction [Dataset]. https://www.kaggle.com/sureshmecad/supplement-sales-prediction/tasks
Dataset updated
Sep 17, 2021
Dataset provided by
Kaggle
Authors
A SURESH
Description
Context

Supplement Sales Prediction

Your Client WOMart is a leading nutrition and supplement retail chain that offers a comprehensive range of products for all your wellness and fitness needs.

WOMart follows a multi-channel distribution strategy with 350+ retail stores spread across 100+ cities.

Effective forecasting for store sales gives essential insight into upcoming cash flow, meaning WOMart can more accurately plan the cashflow at the store level.

Sales data for 18 months from 365 stores of WOMart is available along with information on Store Type, Location Type for each store, Region Code for every store, Discount provided by the store on every day, Number of Orders everyday etc.

Your task is to predict the store sales for each store in the test set for the next two months.

Content

Train Data

| Variable | Definition |
|---|---|
| ID | Unique Identifier for a row |
| Store_id | Unique id for each Store |
| Store_Type | Type of the Store |
| Location_Type | Type of the location where Store is located |
| Region_Code | Code of the Region where Store is located |
| Date | Information about the Date |
| Holiday | If there is holiday on the given Date, 1 : Yes, 0 : No |
| Discount | If discount is offered by store on the given Date, Yes/ No |
| #Orders | Number of Orders received by the Store on the given Day |
| Sales | Total Sale for the Store on the given Day |

Test Data

| Variable | Definition |
|---|---|
| ID | Unique Identifier for a row |
| Store_id | Unique id for each Store |
| Store_Type | Type of the Store |
| Location_Type | Type of the location where Store is located |
| Region_Code | Code of the Region where Store is located |
| Date | Information about the Date |
| Holiday | If there is holiday on the given Date, 1 : Yes, 0 : No |
| Discount | If discount is offered by store on the given Date, Yes/ No |

Sample_Submission

| Variable | Definition |
|---|---|
| ID | Unique Identifier for a row |
| Sales | Total Sale for the Store on the given Day |

Evaluation

The evaluation metric for this competition is MSLE * 1000 across all entries in the test set.

Public and Private Split

Test data is further divided into Public (First 20 Days) and Private (Last 41 Days). You will make the prediction for two months (61 days).

Your initial responses will be checked and scored on the Public data.

The final rankings would be based on your private score which will be published once the competition is over.

The submitted Sales column is compared with the actual values as in the following example, except that the test set has 22266 items instead of 8 (the function is available in sklearn).

Sample Input :

actual = [27.5, 55.9, 25.8, 17.7, 27.6, 55.9, 25.7, 17.8]
predicted = [24.0, 49.1, 21.0, 16.2, 23.3, 47.0, 12.1, 15.2]
score = MSLE(actual, predicted) * 1000

Sample Output:

82.9949678377161
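
The sample above can be reproduced with a few lines of standard-library Python. This sketch reads the sample input as two eight-element lists and takes MSLE to be the mean of squared differences of log(1 + x), which is the definition used by `sklearn.metrics.mean_squared_log_error`. The function name `msle_times_1000` is made up for illustration.

```python
import math


def msle_times_1000(actual, predicted):
    """Mean squared logarithmic error, scaled by 1000."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_log_errors = [
        (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)
    ]
    return 1000 * sum(squared_log_errors) / len(squared_log_errors)


actual = [27.5, 55.9, 25.8, 17.7, 27.6, 55.9, 25.7, 17.8]
predicted = [24.0, 49.1, 21.0, 16.2, 23.3, 47.0, 12.1, 15.2]
score = msle_times_1000(actual, predicted)
print(score)  # approximately 82.995
```

Because the error is taken on a log scale, the seventh pair (25.7 against 12.1) dominates the score even though its absolute error is not much larger than the others.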

Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset

Retail Transactions Dataset

For market basket analysis, customer segmentation & other retail analytics tasks

Dataset updated

May 18, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Prasad Patil

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
Product: A list of products purchased in the transaction. It includes the names of the products bought.
Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
City: The city where the purchase took place. It indicates the location of the transaction.
Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
Customer_Category: A category representing the customer's background or age group.
Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."
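
As a starting point for market basket analysis on the Product column, the sketch below counts how often each pair of products appears in the same transaction. It assumes each Product cell is stored as a Python-style list literal such as "['Milk', 'Bread']". That storage format is an assumption, and the sample cells are made up.

```python
import ast
from collections import Counter
from itertools import combinations


def parse_products(cell):
    """Parse a Product cell such as "['Milk', 'Bread']" into a list of names."""
    return list(ast.literal_eval(cell))


def pair_counts(product_cells):
    """Count co-occurrences of unordered product pairs across transactions."""
    counts = Counter()
    for cell in product_cells:
        # Sort and de-duplicate so each pair has one canonical ordering.
        items = sorted(set(parse_products(cell)))
        counts.update(combinations(items, 2))
    return counts


# Made-up cells for illustration only.
cells = ["['Milk', 'Bread']", "['Bread', 'Milk', 'Eggs']", "['Eggs']"]
counts = pair_counts(cells)
print(counts.most_common(3))
```

Pair counts like these are the raw input for support and confidence measures; dividing a count by the number of transactions gives the pair's support.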

Use Cases:

Market Basket Analysis: Discover associations between products and uncover buying patterns.
Customer Segmentation: Group customers based on purchasing behavior.
Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
Retail Analytics: Analyze store performance and customer trends.

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
