30 datasets found

Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. The dataset I was given contains a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rules are most often used when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule bought computer mouse => bought mouse mat:
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mouse Mat) = 0.80/0.09 = 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
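The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch using the usual definitions (confidence divides the joint count by the antecedent's count; lift divides confidence by the consequent's relative frequency); the function name `rule_metrics` and its parameter names are illustrative and do not come from the dataset description.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                     # P(A and B)
    confidence = n_both / n_antecedent             # P(B | A)
    lift = confidence / (n_consequent / n_total)   # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mouse mat, 8 bought both.
support, confidence, lift = rule_metrics(
    n_total=100, n_antecedent=10, n_consequent=9, n_both=8
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1, as here, suggests the two items are bought together far more often than they would be if the purchases were independent.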

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of Rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

Libraries in R

First, we need to load the required libraries. Each library is briefly described below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Data Pre-processing

Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

Next, we clean the data frame by removing missing values.

To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
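The conversion from invoice rows to transactions can be sketched as follows. This is a minimal Python illustration rather than the author's R code; the sample rows and item names are hypothetical, and only the `BillNo` and `Itemname` columns from the dataset description are assumed.

```python
from collections import defaultdict

# Hypothetical rows shaped like the (BillNo, Itemname) columns of the dataset.
rows = [
    ("536365", "WHITE MUG"),
    ("536365", "RED LANTERN"),
    ("536366", "HAND WARMER"),
    ("536365", "WHITE MUG"),  # repeated line on the same bill
    ("536367", None),         # missing item name, dropped below
]


def to_transactions(rows):
    """Group item names by bill number, skipping rows with missing values."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        if bill_no is None or item is None:
            continue
        baskets[bill_no].add(item)  # a set keeps each item once per bill
    return {bill: sorted(items) for bill, items in baskets.items()}


transactions = to_transactions(rows)
for bill, items in transactions.items():
    print(bill, items)
```

Each resulting basket lists the distinct items of one invoice, which is the shape association rule algorithms typically expect.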
Random data generated for market basket analysis
ieee-dataport.org
Updated Apr 12, 2022
Cite
Sule Mohammed (2022). Random data generated for market basket analysis [Dataset]. https://ieee-dataport.org/documents/random-data-generated-market-basket-analysis
Dataset updated
Apr 12, 2022
Authors
Sule Mohammed
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a dataset that contains 50
Retail Transactions Dataset
kaggle.com
Updated May 18, 2024
Cite
Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
Dataset updated
May 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasad Patil
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.

Date: The date and time when the transaction occurred. It records the timestamp of each purchase.

Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.

Product: A list of products purchased in the transaction. It includes the names of the products bought.

Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.

Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.

Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.

City: The city where the purchase took place. It indicates the location of the transaction.

Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.

Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.

Customer_Category: A category representing the customer's background or age group.

Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.

Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

Use Cases:

Market Basket Analysis: Discover associations between products and uncover buying patterns.

Customer Segmentation: Group customers based on purchasing behavior.

Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.

Retail Analytics: Analyze store performance and customer trends.

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
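As a rough illustration of how rows of this shape could be produced synthetically, the sketch below uses Python's standard `random` module as a stand-in for Faker. It is not the dataset author's generator: the product list, the payment methods, the `make_transaction` helper, and the choice to set `Total_Items` to the length of the product list are all assumptions made for the example, and only a subset of the columns is filled in.

```python
import random

# Hypothetical value pools; the real dataset's values are not reproduced here.
PRODUCTS = ["Milk", "Bread", "Eggs", "Butter", "Apples", "Coffee"]
PAYMENT_METHODS = ["Credit Card", "Debit Card", "Cash", "Mobile Payment"]


def make_transaction(rng, transaction_id):
    """Build one synthetic row with a subset of the columns described above."""
    n_items = rng.randint(1, 4)
    return {
        "Transaction_ID": f"{transaction_id:010d}",    # 10-digit identifier
        "Product": rng.sample(PRODUCTS, n_items),      # distinct products
        "Total_Items": n_items,                        # assumed equal to list length
        "Payment_Method": rng.choice(PAYMENT_METHODS),
        "Discount_Applied": rng.choice([True, False]),
    }


rng = random.Random(42)  # fixed seed so the sketch is reproducible
rows = [make_transaction(rng, i) for i in range(1, 6)]
for row in rows:
    print(row)
```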
‘Groceries dataset for Market Basket Analysis(MBA)’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Groceries dataset for Market Basket Analysis(MBA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-for-market-basket-analysis-mba-d4c7/a0d6998a/?iid=009-334&v=presentation
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Groceries dataset for Market Basket Analysis(MBA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rashikrahmanpritom/groceries-dataset-for-market-basket-analysismba on 13 November 2021.

--- Dataset description provided by original source is as follows ---

The initial dataset was collected from the Groceries dataset. The data was then modified and split into two datasets for ease of MBA implementation. Here, "groceries data.csv" contains the grocery transaction data, on which you can do EDA and pre-processing before feeding it to the Apriori algorithm. I have also added pre-processed data as "basket.csv": you just need to replace the NaN values and encode it using TransactionEncoder, after which you can feed the encoded data into the Apriori algorithm.
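The "replace NaN, then encode" step can be sketched in plain Python. This is a minimal standard-library illustration of the one-hot encoding idea behind the TransactionEncoder mentioned above, not that class itself; the sample baskets and the helper names `is_missing` and `one_hot_encode` are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing entries."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def one_hot_encode(baskets):
    """Drop missing entries, then build a boolean item-presence matrix."""
    cleaned = [[item for item in basket if not is_missing(item)] for basket in baskets]
    columns = sorted({item for basket in cleaned for item in basket})
    matrix = []
    for basket in cleaned:
        present = set(basket)
        matrix.append([item in present for item in columns])
    return columns, matrix


# Hypothetical padded rows, as they might look after reading a ragged CSV.
baskets = [
    ["milk", "bread", float("nan")],
    ["bread", float("nan"), float("nan")],
    ["milk", "butter", "bread"],
]
columns, matrix = one_hot_encode(baskets)
print(columns)
for row in matrix:
    print(row)
```

Each row of the matrix marks which items a basket contains, which is the input format that frequent-itemset implementations commonly accept.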

--- Original source retains full ownership of the source dataset ---
Market Basket Analysis Dataset - Dataset - LDM
service.tib.eu
Updated Jan 2, 2025
Cite
(2025). Market Basket Analysis Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/market-basket-analysis-dataset
Dataset updated
Jan 2, 2025
Description
The dataset contains the purchases of anonymous households in chain grocery and drug stores.
‘Market Basket Analysis Data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Market Basket Analysis Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-market-basket-analysis-data-1c3c/915d4e47/?iid=006-454&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Market Basket Analysis Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ahmtcnbs/datasets-for-appiori on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Market Basket Analysis: The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It applies an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.
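The level-wise search can be sketched as follows. This is a small, unoptimized Python illustration of the idea under stated assumptions, not a reference implementation: the function name `apriori`, the `min_support` parameter, and the sample transactions are chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    current = {}
    for item in {item for b in baskets for item in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 1
    while current:
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
    return frequent


transactions = [
    ["milk", "bread"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
    ["bread"],
]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is where the "prior knowledge" is used: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.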

--- Original source retains full ownership of the source dataset ---
DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based...
frontiersin.figshare.com
docx
Updated Jun 10, 2023
Cite
Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang (2023). DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based Formulas: A Nationwide Descriptive Study With Market Basket Analysis.docx [Dataset]. http://doi.org/10.3389/fphar.2021.641530.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fphar.2021.641530.s001
Dataset updated
Jun 10, 2023
Dataset provided by
Frontiers
Authors
Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: As time evolved, traditional Chinese medicine (TCM) became integrated into the global medical system as complementary treatments. Some essential TCM herbs started to play a limited role in clinical practices because of Western medication development. For example, Fuzi (Aconiti Lateralis Radix Praeparata) is a toxic but indispensable TCM herb. Fuzi was mainly used in poor circulation and life-threatening conditions by history records. However, with various Western medication options for treating critical conditions currently, how is Fuzi used clinically and its indications in modern TCM are unclear. This study aimed to evaluate Fuzi and Fuzi-based formulas in modern clinical practices using artificial intelligence and data mining methods.

Methods: This nationwide descriptive study with market basket analysis used a cohort selected from the Taiwan National Health Insurance database that contained one million national representatives between 2003 and 2010 used for our analysis. Descriptive statistics were performed to demonstrate the modern clinical indications of Fuzi. Market basket analysis was calculated by the Apriori algorithm to discover the association rules between Fuzi and other TCM herbs.

Results: A total of 104,281 patients using 405,837 prescriptions of Fuzi and Fuzi-based formulas were identified. TCM doctors were found to use Fuzi in pulmonary (21.5%), gastrointestinal (17.3%), and rheumatologic (11.0%) diseases, but not commonly in cardiovascular diseases (7.4%). Long-term users of Fuzi and Fuzi-based formulas often had the following comorbidities diagnosed by Western doctors: osteoarthritis (31.0%), peptic ulcers (29.5%), hypertension (19.9%), and COPD (19.7%). Patients also used concurrent medications such as H2-receptor antagonists, nonsteroidal anti-inflammatory drugs, β-blockers, calcium channel blockers, and aspirin. Through market basket analysis, for the first time, we noticed many practical Fuzi-related herbal pairs such as Fuzi–Hsihsin (Asari Radix et Rhizoma)–Dahuang (Rhei Radix et Rhizoma) for neurologic diseases and headache.

Conclusion: For the first time, big data analysis was applied to uncover the modern clinical indications of Fuzi in addition to traditional use. We provided necessary evidence on the scientific use of Fuzi in current TCM practices, and the Fuzi-related herbal pairs discovered in this study are helpful to the development of new botanical drugs.
‘Groceries Market Basket Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Groceries Market Basket Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-market-basket-dataset-602f/df0c0905/?iid=017-948&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Groceries Market Basket Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/irfanasrullah/groceries on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context: The Groceries Market Basket Dataset contains 9835 transactions by customers shopping for groceries. The data contains 169 unique items.
The data is suitable for data mining tasks such as market basket analysis, which involves multiple variables.

Acknowledgement: Thanks to https://github.com/shubhamjha97/association-rule-mining-apriori
The data comes from the course assignment "Association rules mining using Apriori algorithm" for CS F415 - Data Mining @ BITS Pilani, Hyderabad Campus, done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.

Pre-processing

The csv file was read transaction by transaction and each transaction was saved as a list. A mapping was created from the unique items in the dataset to integers so that each item corresponded to a unique integer. The entire data was mapped to integers to reduce the storage and computational requirement. A reverse mapping was created from the integers to the item, so that the item names could be written in the final output file.
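The forward and reverse mappings described above can be sketched like this. It is a minimal Python illustration with hypothetical sample transactions; the helper name `build_mappings` is not taken from the referenced repository.

```python
def build_mappings(transactions):
    """Assign each unique item an integer ID and build the reverse lookup."""
    item_to_id = {}
    for basket in transactions:
        for item in basket:
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id)  # next unused integer
    id_to_item = {idx: item for item, idx in item_to_id.items()}
    return item_to_id, id_to_item


# Hypothetical transactions, each already read into a list of item names.
transactions = [["whole milk", "rolls"], ["rolls", "yogurt"]]

item_to_id, id_to_item = build_mappings(transactions)
encoded = [[item_to_id[item] for item in basket] for basket in transactions]
decoded = [[id_to_item[idx] for idx in basket] for basket in encoded]

print(encoded)
print(decoded)
```

Mining then runs on the compact integer lists, and the reverse mapping restores item names when the results are written out.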

Don't forget to upvote before you download :)

--- Original source retains full ownership of the source dataset ---
ASDA groceries data
crawlfeeds.com
csv, zip
Updated May 4, 2025
Cite
Crawl Feeds (2025). ASDA groceries data [Dataset]. https://crawlfeeds.com/datasets/asda-groceries-data
Available download formats: zip, csv
Dataset updated
May 4, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
ASDA is a grocery supermarket chain in England; the information was extracted using Crawl Feeds' in-house tools.

The data is suitable for data mining tasks such as market basket analysis, which involves multiple variables.

Dataset details

Total records: 37,400

36,000+ records have brand

37,000+ records have price

36,000+ records have net content

36,000+ records have ingredients

37,000+ records have product details
Mall_customer data
kaggle.com
Updated Aug 23, 2019
Cite
Ashlesha P D (2019). Mall_customer data [Dataset]. https://www.kaggle.com/ashleshaprix/mall-customer-data/notebooks
Dataset updated
Aug 23, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ashlesha P D
Description
Dataset

This dataset was created by Ashlesha P D

Contents
Market Basket Analysis Data
kaggle.com
Updated Aug 12, 2020
Cite
Romani Banerjee (2020). Market Basket Analysis Data [Dataset]. https://www.kaggle.com/romanibanerjee/market-basket-analysis-data/tasks
Dataset updated
Aug 12, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Romani Banerjee
Description
Dataset

This dataset was created by Romani Banerjee

Contents
ReInstitute Data Set
figshare.com
docx
Updated Jun 10, 2025
Cite
Moinak Bhaduri (2025). ReInstitute Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.29286521.v1
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.29286521.v1
Dataset updated
Jun 10, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Moinak Bhaduri
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data collected by the RE!NSTITUTE™. Each row represents one deployment of the 100-Day Challenge™. A cross indicates that changes in the corresponding aspect could be brought about in that instance of the experiment.
‘Groceries dataset ’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 15, 2015
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Groceries dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-b6be/136ba9af/?iid=001-023&v=presentation
Dataset updated
Aug 15, 2015
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Groceries dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

Details of the dataset

The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
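Turning frequent itemsets into association rules can be sketched as follows. This minimal Python illustration assumes the supports of all frequent itemsets (including every subset of each frequent itemset) are already available in a dictionary; the function name `generate_rules`, the `min_confidence` parameter, and the support values are illustrative.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so lookups succeed.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


# Illustrative supports: itemset -> fraction of transactions containing it.
frequent = {
    frozenset({"milk"}): 0.10,
    frozenset({"butter"}): 0.08,
    frozenset({"milk", "butter"}): 0.06,
}
rules = generate_rules(frequent, min_confidence=0.7)
for antecedent, consequent, support, confidence, lift in rules:
    print(sorted(antecedent), "=>", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

With these numbers only the rule butter => milk clears the confidence threshold, since milk => butter has a confidence of about 0.60.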

An example of Association Rules

Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both. For the rule bought milk => bought butter: support = P(Milk & Butter) = 6/100 = 0.06; confidence = support/P(Milk) = 0.06/0.10 = 0.60; lift = confidence/P(Butter) = 0.60/0.08 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Some important terms:

Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.

--- Original source retains full ownership of the source dataset ---
Market Basket Analysis_Store_Data
kaggle.com
Updated Apr 11, 2021
Cite
N C Chetan (2021). Market Basket Analysis_Store_Data [Dataset]. https://www.kaggle.com/ncchetan/market-basket-analysis-store-data/tasks
Dataset updated
Apr 11, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
N C Chetan
Description
Dataset

This dataset was created by N C Chetan

Contents
Replication Data for: Svalbard through the prism of Russian media
search.dataone.org
dataverse.azure.uit.no
+1 more
Updated Sep 25, 2024
Cite
Obukhova, Anna (2024). Replication Data for: Svalbard through the prism of Russian media [Dataset]. http://doi.org/10.18710/UEZZUS
Unique identifier
https://doi.org/10.18710/UEZZUS
Dataset updated
Sep 25, 2024
Dataset provided by
DataverseNO
Authors
Obukhova, Anna
Time period covered
Jan 1, 2010 - Dec 31, 2021
Area covered
Svalbard, Russia
Description
The study applies Market Basket Analysis and Keymorph Analysis to analyze the articles related to Svalbard published in a sample of Russian mainstream federal and north-western regional media outlets produced between 2010 and 2021. The data for Market Basket Analysis is divided into six target subcorpora: Federal 2010-2013, Regional 2010-2013, Federal 2014-2017, Regional 2014-2017, Federal 2018-2021, and Regional 2018-2021. The data for Keymorph Analysis consists of six target subcorpora: Federal 2010-2013*, Regional 2010-2013*, Federal 2014-2017*, Regional 2014-2017*, Federal 2018-2021*, and Regional 2018-2021*. The data for Keymorph Analysis are the texts containing the keyword 'Spitsbergen' obtained from the data for Market Basket Analysis. Market Basket Analysis is used to retrieve Associative Arrays consisting of various keywords for the keyword meaning 'Spitsbergen'. Keymorph Analysis examines the prominence of the grammatical cases of nouns meaning 'Russia', 'Norway', and 'Spitsbergen'.

The dataset includes:

1) the R code for keyword analysis (keywords serve as an input for Market Basket Analysis);

2) lists of keywords obtained from six target subcorpora Federal 2010-2013, Regional 2010-2013, Federal 2014-2017, Regional 2014-2017, Federal 2018-2021, and Regional 2018-2021;

3) the R code for Market Basket Analysis;

4) examples with the nouns meaning 'Russia', 'Norway', and 'Spitsbergen' extracted from six target subcorpora Federal 2010-2013*, Regional 2010-2013*, Federal 2014-2017*, Regional 2014-2017*, Federal 2018-2021*, and Regional 2018-2021* and annotated according to the grammatical cases of these nouns as well as the semantic meanings of the cases;

5) the calculated difference index (DIN*) values for the grammatical cases of the nouns meaning 'Russia', 'Norway', and 'Spitsbergen'. The DIN* was used in Keymorph Analysis as the effect size metric;

6) the R code for creation of the bar chart with DIN* values for the grammatical cases of the nouns meaning 'Russia', 'Norway', and 'Spitsbergen'.
Synthetic Retail Transactions Dataset
opendatabay.com
Updated Jul 2, 2025
Cite
Datasimple (2025). Synthetic Retail Transactions Dataset [Dataset]. https://www.opendatabay.com/data/dataset/a25d7b0f-dc8c-4c01-b0af-c90597f4a20f
Dataset updated
Jul 2, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
E-commerce & Online Transactions
Description
This dataset provides simulated retail transaction data, offering valuable insights into customer purchasing behaviour and store operations. It is designed to facilitate market basket analysis, customer segmentation, and a variety of other retail analytics tasks. Each row captures detailed transaction information, including a unique identifier, the date and time of purchase, customer details, a list of purchased products, total items, total cost, payment method, and location details such as city and store type. Furthermore, it includes indicators for discounts and promotions applied, along with a customer category based on background or age group, and the season of purchase. This dataset is entirely synthetic, generated using the Python Faker library, making it a safe and versatile resource for researchers, data scientists, and analysts to develop and test algorithms, models, and analytical tools without using real customer data.

Columns

Transaction_ID: A unique 10-digit identifier for each individual transaction, ensuring each purchase can be uniquely identified.

Date: The precise date and time when each transaction occurred, providing a timestamp for every purchase.

Customer_Name: The name of the customer who completed the purchase, offering a means to identify individual customers.

Product: A detailed list of all products included in a specific transaction.

Total_Items: The total quantity of items purchased within a single transaction.

Total_Cost: The overall financial value of the transaction, denominated in currency.

Payment_Method: The chosen payment method for the transaction, such as credit card, debit card, cash, or mobile payment.

City: The geographical location (city) where the transaction took place.

Store_Type: The classification of the store where the purchase was made, e.g., supermarket, convenience store, department store.

Discount_Applied: A boolean indicator (True/False) showing whether a discount was applied to the transaction.

Customer_Category: A categorisation of the customer based on their background or age group.

Season: The season (e.g., spring, summer, autumn, winter) in which the purchase was made.

Promotion: The specific type of promotion applied to the transaction, if any (e.g., "None", "BOGO", "Discount on Selected Items").

Distribution

This dataset is typically provided in a CSV file format. It contains approximately 1 million individual transaction records. The data spans a time range from 2020-01-01 to 2024-05-19. There are 329,738 unique customer names and 571,947 unique product entries. Payment methods are distributed with 25% Cash, 25% Debit Card, and 50% Other. Transaction locations include Boston (10%), Dallas (10%), and other cities (80%). Store types are categorised as Supermarket (17%), Pharmacy (17%), and other types (67%). Discounts were applied to approximately 50% of the transactions.

Usage

This dataset is ideally suited for:

Market Basket Analysis: Uncovering associations between products and identifying common buying patterns.

Customer Segmentation: Grouping customers based on their purchasing behaviour to target specific offers.

Pricing Optimisation: Developing strategies to optimise pricing and identify opportunities for discounts and promotions.

Retail Analytics: Analysing overall store performance and emerging customer trends.

Algorithmic Development: Testing and refining machine learning models for retail forecasting or recommendation systems.

Coverage

The dataset's geographic coverage includes transactions from various cities, such as Boston and Dallas, representing a broad, though simulated, global scope. The time range of the transactions extends from 1st January 2020 to 19th May 2024. Demographic insights are provided through the Customer_Category column, which classifies customers based on background or age group, allowing for demographic-based analyses. As a synthetic dataset, specific real-world demographic notes are not applicable.

License

CC0

Who Can Use It

This dataset is beneficial for a wide range of users, including:
* Researchers: For academic studies on consumer behaviour and retail economics.
* Data Scientists: To develop and validate predictive models, such as recommender systems or churn prediction models.
* Analysts: For performing in-depth retail analytics, market basket analysis, and customer segmentation to inform business decisions.
* Students: As a practical, realistic dataset for learning and applying data analysis techniques in a retail context.

Dataset Name Suggestions

Retail Transactions Dataset

Customer Purchasing Behaviour Data

Market Basket Analysis Data

Synthetic Retail Transactions

E-commerce Transaction Log

Attributes

Original Dat
Retail Transaction Dataset
opendatabay.com
Updated Jun 24, 2025
Cite
Datasimple (2025). Retail Transaction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/ce827d4f-444a-4ffc-a50e-a769e596a2d3
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Food & Beverage Consumption
Description
This dataset contains 30,000 unique retail transactions, each representing a customer's shopping basket in a simulated grocery store environment. The data was generated with realistic product combinations and purchase patterns, suitable for association rule mining, recommendation systems and market basket analysis.

Each row corresponds to a single transaction, listing:

* A unique transaction ID
* A customer ID
* The full list of products bought in that transaction
* The time of the transaction

The dataset includes products across various categories such as beverages, snacks, dairy, household items, fruits, vegetables and frozen foods.

This data is entirely synthetic and does not contain any real user information.

Original Data Source: Retail Transaction Dataset
ocado groceries data
crawlfeeds.com
csv, zip
Updated May 4, 2025
Cite
Crawl Feeds (2025). ocado groceries data [Dataset]. https://crawlfeeds.com/datasets/ocado-groceries-data
Available download formats: zip, csv
Dataset updated
May 4, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Ocado is a British retail business. The information was extracted using Crawl Feeds' in-house tools.

The data contains multiple variables and is suitable for data mining tasks such as market basket analysis.

Dataset details

Total records: 47,000+

47,000+ records have brand

47,000+ records have price

47,000+ records have ingredients

47,000+ records have product details
‘Bakery Sales Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 16, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Bakery Sales Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-bakery-sales-dataset-0101/latest
Dataset updated
Sep 16, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Bakery Sales Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/akashdeepkuila/bakery on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

We live in the era of e-commerce and digital marketing. Even small-scale businesses are going online, as the opportunities are endless. Since a large share of the people who have access to the internet are switching to online shopping, large retailers are actively searching for ways to increase their profit. Market basket analysis is one of the key techniques used by large retailers to increase sales by understanding customers' purchasing behavior and patterns. It examines collections of items to find relationships between items that go together within the business context.

Content

The dataset belongs to "The Bread Basket", a bakery located in Edinburgh. It provides the transaction details of customers who ordered different items from this bakery online during the period from 26-01-11 to 27-12-03. The dataset has 20507 entries, over 9000 transactions, and 4 columns.

Variables

TransactionNo : unique identifier for every single transaction

Items : items purchased

DateTime : date and time stamp of the transactions

Daypart : part of the day when a transaction is made (morning, afternoon, evening, night)

DayType : classifies whether a transaction was made on a weekend or a weekday
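
As a rough sketch, the two derived columns could be computed from a timestamp as shown below. The weekend rule follows the DayType description above. The hour cut-offs used for Daypart and the function names `day_type` and `daypart` are assumptions made for illustration, since the source does not state how the buckets were defined.

```python
from datetime import datetime


def day_type(ts):
    """Return 'Weekend' for Saturday or Sunday, otherwise 'Weekday'."""
    return "Weekend" if ts.weekday() >= 5 else "Weekday"


def daypart(ts):
    """Bucket the hour of day. These cut-offs are assumed, not from the source."""
    hour = ts.hour
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"


# An arbitrary Saturday morning, chosen only for illustration.
ts = datetime(2017, 1, 7, 10, 30)
print(day_type(ts), daypart(ts))
```

If the dataset's own Daypart values disagree with these buckets, the cut-offs in `daypart` are the place to adjust.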

Inspiration

The dataset is ideal for anyone looking to practice association rule mining and to understand the business context of data mining, in order to better understand customers' buying patterns.

--- Original source retains full ownership of the source dataset ---
Customer360Insights
kaggle.com
Updated Jun 9, 2024
Cite
Dave Darshan (2024). Customer360Insights [Dataset]. https://www.kaggle.com/datasets/davedarshan/customer360insights
Dataset updated
Jun 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dave Darshan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Customer360Insights

The Customer360Insights dataset is a synthetic collection meticulously designed to mirror the multifaceted nature of customer interactions within an e-commerce platform. It encompasses a wide array of variables, each serving as a pillar to support various analytical explorations. Here’s a breakdown of the dataset and the potential analyses it enables:

Dataset Description

Customer Demographics: Includes FullName, Gender, Age, CreditScore, and MonthlyIncome. These variables provide a demographic snapshot of the customer base, allowing for segmentation and targeted marketing analysis.

Geographical Data: Comprising Country, State, and City, this section facilitates location-based analytics, market penetration studies, and regional sales performance.

Product Information: Details like Category, Product, Cost, and Price enable product trend analysis, profitability assessment, and inventory optimization.

Transactional Data: Captures the customer journey through SessionStart, CartAdditionTime, OrderConfirmation, OrderConfirmationTime, PaymentMethod, and SessionEnd. This rich temporal data can be used for funnel analysis, conversion rate optimization, and customer behavior modeling.

Post-Purchase Details: With OrderReturn and ReturnReason, analysts can delve into return rate calculations, post-purchase satisfaction, and quality control.

Types of Analysis

Descriptive Analytics: Understand basic metrics like average monthly income, most common product categories, and typical credit scores.

Predictive Analytics: Use machine learning to predict credit risk or the likelihood of a purchase based on demographics and session activity.

Customer Segmentation: Group customers by demographics or purchasing behavior to tailor marketing strategies.

Geospatial Analysis: Examine sales distribution across different regions and optimize logistics.

Time Series Analysis: Study the seasonality of purchases and session activities over time.

Funnel Analysis: Evaluate the customer journey from session start to order confirmation and identify drop-off points.

Cohort Analysis: Track customer cohorts over time to understand retention and repeat purchase patterns.

Market Basket Analysis: Discover product affinities and develop cross-selling strategies.
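
To make the funnel analysis item above more concrete, here is a minimal sketch that counts how many sessions reach each stage and the conversion rate from the previous stage. The field names SessionStart, CartAdditionTime and OrderConfirmation come from the dataset description. Everything else is an assumption for illustration: the encoding (a missing CartAdditionTime is `None`, OrderConfirmation is a boolean), the function name `funnel` and the sample records.

```python
def funnel(sessions):
    """Count sessions reaching each stage and the step-to-step conversion rate."""
    stages = [
        ("session_started", lambda s: s.get("SessionStart") is not None),
        ("added_to_cart", lambda s: s.get("CartAdditionTime") is not None),
        ("order_confirmed", lambda s: bool(s.get("OrderConfirmation"))),
    ]
    report = []
    previous = None
    for name, reached in stages:
        count = sum(1 for s in sessions if reached(s))
        # No rate for the first stage, or when the previous stage is empty.
        rate = count / previous if previous else None
        report.append((name, count, rate))
        previous = count
    return report


# Invented sample sessions (assumed encoding, see above).
sessions = [
    {"SessionStart": "2024-01-01 09:00", "CartAdditionTime": "2024-01-01 09:05", "OrderConfirmation": True},
    {"SessionStart": "2024-01-01 10:00", "CartAdditionTime": "2024-01-01 10:02", "OrderConfirmation": False},
    {"SessionStart": "2024-01-01 11:00", "CartAdditionTime": None, "OrderConfirmation": False},
    {"SessionStart": "2024-01-01 12:00", "CartAdditionTime": None, "OrderConfirmation": False},
]
for name, count, rate in funnel(sessions):
    print(name, count, rate)
```

The stage with the lowest rate is the largest drop-off point in this simplified view.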

This dataset is a playground for data enthusiasts to practice cleaning, transforming, visualizing, and modeling data. Whether you’re conducting A/B testing for marketing campaigns, forecasting sales, or building customer profiles, Customer360Insights offers a rich, realistic dataset for honing your data science skills.

Curious about how I created the data? Feel free to click here and take a peek! 😉

📊🔍 Good Luck and Happy Analysing 🔍📊

Cite

Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Dataset updated

Dec 9, 2021

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Aslan Ahmedov

Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions of itemsets that they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rules are most often used to find associations between different objects in a set, for example frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse and 8 bought both.
- Rule: bought Computer Mouse => bought Mat for Mouse
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat for Mouse) = 0.80/0.09 = 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
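
As a minimal sketch, the three metrics for the rule Computer Mouse => Mat for Mouse can be computed directly from the counts in the example. The code uses the standard definitions: support = P(A and B), confidence = support / P(A) for antecedent A, and lift = confidence / P(B) for consequent B. The function name `rule_metrics` and its parameter names are illustrative and do not come from the original notebook.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                     # P(A and B)
    confidence = n_both / n_antecedent             # P(B | A) = support / P(A)
    lift = confidence / (n_consequent / n_total)   # confidence / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items are bought together far more often than they would be if the purchases were independent.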

Strategy

Data Import
Data Understanding and Exploration
Transformation of the data – so that is ready to be consumed by the association rules algorithm
Running association rules
Exploring the rules generated
Filtering the generated rules
Visualization of the rules
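
The running, exploring and filtering steps listed above can be sketched on a toy example. The original analysis uses R and the arules package; the Python below is a brute-force stand-in that only considers rules between single items, so it is neither the Apriori algorithm itself nor the author's code. The function name `pair_rules`, the thresholds `min_support` and `min_confidence`, and the toy baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2, min_confidence=0.6):
    """Generate rules A => B between single items by brute-force counting."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))            # ignore repeated items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:              # filter rare pairs
            continue
        # Each frequent pair yields two candidate rules: a => b and b => a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:   # filter weak rules
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Toy baskets, invented for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
    ["bread", "butter", "tea"],
    ["milk"],
]
for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} => {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

Counting every pair is only practical for small item catalogues. Apriori avoids this by extending only those itemsets whose subsets are already frequent.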

Dataset Description

File name: Assignment-1_Data
List name: retaildata
File format: .xlsx
Number of Rows: 522065
Number of Attributes: 7
- BillNo: 6-digit number assigned to each transaction. Nominal.
- Itemname: Product name. Nominal.
- Quantity: The quantities of each product per transaction. Numeric.
- Date: The day and time when each transaction was generated. Numeric.
- Price: Product price. Numeric.
- CustomerID: 5-digit number assigned to each customer. Nominal.
- Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
tidyverse - An opinionated collection of R packages designed for data science. The package makes it easy to install and load multiple 'tidyverse' packages in a single step.
readxl - Read Excel files in R.
plyr - Tools for splitting, applying and combining data.
ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
knitr - Dynamic report generation in R.
magrittr - Provides a mechanism for chaining commands with a forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call or expression. There is flexible support for the type of right-hand side expressions.
dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
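
The original performs this conversion in R. As a hedged illustration of the same idea in Python, the sketch below groups line-item rows by invoice so that each bill becomes one basket of item names. The column names BillNo and Itemname come from the dataset description above. The function name `to_baskets` and the sample rows are invented for illustration.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group line-item rows into one set of item names per bill."""
    baskets = defaultdict(set)
    for row in rows:
        item = row.get("Itemname")
        if not item:                      # skip rows with a missing item name
            continue
        baskets[row["BillNo"]].add(item.strip())
    return dict(baskets)


# Invented sample rows using the BillNo and Itemname columns.
rows = [
    {"BillNo": "100001", "Itemname": "TEA TOWEL"},
    {"BillNo": "100001", "Itemname": "MUG "},
    {"BillNo": "100002", "Itemname": "MUG"},
    {"BillNo": "100002", "Itemname": None},
    {"BillNo": "100001", "Itemname": "TEA TOWEL"},
]
for bill, items in to_baskets(rows).items():
    print(bill, sorted(items))
```

Using a set per bill means an item listed twice on the same invoice is counted once, which is what basket-level support counts expect.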

