30 datasets found
  1. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set, for example frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":

    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat) = 0.80/0.09 ≈ 8.89

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
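
    As a rough illustration, the three measures can be computed directly from raw counts. The sketch below uses the standard definitions for a rule X => Y (support = P(X and Y), confidence = support / P(X), lift = confidence / P(Y)); the function name `rule_metrics` and its parameter names are hypothetical, chosen only for this example.

```python
def rule_metrics(n_total, n_x, n_y, n_both):
    """Return (support, confidence, lift) for the rule X => Y from raw counts."""
    support = n_both / n_total              # P(X and Y)
    confidence = n_both / n_x               # P(Y | X), i.e. support / P(X)
    lift = confidence / (n_y / n_total)     # confidence / P(Y)
    return support, confidence, lift


# 100 customers, 10 bought a mouse (X), 9 bought a mat (Y), 8 bought both.
support, confidence, lift = rule_metrics(n_total=100, n_x=10, n_y=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift well above 1 suggests the two items are bought together more often than independent purchases would predict.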

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Each one is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    [Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    [Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
    [Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

    Next, we clean the data frame by removing missing values.

    [Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
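
    The original write-up performs this step in R. Purely as an illustration, the same idea (dropping rows with missing values, then grouping line items so that each invoice becomes one basket) might look like the Python sketch below. The sample rows and the helper name `to_transactions` are hypothetical; only the column names BillNo and Itemname come from the dataset description.

```python
from collections import defaultdict

# Hypothetical sample rows using the BillNo / Itemname columns described above.
rows = [
    {"BillNo": "100001", "Itemname": "item A"},
    {"BillNo": "100001", "Itemname": "item B"},
    {"BillNo": "100002", "Itemname": "item A"},
    {"BillNo": "100002", "Itemname": None},      # missing value, will be dropped
    {"BillNo": "100003", "Itemname": "item C"},
]


def to_transactions(rows):
    """Group line items by invoice number, skipping rows with missing fields."""
    baskets = defaultdict(set)
    for row in rows:
        bill, item = row.get("BillNo"), row.get("Itemname")
        if bill is None or item is None:
            continue  # remove rows with missing values
        baskets[bill].add(item)
    # One sorted item list per invoice
    return {bill: sorted(items) for bill, items in baskets.items()}


transactions = to_transactions(rows)
print(transactions)
```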

  2. Random data generated for market basket analysis

    • ieee-dataport.org
    Updated Apr 12, 2022
    Cite
    Sule Mohammed (2022). Random data generated for market basket analysis [Dataset]. https://ieee-dataport.org/documents/random-data-generated-market-basket-analysis
    Dataset updated
    Apr 12, 2022
    Authors
    Sule Mohammed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset that contains 50

  3. Retail Transactions Dataset

    • kaggle.com
    Updated May 18, 2024
    Cite
    Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
    Dataset updated
    May 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

    Context:

    Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

    Inspiration:

    The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

    Dataset Information:

    The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

    • Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
    • Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
    • Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
    • Product: A list of products purchased in the transaction. It includes the names of the products bought.
    • Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
    • Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
    • Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
    • City: The city where the purchase took place. It indicates the location of the transaction.
    • Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
    • Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
    • Customer_Category: A category representing the customer's background or age group.
    • Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
    • Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

    Use Cases:

    • Market Basket Analysis: Discover associations between products and uncover buying patterns.
    • Customer Segmentation: Group customers based on purchasing behavior.
    • Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
    • Retail Analytics: Analyze store performance and customer trends.

    Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.

  4. ‘Groceries dataset for Market Basket Analysis(MBA)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Groceries dataset for Market Basket Analysis(MBA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-for-market-basket-analysis-mba-d4c7/a0d6998a/?iid=009-334&v=presentation
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Groceries dataset for Market Basket Analysis(MBA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rashikrahmanpritom/groceries-dataset-for-market-basket-analysismba on 13 November 2021.

    --- Dataset description provided by original source is as follows ---

    The initial dataset was collected from the Groceries dataset. The data was then modified and split into two datasets for ease of MBA implementation. "groceries data.csv" contains grocery transaction data on which you can do EDA and pre-processing before feeding it to the Apriori algorithm. I have also added pre-processed data as "basket.csv": you just need to replace the NaN values and encode it using TransactionEncoder, after which you can feed the encoded data into the Apriori algorithm.
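
    To make the encoding step concrete, here is a dependency-free sketch of what a transaction encoder does: it drops the NaN padding and turns each basket into a row of True/False flags, one per item. It is a stand-in for the TransactionEncoder step mentioned above, not that library's implementation, and the padded-row layout assumed for "basket.csv" as well as the helper names are hypothetical.

```python
import math


def is_missing(value):
    """True for None, empty strings, and float NaN padding values."""
    return (
        value is None
        or value == ""
        or (isinstance(value, float) and math.isnan(value))
    )


def one_hot_encode(padded_rows):
    """Return (columns, matrix): sorted item names and one boolean row per basket."""
    transactions = [{v for v in row if not is_missing(v)} for row in padded_rows]
    columns = sorted(set().union(*transactions)) if transactions else []
    matrix = [[item in basket for item in columns] for basket in transactions]
    return columns, matrix


# Hypothetical rows padded with NaN to equal length.
nan = float("nan")
padded_rows = [
    ["milk", "bread", nan],
    ["bread", nan, nan],
    ["milk", "butter", "bread"],
]
columns, matrix = one_hot_encode(padded_rows)
print(columns)
for row in matrix:
    print(row)
```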

    --- Original source retains full ownership of the source dataset ---

  5. Market Basket Analysis Dataset - Dataset - LDM

    • service.tib.eu
    Updated Jan 2, 2025
    Cite
    (2025). Market Basket Analysis Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/market-basket-analysis-dataset
    Dataset updated
    Jan 2, 2025
    Description

    The dataset contains the purchases of anonymous households in chain grocery and drug stores.

  6. ‘Market Basket Analysis Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Market Basket Analysis Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-market-basket-analysis-data-1c3c/915d4e47/?iid=006-454&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Market Basket Analysis Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ahmtcnbs/datasets-for-appiori on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Market Basket Analysis. The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It applies an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.

    --- Original source retains full ownership of the source dataset ---

  7. DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based...

    • frontiersin.figshare.com
    docx
    Updated Jun 10, 2023
    Cite
    Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang (2023). DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based Formulas: A Nationwide Descriptive Study With Market Basket Analysis.docx [Dataset]. http://doi.org/10.3389/fphar.2021.641530.s001
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers
    Authors
    Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: As time evolved, traditional Chinese medicine (TCM) became integrated into the global medical system as complementary treatments. Some essential TCM herbs started to play a limited role in clinical practices because of Western medication development. For example, Fuzi (Aconiti Lateralis Radix Praeparata) is a toxic but indispensable TCM herb. Fuzi was mainly used in poor circulation and life-threatening conditions by history records. However, with various Western medication options for treating critical conditions currently, how is Fuzi used clinically and its indications in modern TCM are unclear. This study aimed to evaluate Fuzi and Fuzi-based formulas in modern clinical practices using artificial intelligence and data mining methods.

    Methods: This nationwide descriptive study with market basket analysis used a cohort selected from the Taiwan National Health Insurance database that contained one million national representatives between 2003 and 2010 used for our analysis. Descriptive statistics were performed to demonstrate the modern clinical indications of Fuzi. Market basket analysis was calculated by the Apriori algorithm to discover the association rules between Fuzi and other TCM herbs.

    Results: A total of 104,281 patients using 405,837 prescriptions of Fuzi and Fuzi-based formulas were identified. TCM doctors were found to use Fuzi in pulmonary (21.5%), gastrointestinal (17.3%), and rheumatologic (11.0%) diseases, but not commonly in cardiovascular diseases (7.4%). Long-term users of Fuzi and Fuzi-based formulas often had the following comorbidities diagnosed by Western doctors: osteoarthritis (31.0%), peptic ulcers (29.5%), hypertension (19.9%), and COPD (19.7%). Patients also used concurrent medications such as H2-receptor antagonists, nonsteroidal anti-inflammatory drugs, β-blockers, calcium channel blockers, and aspirin. Through market basket analysis, for the first time, we noticed many practical Fuzi-related herbal pairs such as Fuzi–Hsihsin (Asari Radix et Rhizoma)–Dahuang (Rhei Radix et Rhizoma) for neurologic diseases and headache.

    Conclusion: For the first time, big data analysis was applied to uncover the modern clinical indications of Fuzi in addition to traditional use. We provided necessary evidence on the scientific use of Fuzi in current TCM practices, and the Fuzi-related herbal pairs discovered in this study are helpful to the development of new botanical drugs.

  8. ‘Groceries Market Basket Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Groceries Market Basket Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-market-basket-dataset-602f/df0c0905/?iid=017-948&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Groceries Market Basket Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/irfanasrullah/groceries on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context The Groceries Market Basket Dataset, which can be found here. The dataset contains 9835 transactions by customers shopping for groceries. The data contains 169 unique items.
    The data is suitable to do data mining for market basket analysis which has multiple variables.

    Acknowledgement Thanks to https://github.com/shubhamjha97/association-rule-mining-apriori
    The data is under course Association rules mining using Apriori algorithm. Course Assignment for CS F415- Data Mining @ BITS Pilani, Hyderabad Campus. Done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.

    Pre-processing

    The csv file was read transaction by transaction and each transaction was saved as a list. A mapping was created from the unique items in the dataset to integers so that each item corresponded to a unique integer. The entire data was mapped to integers to reduce the storage and computational requirement. A reverse mapping was created from the integers to the item, so that the item names could be written in the final output file.
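
    The mapping and reverse mapping described above could be sketched roughly as follows. This is an illustration under assumptions, not the original course code: the function names and the sample transactions are hypothetical, and integer IDs are simply assigned in order of first appearance.

```python
def build_mappings(transactions):
    """Assign each unique item an integer ID and build the reverse lookup."""
    item_to_id = {}
    for transaction in transactions:
        for item in transaction:
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id)  # next unused integer
    id_to_item = {i: item for item, i in item_to_id.items()}
    return item_to_id, id_to_item


def encode(transactions, item_to_id):
    """Replace item names with their integer IDs."""
    return [[item_to_id[item] for item in t] for t in transactions]


def decode(encoded, id_to_item):
    """Map integer IDs back to item names, e.g. for the final output file."""
    return [[id_to_item[i] for i in t] for t in encoded]


# Hypothetical transactions, each saved as a list of item names.
transactions = [["milk", "bread"], ["bread", "butter"], ["milk"]]
item_to_id, id_to_item = build_mappings(transactions)
encoded = encode(transactions, item_to_id)
print(encoded)
print(decode(encoded, id_to_item))
```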

    Don't forget to upvote before you download :)

    --- Original source retains full ownership of the source dataset ---

  9. ASDA groceries data

    • crawlfeeds.com
    csv, zip
    Updated May 4, 2025
    Cite
    Crawl Feeds (2025). ASDA groceries data [Dataset]. https://crawlfeeds.com/datasets/asda-groceries-data
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    ASDA is a grocery supermarket chain in England; the information was extracted using Crawl Feeds' in-house tools.

    The data is suitable to do data mining for market basket analysis which has multiple variables.

    Dataset details

    Total records: 37,400

    36,000+ records have brand

    37,000+ records have price

    36,000+ records have net content

    36,000+ records have ingredients

    37,000+ records have product details

  10. Mall_customer data

    • kaggle.com
    Updated Aug 23, 2019
    Cite
    Ashlesha P D (2019). Mall_customer data [Dataset]. https://www.kaggle.com/ashleshaprix/mall-customer-data/notebooks
    Dataset updated
    Aug 23, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ashlesha P D
    Description

    Dataset

    This dataset was created by Ashlesha P D

    Contents

  11. Market Basket Analysis Data

    • kaggle.com
    Updated Aug 12, 2020
    Cite
    Romani Banerjee (2020). Market Basket Analysis Data [Dataset]. https://www.kaggle.com/romanibanerjee/market-basket-analysis-data/tasks
    Dataset updated
    Aug 12, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Romani Banerjee
    Description

    Dataset

    This dataset was created by Romani Banerjee

    Contents

  12. ReInstitute Data Set

    • figshare.com
    docx
    Updated Jun 10, 2025
    Cite
    Moinak Bhaduri (2025). ReInstitute Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.29286521.v1
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Moinak Bhaduri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data collected by the RE!NSTITUTE™. Each row represents one deployment of the 100-Day Challenge™. A cross indicates that changes in the corresponding aspect could be brought about in that instance of the experiment.

  13. ‘Groceries dataset ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 15, 2015
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Groceries dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-b6be/136ba9af/?iid=001-023&v=presentation
    Dataset updated
    Aug 15, 2015
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Groceries dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Association Rule Mining

    Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

    Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

    Details of the dataset

    The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

    Apriori Algorithm

    Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
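
    The level-wise idea described above can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration rather than a reference implementation; the function name `apriori`, the `min_support` parameter, and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent individual items
    current = {}
    for item in {item for basket in baskets for item in basket}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        next_level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                next_level[c] = s
        current = next_level
    return frequent


# Hypothetical toy transactions
transactions = [
    ["milk", "bread"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
    ["bread"],
]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

    The prune step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.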

    An example of Association Rules

    Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both. For the rule "bought milk => bought butter":

    • support = P(Milk & Butter) = 6/100 = 0.06
    • confidence = support/P(Milk) = 0.06/0.10 = 0.60
    • lift = confidence/P(Butter) = 0.60/0.08 = 7.5

    Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Some important terms:

    • Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

    • Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

    • Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.

    --- Original source retains full ownership of the source dataset ---

  14. Market Basket Analysis_Store_Data

    • kaggle.com
    Updated Apr 11, 2021
    Cite
    N C Chetan (2021). Market Basket Analysis_Store_Data [Dataset]. https://www.kaggle.com/ncchetan/market-basket-analysis-store-data/tasks
    Dataset updated
    Apr 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    N C Chetan
    Description

    Dataset

    This dataset was created by N C Chetan

    Contents

  15. Replication Data for: Svalbard through the prism of Russian media

    • search.dataone.org
    • dataverse.azure.uit.no
    • +1 more
    Updated Sep 25, 2024
    Cite
    Obukhova, Anna (2024). Replication Data for: Svalbard through the prism of Russian media [Dataset]. http://doi.org/10.18710/UEZZUS
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    DataverseNO
    Authors
    Obukhova, Anna
    Time period covered
    Jan 1, 2010 - Dec 31, 2021
    Area covered
    Svalbard, Russia
    Description

    The study applies Market Basket Analysis and Keymorph Analysis to analyze the articles related to Svalbard published in a sample of Russian mainstream federal and north-western regional media outlets produced between 2010 and 2021. The data for Market Basket Analysis is divided into six target subcorpora: Federal 2010-2013, Regional 2010-2013, Federal 2014-2017, Regional 2014-2017, Federal 2018-2021, and Regional 2018-2021. The data for Keymorph Analysis consists of six target subcorpora: Federal 2010-2013*, Regional 2010-2013*, Federal 2014-2017*, Regional 2014-2017*, Federal 2018-2021*, and Regional 2018-2021*. The data for Keymorph Analysis are the texts containing the keyword 'Spitsbergen' obtained from the data for Market Basket Analysis. Market Basket Analysis is used to retrieve Associative Arrays consisting of various keywords for the keyword meaning 'Spitsbergen'. Keymorph Analysis examines the prominence of the grammatical cases of nouns meaning 'Russia', 'Norway', and 'Spitsbergen'. The dataset includes: 1) the R code for keyword analysis (keywords serve as an input for Market Basket Analysis); 2) lists of keywords obtained from six target subcorpora Federal 2010-2013, Regional 2010-2013, Federal 2014-2017, Regional 2014-2017, Federal 2018-2021, and Regional 2018-2021; 3) the R code for Market Basket Analysis; 4) examples with the nouns meaning 'Russia', 'Norway', and 'Spitsbergen' extracted from six target subcorpora Federal 2010-2013*, Regional 2010-2013*, Federal 2014-2017*, Regional 2014-2017*, Federal 2018-2021*, and Regional 2018-2021* and annotated according to the grammatical cases of these nouns as well as the semantic meanings of the cases; 5) the calculated difference index (DIN*) values for the grammatical cases of the nouns meaning 'Russia', 'Norway', and 'Spitsbergen'. 
The DIN* was used in Keymorph Analysis as the effect size metric; 6) the R code for creation of the bar chart with DIN* values for the grammatical cases of the nouns meaning 'Russia', 'Norway', and 'Spitsbergen'.

  16. Synthetic Retail Transactions Dataset

    • opendatabay.com
    Updated Jul 2, 2025
    Cite
    Datasimple (2025). Synthetic Retail Transactions Dataset [Dataset]. https://www.opendatabay.com/data/dataset/a25d7b0f-dc8c-4c01-b0af-c90597f4a20f
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    E-commerce & Online Transactions
    Description

    This dataset provides simulated retail transaction data, offering valuable insights into customer purchasing behaviour and store operations. It is designed to facilitate market basket analysis, customer segmentation, and a variety of other retail analytics tasks. Each row captures detailed transaction information, including a unique identifier, the date and time of purchase, customer details, a list of purchased products, total items, total cost, payment method, and location details such as city and store type. Furthermore, it includes indicators for discounts and promotions applied, along with a customer category based on background or age group, and the season of purchase. This dataset is entirely synthetic, generated using the Python Faker library, making it a safe and versatile resource for researchers, data scientists, and analysts to develop and test algorithms, models, and analytical tools without using real customer data.

    Columns

    • Transaction_ID: A unique 10-digit identifier for each individual transaction, ensuring each purchase can be uniquely identified.
    • Date: The precise date and time when each transaction occurred, providing a timestamp for every purchase.
    • Customer_Name: The name of the customer who completed the purchase, offering a means to identify individual customers.
    • Product: A detailed list of all products included in a specific transaction.
    • Total_Items: The total quantity of items purchased within a single transaction.
    • Total_Cost: The overall financial value of the transaction, denominated in currency.
    • Payment_Method: The chosen payment method for the transaction, such as credit card, debit card, cash, or mobile payment.
    • City: The geographical location (city) where the transaction took place.
    • Store_Type: The classification of the store where the purchase was made, e.g., supermarket, convenience store, department store.
    • Discount_Applied: A boolean indicator (True/False) showing whether a discount was applied to the transaction.
    • Customer_Category: A categorisation of the customer based on their background or age group.
    • Season: The season (e.g., spring, summer, autumn, winter) in which the purchase was made.
    • Promotion: The specific type of promotion applied to the transaction, if any (e.g., "None", "BOGO", "Discount on Selected Items").

    Distribution

    This dataset is typically provided in a CSV file format. It contains approximately 1 million individual transaction records. The data spans a time range from 2020-01-01 to 2024-05-19. There are 329,738 unique customer names and 571,947 unique product entries. Payment methods are distributed with 25% Cash, 25% Debit Card, and 50% Other. Transaction locations include Boston (10%), Dallas (10%), and other cities (80%). Store types are categorised as Supermarket (17%), Pharmacy (17%), and other types (67%). Discounts were applied to approximately 50% of the transactions.

    Usage

    This dataset is ideally suited for:

    • Market Basket Analysis: Uncovering associations between products and identifying common buying patterns.
    • Customer Segmentation: Grouping customers based on their purchasing behaviour to target specific offers.
    • Pricing Optimisation: Developing strategies to optimise pricing and identify opportunities for discounts and promotions.
    • Retail Analytics: Analysing overall store performance and emerging customer trends.
    • Algorithmic Development: Testing and refining machine learning models for retail forecasting or recommendation systems.

    Coverage

    The dataset's geographic coverage includes transactions from various cities, such as Boston and Dallas, representing a broad, though simulated, global scope. The time range of the transactions extends from 1st January 2020 to 19th May 2024. Demographic insights are provided through the Customer_Category column, which classifies customers based on background or age group, allowing for demographic-based analyses. As a synthetic dataset, specific real-world demographic notes are not applicable.

    License

    CC0

    Who Can Use It

    This dataset is beneficial for a wide range of users, including:

    • Researchers: For academic studies on consumer behaviour and retail economics.
    • Data Scientists: To develop and validate predictive models, such as recommender systems or churn prediction models.
    • Analysts: For performing in-depth retail analytics, market basket analysis, and customer segmentation to inform business decisions.
    • Students: As a practical, realistic dataset for learning and applying data analysis techniques in a retail context.

    Dataset Name Suggestions

    • Retail Transactions Dataset
    • Customer Purchasing Behaviour Data
    • Market Basket Analysis Data
    • Synthetic Retail Transactions
    • E-commerce Transaction Log

    Attributes

    Original Dat

  17. Retail Transaction Dataset

    • opendatabay.com
    Updated Jun 24, 2025
    Cite
    Datasimple (2025). Retail Transaction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/ce827d4f-444a-4ffc-a50e-a769e596a2d3
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Food & Beverage Consumption
    Description

    This dataset contains 30,000 unique retail transactions, each representing a customer's shopping basket in a simulated grocery store environment. The data was generated with realistic product combinations and purchase patterns, suitable for association rule mining, recommendation systems and market basket analysis.

    Each row corresponds to a single transaction, listing:

    • A unique transaction ID
    • A customer ID
    • The full list of products bought in that transaction
    • The time of the transaction

    The dataset includes products across various categories such as beverages, snacks, dairy, household items, fruits, vegetables and frozen foods.

    This data is entirely synthetic and does not contain any real user information.

    Original Data Source: Retail Transaction Dataset

  18. ocado groceries data

    • crawlfeeds.com
    csv, zip
    Updated May 4, 2025
    Cite
    Crawl Feeds (2025). ocado groceries data [Dataset]. https://crawlfeeds.com/datasets/ocado-groceries-data
    Available download formats: zip, csv
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Ocado is a British retail business; the information was extracted using Crawl Feeds' in-house tools.

    The data contains multiple variables and is suitable for data mining tasks such as market basket analysis.

    Dataset details

    Total records: 47,000+

    47,000+ records have brand

    47,000+ records have price

    47,000+ records have ingredients

    47,000+ records have product details

  19. ‘Bakery Sales Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 16, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Bakery Sales Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-bakery-sales-dataset-0101/latest
    Dataset updated
    Sep 16, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Bakery Sales Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/akashdeepkuila/bakery on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    We live in the era of e-commerce and digital marketing. Even small-scale businesses are going online, as the opportunities are endless. Since a huge share of the people who have access to the internet are switching to online shopping, large retailers are actively searching for ways to increase their profit. Market basket analysis is one of the key techniques used by large retailers to increase sales by understanding customers' purchasing behavior and patterns. It examines collections of items to find relationships between items that go together within the business context.

    Content

    The dataset belongs to "The Bread Basket", a bakery located in Edinburgh. It provides the transaction details of customers who ordered different items from this bakery online during the period from 26-01-11 to 27-12-03. The dataset has 20507 entries, over 9000 transactions, and 4 columns.

    Variables

    • TransactionNo : unique identifier for every single transaction
    • Items : items purchased
    • DateTime : date and time stamp of the transactions
    • Daypart : part of the day when a transaction is made (morning, afternoon, evening, night)
    • DayType : classifies whether a transaction was made on a weekend or a weekday

    Inspiration

    The dataset is ideal for anyone looking to practice association rule mining and understand the business context of data mining for better understanding of the buying pattern of customers.

    --- Original source retains full ownership of the source dataset ---

  20. Customer360Insights

    • kaggle.com
    Updated Jun 9, 2024
    Cite
    Dave Darshan (2024). Customer360Insights [Dataset]. https://www.kaggle.com/datasets/davedarshan/customer360insights
    Dataset updated
    Jun 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dave Darshan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Customer360Insights

    The Customer360Insights dataset is a synthetic collection meticulously designed to mirror the multifaceted nature of customer interactions within an e-commerce platform. It encompasses a wide array of variables, each serving as a pillar to support various analytical explorations. Here’s a breakdown of the dataset and the potential analyses it enables:

    Dataset Description

    • Customer Demographics: Includes FullName, Gender, Age, CreditScore, and MonthlyIncome. These variables provide a demographic snapshot of the customer base, allowing for segmentation and targeted marketing analysis.
    • Geographical Data: Comprising Country, State, and City, this section facilitates location-based analytics, market penetration studies, and regional sales performance.
    • Product Information: Details like Category, Product, Cost, and Price enable product trend analysis, profitability assessment, and inventory optimization.
    • Transactional Data: Captures the customer journey through SessionStart, CartAdditionTime, OrderConfirmation, OrderConfirmationTime, PaymentMethod, and SessionEnd. This rich temporal data can be used for funnel analysis, conversion rate optimization, and customer behavior modeling.
    • Post-Purchase Details: With OrderReturn and ReturnReason, analysts can delve into return rate calculations, post-purchase satisfaction, and quality control.

    Types of Analysis

    • Descriptive Analytics: Understand basic metrics like average monthly income, most common product categories, and typical credit scores.
    • Predictive Analytics: Use machine learning to predict credit risk or the likelihood of a purchase based on demographics and session activity.
    • Customer Segmentation: Group customers by demographics or purchasing behavior to tailor marketing strategies.
    • Geospatial Analysis: Examine sales distribution across different regions and optimize logistics.
    • Time Series Analysis: Study the seasonality of purchases and session activities over time.
    • Funnel Analysis: Evaluate the customer journey from session start to order confirmation and identify drop-off points.
    • Cohort Analysis: Track customer cohorts over time to understand retention and repeat purchase patterns.
    • Market Basket Analysis: Discover product affinities and develop cross-selling strategies.

    This dataset is a playground for data enthusiasts to practice cleaning, transforming, visualizing, and modeling data. Whether you’re conducting A/B testing for marketing campaigns, forecasting sales, or building customer profiles, Customer360Insights offers a rich, realistic dataset for honing your data science skills.

    Curious about how I created the data? Feel free to click here and take a peek! 😉

    📊🔍 Good Luck and Happy Analysing 🔍📊

Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Dataset updated
Dec 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to provide customers with itemset suggestions, which can increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rules are most often used when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both.

  • Rule: bought Computer Mouse => bought Mat for Mouse
  • support = P(Mouse & Mat) = 8/100 = 0.08
  • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
  • lift = confidence/P(Mat for Mouse) = 0.80/0.09 = 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
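The arithmetic in this example can be checked with a short Python sketch. It is not the author's R code; the function name `rule_metrics` and its parameters are hypothetical, and it uses the standard definitions in which confidence is taken relative to the antecedent (the mouse) and lift relative to the consequent (the mat).

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse, 9 a mat, 8 both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1, as here, suggests the two items are bought together far more often than they would be if the purchases were independent.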

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of Rules
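To make the "running association rules" step more concrete, here is a minimal pure-Python sketch of the frequent-itemset phase of Apriori. It is an illustration only, not the R `arules` workflow this description relies on; the function name `apriori`, the `min_support` threshold, and the toy baskets are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support_of(candidates):
        # Count how many baskets contain each candidate, keep the frequent ones.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = support_of({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = support_of(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets (made up for illustration, not taken from the dataset).
toy = [
    {"mouse", "mat"},
    {"mouse", "mat", "keyboard"},
    {"mouse", "keyboard"},
    {"mat"},
]
frequent = apriori(toy, min_support=0.5)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

Rules would then be derived from each frequent itemset and filtered by confidence or lift, which corresponds to the "exploring" and "filtering" steps in the list above.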

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: .xlsx
  • Number of Rows: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

[Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

Libraries in R

First, we need to load the required libraries. Each one is briefly described below.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - An opinionated collection of R packages designed for data science. The package makes it easy to install and load multiple 'tidyverse' packages in a single step.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

[Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

[Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
[Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

Next, we clean the data frame by removing missing values.

[Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
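As a rough illustration of this conversion, the following Python sketch groups line items by invoice number into one basket per invoice, dropping rows with missing values along the way. It is not the author's R code: the sample rows and item names are hypothetical, and only the BillNo and Itemname column names come from the dataset description.

```python
from collections import defaultdict

# Hypothetical rows shaped like the dataset's (BillNo, Itemname) columns.
rows = [
    ("536365", "WHITE MUG"),
    ("536365", "RED LAMP"),
    ("536366", "WHITE MUG"),
    ("536366", None),          # missing item name, dropped below
    ("536367", "BLUE PLATE"),
    ("536365", "WHITE MUG"),   # repeated item within one invoice
]

baskets = defaultdict(set)
for bill_no, item in rows:
    if bill_no is None or item is None:
        continue  # skip rows with missing values
    baskets[bill_no].add(item)  # a set keeps each item once per invoice

# One sorted item list per invoice, ordered by invoice number.
transactions = [sorted(items) for _, items in sorted(baskets.items())]
print(transactions)
# prints: [['RED LAMP', 'WHITE MUG'], ['WHITE MUG'], ['BLUE PLATE']]
```

Using a set per invoice treats the data as presence/absence of items, which is the form that association rule algorithms typically expect; quantities are ignored in this sketch.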
