10 datasets found
  1. Real Market Data for Association Rules

    • kaggle.com
    zip
    Updated Sep 15, 2023
    Cite
    Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data
    Explore at:
    Available download formats: zip (3068 bytes)
    Dataset updated
    Sep 15, 2023
    Authors
    Ruken Missonnier
    Description

    1. Introduction

    This document describes a dataset prepared for association rule mining, a data mining technique central to market basket analysis. The dataset covers items commonly found in retail transactions, each encoded as a binary variable: "1" means the item is present in a transaction and "0" means it is absent.

    2. Dataset Overview

    The dataset has one column per item:

    • Bread
    • Honey
    • Bacon
    • Toothpaste
    • Banana
    • Apple
    • Hazelnut
    • Cheese
    • Meat
    • Carrot
    • Cucumber
    • Onion
    • Milk
    • Butter
    • ShavingFoam
    • Salt
    • Flour
    • HeavyCream
    • Egg
    • Olive
    • Shampoo
    • Sugar

    3. Purpose of the Dataset

    The purpose of this dataset is to support the discovery of associations and patterns in customer transactions. Each row represents a single transaction, and the value in each column indicates whether a particular item was included in that transaction.

    4. Data Format

    The data is binary: "1" means an item was purchased and "0" means it was not. This encoding records only whether an item is present, not the quantity purchased.
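
As a minimal sketch of how this binary layout can be used, the snippet below computes the support of each item (the fraction of transactions that contain it). The four sample rows and the function name `item_support` are hypothetical and only illustrate the format; they are not taken from the dataset.

```python
# Hypothetical sample: each dict is one transaction, each key an item column
# (1 = item present, 0 = item absent). Values are illustrative only.
transactions = [
    {"Bread": 1, "Honey": 0, "Milk": 1, "Butter": 1},
    {"Bread": 1, "Honey": 1, "Milk": 0, "Butter": 0},
    {"Bread": 0, "Honey": 0, "Milk": 1, "Butter": 1},
    {"Bread": 1, "Honey": 0, "Milk": 1, "Butter": 0},
]


def item_support(rows):
    """Return the fraction of transactions in which each item appears."""
    n = len(rows)
    totals = {}
    for row in rows:
        for item, flag in row.items():
            # Summing the 0/1 flags counts the transactions containing the item.
            totals[item] = totals.get(item, 0) + flag
    return {item: count / n for item, count in totals.items()}


support = item_support(transactions)
print(support)
```

Because the flags are already 0 or 1, the column mean is the item's support, which is why this encoding is convenient for frequent-itemset algorithms.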

    5. Potential Applications

    Possible applications of this dataset include:

    • Market Basket Analysis: Identifying items that are frequently bought together, to inform product placement and marketing strategies.
    • Recommender Systems: Building product recommendations tailored to each customer's transaction history.
    • Inventory Management: Tuning stock levels for items that are frequently bought together, reducing carrying costs and stockouts.
    • Customer Behavior Analysis: Studying customer preferences and purchase patterns to design better marketing campaigns.

    6. Analysis Techniques

    The dataset is suited to standard techniques such as the Apriori and FP-Growth algorithms. These algorithms find frequent itemsets and association rules, revealing item co-occurrence patterns in customer behavior.

    7. Conclusion

    In summary, this association rules dataset can be mined for patterns and relationships in transactional data. With data mining algorithms, businesses and analysts can uncover insights that inform strategic decisions, improve customer experience, and optimize operations.

  2. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most often used to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both.

    • Rule: bought Computer Mouse => bought Mat for Mouse
    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 = 8.89

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
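
The arithmetic for the mouse and mat example can be checked with a short sketch. It uses the standard definitions, in which confidence divides the joint support by the probability of the antecedent (the mouse) and lift divides the confidence by the probability of the consequent (the mat). The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(X and Y)
    confidence = n_both / n_antecedent            # P(Y | X) = P(X and Y) / P(X)
    lift = confidence / (n_consequent / n_total)  # P(Y | X) / P(Y)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items appear together far more often than they would if purchases were independent.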

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    After that, we clean the data frame and remove missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...

  3. Retail Store Transactions for Association Rules

    • kaggle.com
    zip
    Updated Feb 26, 2025
    Cite
    Dimple Bathija (2025). Retail Store Transactions for Association Rules [Dataset]. https://www.kaggle.com/datasets/dimplebathija/retail-store-transactions-for-association-rules
    Explore at:
    Available download formats: zip (123202 bytes)
    Dataset updated
    Feb 26, 2025
    Authors
    Dimple Bathija
    Description

    Dataset

    This dataset was created by Dimple Bathija

    Contents

  4. Retail Market Basket Transactions Dataset

    • kaggle.com
    Updated Aug 25, 2025
    Cite
    Wasiq Ali (2025). Retail Market Basket Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/retail-market-basket-transactions-dataset
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wasiq Ali
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Overview

    The Market_Basket_Optimisation dataset is a classic transactional dataset often used in association rule mining and market basket analysis.
    It consists of multiple transactions where each transaction represents the collection of items purchased together by a customer in a single shopping trip.

    • File Name: Market_Basket_Optimisation.csv
    • Format: CSV (Comma-Separated Values)
    • Structure: Each row corresponds to one shopping basket. Each column in that row contains an item purchased in that basket.
    • Nature of Data: Transactional, categorical, sparse.
    • Primary Use Case: Discovering frequent itemsets and association rules to understand shopping patterns, product affinities, and to build recommender systems.

    Detailed Information

    📊 Dataset Composition

    • Transactions: 7,501 (each row = one basket).
    • Items (unique): Around 120 distinct products (e.g., bread, mineral water, chocolate, etc.).
    • Columns per row: Up to 20 items (not fixed; many rows have fewer).
    • Data Type: Purely categorical (no numerical or continuous features).
    • Missing Values: Present in the form of empty cells (since not every basket has all 20 columns).
    • Duplicates: Some baskets may appear more than once; this is acceptable in transactional data, as multiple customers can buy the same set of items.

    🛒 Nature of Transactions

    • Basket Definition: Each row captures items bought together during a single visit to the store.
    • Variability: Basket size varies from 1 to 20 items. Some customers buy only one product, while others purchase a full set of groceries.
    • Sparsity: Since there are ~120 unique items but only a handful appear in each basket, the dataset is sparse. Most entries in the one-hot encoded representation are zeros.
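
To make the sparsity point concrete, the sketch below one-hot encodes a few baskets and measures the fraction of zero cells. The three baskets are hypothetical stand-ins, not rows from the dataset; with roughly 120 items and small baskets, the real matrix would be far sparser than this toy one.

```python
# Hypothetical baskets; each set holds the items bought in one visit.
baskets = [
    {"bread", "butter", "jam"},
    {"mineral water", "chocolate", "eggs", "milk"},
    {"spaghetti", "tomato sauce", "parmesan"},
]

# One column per distinct item, in a stable (sorted) order.
items = sorted(set().union(*baskets))

# One-hot matrix: 1 if the basket contains the item, else 0.
matrix = [[1 if item in basket else 0 for item in items] for basket in baskets]

ones = sum(sum(row) for row in matrix)
cells = len(matrix) * len(items)
sparsity = 1 - ones / cells  # fraction of cells that are zero

print(f"{len(items)} items, {cells} cells, sparsity = {sparsity:.2f}")
```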

    🔎 Examples of Data

    Example transaction rows (simplified):

    Item 1        | Item 2       | Item 3   | Item 4 | ...
    Bread         | Butter       | Jam      |        |
    Mineral water | Chocolate    | Eggs     | Milk   |
    Spaghetti     | Tomato sauce | Parmesan |        |

    Here, empty cells mean no item was purchased in that slot.
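
A minimal sketch of reading this layout into baskets, skipping the empty cells, is shown below. The in-memory sample stands in for the file; the function name `read_baskets` is hypothetical, and whether the real file has a header row is not stated here, so the sketch assumes none.

```python
import csv
import io

# Hypothetical three-row sample in the same layout (trailing commas = empty cells).
sample = io.StringIO(
    "bread,butter,jam,\n"
    "mineral water,chocolate,eggs,milk\n"
    "spaghetti,tomato sauce,parmesan,\n"
)


def read_baskets(handle):
    """Return one list of item names per row, dropping empty cells."""
    baskets = []
    for row in csv.reader(handle):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # ignore rows that are completely empty
            baskets.append(items)
    return baskets


baskets = read_baskets(sample)
for basket in baskets:
    print(basket)
```

For the actual file, the same function could be called on `open("Market_Basket_Optimisation.csv", newline="")` in place of the sample.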

    📈 Applications of This Dataset

    This dataset is frequently used in data mining, analytics, and recommendation systems. Common applications include:

    1. Association Rule Mining (Apriori, FP-Growth):

      • Discover rules like {Bread, Butter} ⇒ {Jam} with high support and confidence.
      • Identify cross-selling opportunities.
    2. Product Affinity Analysis:

      • Understand which items tend to be purchased together.
      • Helps with store layout decisions (placing related items near each other).
    3. Recommendation Engines:

      • Build systems that suggest "You may also like" products.
      • Example: If a customer buys pasta and tomato sauce, recommend cheese.
    4. Marketing Campaigns:

      • Bundle promotions and discounts on frequently co-purchased products.
      • Personalized offers based on buying history.
    5. Inventory Management:

      • Anticipate demand for certain product combinations.
      • Prevent stockouts of items that drive the purchase of others.

    📌 Key Insights Potentially Hidden in the Dataset

    • Popular Items: Some items (like mineral water, eggs, spaghetti) occur far more frequently than others.
    • Product Pairs: Frequent pairs and triplets (e.g., pasta + sauce + cheese) reflect natural meal-prep combinations.
    • Basket Size Distribution: Most customers buy fewer than 5 items, but a small fraction buy 10+ items, showing long-tail behavior.
    • Seasonality (if extended with timestamps): Certain items might show peaks in demand during weekends or holidays (though timestamps are not included in this dataset).

    📂 Dataset Limitations

    1. No Customer Identifiers:

      • We cannot track repeated purchases by the same customer.
      • Analysis is limited to basket-level insights.
    2. No Timestamps:

      • No temporal analysis (trends over time, seasonality) is possible.
    3. No Quantities or Prices:

      • We only know whether an item was purchased, not how many units or its cost.
    4. Sparse & Noisy:

      • Many baskets are small (1–2 items), which may produce weak or trivial rules.

    🔮 Potential Extensions

    • Synthetic Timestamps: Assign simulated timestamps to study temporal buying patterns.
    • Add Customer IDs: If merged with external data, one can perform personalized recommendations.
    • Price Data: Adding cost allows for profit-driven association rules (not just frequency-based).
    • Deep Learning Models: Sequence models (RNNs, Transformers) could be applied if temporal ordering of items is introduced.

    ...

  5. Retail Analytics Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Jun 12, 2025
    Cite
    Technavio (2025). Retail Analytics Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), APAC (China, India, Japan, and South Korea), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/retail-analytics-market-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description


    Retail Analytics Market Size 2025-2029

    The retail analytics market size is forecast to increase by USD 28.47 billion, at a CAGR of 29.5% between 2024 and 2029.

    The market is experiencing significant growth, driven by the increasing volume and complexity of data generated by retail businesses. This data deluge offers valuable insights for retailers, enabling them to optimize operations, enhance customer experience, and make data-driven decisions. However, this trend also presents challenges. One of the most pressing issues is the increasing adoption of Artificial Intelligence (AI) in the retail sector. While AI brings numerous benefits, such as personalized marketing and improved supply chain management, it also raises privacy and security concerns among customers.
    Retailers must address these concerns through transparent data handling practices and robust security measures to maintain customer trust and loyalty. Navigating these challenges requires a strategic approach, with a focus on data security, customer privacy, and effective implementation of AI technologies. Companies that successfully harness the power of retail analytics while addressing these challenges will gain a competitive edge in the market.
    

    What will be the Size of the Retail Analytics Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    The market continues to evolve, driven by the constant need for businesses to gain insights from their data and adapt to shifting consumer behaviors. Entities such as text analytics, data quality, price optimization, customer journey mapping, mobile analytics, time series analysis, regression analysis, social media analytics, data mining, historical data analysis, and data cleansing are integral components of this dynamic landscape. Text analytics uncovers hidden patterns and trends in unstructured data, while data quality ensures the accuracy and consistency of information. Price optimization leverages historical data to determine optimal pricing strategies, and customer journey mapping provides insights into the customer experience.

    Mobile analytics caters to the growing number of mobile shoppers, and time series analysis identifies trends and patterns over time. Regression analysis uncovers relationships between variables, social media analytics monitors brand sentiment, and data mining uncovers hidden patterns and correlations. Historical data analysis informs strategic decision-making, and data cleansing prepares data for analysis. Customer feedback analysis provides valuable insights into customer satisfaction, and association rule mining uncovers relationships between customer behaviors and purchases. Predictive analytics anticipates future trends, real-time analytics delivers insights in real-time, and market basket analysis uncovers relationships between products. Data security safeguards sensitive information, machine learning (ML) and artificial intelligence (AI) enhance data analysis capabilities, and cloud-based analytics offers flexibility and scalability.

    Business intelligence (BI) and open-source analytics provide comprehensive data analysis solutions, while inventory management and supply chain optimization streamline operations. Data governance ensures data is used ethically and effectively, and loyalty programs and A/B testing optimize customer engagement and retention. Seasonality analysis accounts for seasonal trends, and trend analysis identifies emerging trends. Data integration connects disparate data sources, and clickstream analysis tracks user behavior on websites. In the ever-changing retail landscape, these entities are seamlessly integrated into retail analytics solutions, enabling businesses to stay competitive and adapt to evolving market dynamics.

    How is this Retail Analytics Industry segmented?

    The retail analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Application
      • In-store operation
      • Customer management
      • Supply chain management
      • Marketing and merchandizing
      • Others
    • Component
      • Software
      • Services
    • Deployment
      • Cloud-based
      • On-premises
    • Geography
      • North America: US, Canada
      • Europe: France, Germany, Italy, UK
      • APAC: China, India, Japan, South Korea
      • Rest of World (ROW)

    By Application Insights

    The in-store operation segment is estimated to witness significant growth during the forecast period. In the realm of retail, the in-store operation segment of the market plays a pivotal role in optimizing brick-and-mortar retail operations. This segment encompasses various data analytics applications within phys

  6. Market Basket Analysis AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Market Basket Analysis AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/market-basket-analysis-ai-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Market Basket Analysis AI Market Outlook



    According to our latest research, the global Market Basket Analysis AI market size reached USD 1.32 billion in 2024, fueled by surging demand for data-driven decision-making and advanced analytics across retail and e-commerce sectors. The market is expected to grow at a robust CAGR of 18.7% from 2025 to 2033, reaching an estimated USD 6.19 billion by 2033. This remarkable growth is primarily attributed to the increasing adoption of artificial intelligence for customer behavior analysis, inventory management, and personalized marketing strategies.
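
The quoted figures can be sanity-checked with the compound growth formula, end = start × (1 + CAGR)^years. The sketch below assumes nine compounding years from the 2024 base value to 2033; the report does not state its compounding convention, so this is an assumption, and the function names are hypothetical.

```python
def project(start_value, cagr, years):
    """Value after compounding start_value at rate cagr for the given years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report summary (USD billion); 9 years is an assumption.
projected_2033 = project(1.32, 0.187, 9)
rate = implied_cagr(1.32, 6.19, 9)

print(f"projected 2033 size: {projected_2033:.2f} billion USD")
print(f"implied CAGR: {rate:.1%}")
```

Under that assumption the projection lands close to the quoted USD 6.19 billion, so the stated size and growth rate are mutually consistent.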




    The primary growth factor for the Market Basket Analysis AI market is the exponential rise in digital transactions and online shopping, which generate massive volumes of transactional data. Retailers and e-commerce platforms are leveraging AI-powered market basket analysis tools to extract actionable insights from this data, enabling them to optimize product placement, cross-sell and up-sell strategies, and enhance the overall customer experience. The integration of AI algorithms, such as association rule mining and deep learning, has significantly improved the accuracy and speed of identifying purchasing patterns, thereby driving higher sales conversions and customer retention rates. Furthermore, the increasing focus on omnichannel retailing and seamless customer journeys has made AI-driven market basket analysis indispensable for both brick-and-mortar and online stores.




    Another critical driver is the technological advancements in AI and machine learning, which have made Market Basket Analysis AI solutions more accessible, scalable, and cost-effective. The proliferation of cloud computing, edge analytics, and big data infrastructure has enabled organizations of all sizes to deploy sophisticated analytics tools without heavy upfront investments. Additionally, the growing emphasis on hyper-personalization and dynamic pricing strategies in highly competitive sectors such as retail, BFSI, and healthcare has further accelerated the adoption of AI-driven market basket analysis. Organizations are increasingly recognizing the value of real-time analytics in predicting consumer preferences and optimizing inventory, leading to reduced stockouts and improved profit margins.




    Regulatory compliance and data privacy concerns are also shaping the growth trajectory of the Market Basket Analysis AI market. With stringent regulations such as GDPR and CCPA coming into effect, organizations are required to ensure responsible data handling and transparency in AI-driven analytics. This has led to the development of more secure and compliant Market Basket Analysis AI solutions, which are gaining traction among enterprises seeking to balance innovation with regulatory requirements. The increased focus on ethical AI and explainable AI models is also fostering trust among end-users, thereby contributing to the sustained growth of the market.




    From a regional perspective, North America continues to dominate the Market Basket Analysis AI market, driven by the presence of leading technology providers, early adopters, and a mature digital infrastructure. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid urbanization, expanding e-commerce ecosystems, and increasing investments in AI research and development. Europe is also witnessing significant growth, supported by robust regulatory frameworks and the rising adoption of AI in retail and manufacturing sectors. Latin America and the Middle East & Africa are gradually catching up, with a growing number of enterprises recognizing the benefits of AI-driven analytics for business transformation.



    Component Analysis



    The Market Basket Analysis AI market is segmented by component into software, hardware, and services. The software segment holds the largest share, accounting for over 55% of the total market revenue in 2024. This dominance is attributed to the widespread adoption of advanced analytics platforms, machine learning algorithms, and data visualization tools that enable organizations to derive actionable insights from complex transactional datasets. Leading vendors are continuously enhancing their software offerings with features such as real-time analytics, predictive modeling, and integration with enterprise resource planning (ERP) systems, making them indispensable for retailers and e-commerce platforms aiming to optimize their product assortments a

  7. Retail POS Dataset for Market Basket Analysis

    • kaggle.com
    zip
    Updated Aug 25, 2025
    Cite
    ARUNAGIRINATHAN K (2025). Retail POS Dataset for Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/arunsworkspace/retail-pos-dataset-for-market-basket-analysis
    Explore at:
    Available download formats: zip (140535 bytes)
    Dataset updated
    Aug 25, 2025
    Authors
    ARUNAGIRINATHAN K
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🛒 Retail POS Dataset for Market Basket Analysis

    📌 Dataset Overview

    This dataset is a synthetically generated retail Point-of-Sale (POS) dataset designed for Market Basket Analysis (MBA), Association Rule Mining, and Sales Pattern Identification. It simulates transactions in a supermarket/retail environment, where each order (basket) contains multiple items across different product categories.

    The dataset is ideal for applying Apriori, FP-Growth, and ECLAT algorithms to uncover:

    • Frequently bought together items (e.g., {Bread, Butter} → {Milk})
    • Cross-category associations (e.g., {Shampoo} → {Soap})
    • Time-based shopping patterns (e.g., evening orders = snacks + beverages)
    • Customer-level purchasing behavior

    📊 Dataset Size

    • Rows: 10,000 (transactions × items)
    • Unique Products: 63
    • Categories: 12
    • Timeframe Simulated: 1 year (2023)

    📦 Categories

    The dataset includes items from 12 realistic retail categories:

    • Dairy & Eggs
    • Bakery
    • Meat & Seafood
    • Fruits & Vegetables
    • Grains & Staples
    • Snacks
    • Beverages
    • Personal Care & Health
    • Household & Cleaning
    • Electronics & Accessories
    • Clothing & Lifestyle
    • Stationery & Books

    📑 Column Description

    Column Name       | Description
    order_id          | Unique ID for each order (basket)
    user_id           | Unique ID for customer
    order_date        | Date of the order
    time              | Time of the transaction (HH:MM:SS)
    order_hour_of_day | Hour of purchase (6–22)
    product_name      | Purchased item name
    quantity          | Units of the product bought
    price             | Price of the product (in local currency)
    category          | Product category
    product_id        | Unique ID for product
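
Since each row holds one item of an order, the rows have to be grouped by order before mining. The sketch below does this with the `order_id` and `product_name` columns listed above; the sample rows and the function name `to_baskets` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the long (one item per row) layout.
rows = [
    {"order_id": 1, "product_name": "Bread"},
    {"order_id": 1, "product_name": "Butter"},
    {"order_id": 2, "product_name": "Shampoo"},
    {"order_id": 1, "product_name": "Milk"},
    {"order_id": 2, "product_name": "Soap"},
]


def to_baskets(records):
    """Group item rows into one set of product names per order."""
    baskets = defaultdict(set)
    for record in records:
        baskets[record["order_id"]].add(record["product_name"])
    return dict(baskets)


baskets = to_baskets(rows)
for order_id, products in sorted(baskets.items()):
    print(order_id, sorted(products))
```

Using a set per order drops repeated lines for the same product, which matches the presence/absence view that association rule algorithms expect.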


    Possible Use Cases

    • Market Basket Analysis (MBA): Identify frequently bought together products.
    • Sales Trends: Analyze shopping patterns by time of day and category.
    • Customer Behavior: Segment users by purchase preferences.
    • Recommendation Systems: Build "customers who bought X also bought Y" models.
    • Retail Analytics: Study pricing impact, seasonal demand, and category performance.
  8. Groceries dataset

    • kaggle.com
    zip
    Updated Sep 17, 2020
    Cite
    Heeral Dedhia (2020). Groceries dataset [Dataset]. https://www.kaggle.com/heeraldedhia/groceries-dataset
    Explore at:
    Available download formats: zip (263057 bytes)
    Dataset updated
    Sep 17, 2020
    Authors
    Heeral Dedhia
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Association Rule Mining

    Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

    Association rules are widely used to analyze retail basket or transaction data. They are intended to identify strong rules in that data using measures of interestingness.

    Details of the dataset

    The dataset has 38765 rows of purchase orders made by people at grocery stores. These orders can be analysed, and association rules generated, using market basket analysis with algorithms such as Apriori.

    Apriori Algorithm

    Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
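
The level-wise idea described above can be sketched in a few lines. This is a brute-force teaching version under simplifying assumptions (everything fits in memory, support is recounted by scanning all transactions), not an optimized implementation; the function name `apriori` and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # Count how many transactions contain each candidate itemset.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent individual items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = frequent_among(singles)
    frequent = dict(current)

    size = 2
    while current:
        prev = set(current)
        # Join: merge frequent (size-1)-itemsets into size-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        size += 1
    return frequent


baskets = [
    {"milk", "butter", "bread"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"butter", "bread"},
    {"milk", "butter", "bread"},
]
result = apriori(baskets, min_support=0.6)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data each single item and each pair meets the 0.6 threshold, while the three-item set appears in only two of five baskets and is dropped, which ends the search.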

    An example of Association Rules

    Assume there are 100 customers: 10 bought milk, 8 bought butter, and 6 bought both.

    • Rule: bought milk => bought butter
    • support = P(Milk & Butter) = 6/100 = 0.06
    • confidence = support/P(Milk) = 0.06/0.10 = 0.60
    • lift = confidence/P(Butter) = 0.60/0.08 = 7.5

    Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Some important terms:

    • Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

    • Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

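    • Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is. It is measured as the confidence of {X -> Y} divided by the support of Y. A lift above 1 means X and Y appear together more often than would be expected if they were independent.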
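As a quick check of the three terms, the milk and butter counts from the example (100 customers, 10 milk, 8 butter, 6 both) can be plugged into a small helper. The function name `rule_metrics` and its parameters are hypothetical. Following the definition of confidence, the rule milk -> butter divides by the number of milk transactions.

```python
def rule_metrics(n_total, n_x, n_y, n_both):
    """Support, confidence and lift for the rule X -> Y from raw counts."""
    support = n_both / n_total        # share of transactions with X and Y
    confidence = n_both / n_x         # share of X transactions that also have Y
    lift = confidence / (n_y / n_total)  # confidence relative to Y's popularity
    return support, confidence, lift


# X = milk, Y = butter
support, confidence, lift = rule_metrics(n_total=100, n_x=10, n_y=8, n_both=6)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.06 confidence=0.60 lift=7.50
```

Lift is symmetric, so the rule butter -> milk has the same lift of 7.5, although its confidence (6/8 = 0.75) differs.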

  9. As mentioned in the experiment results section, we divide the data in small...

    • plos.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). As mentioned in the experiment results section, we divide the data in small and large datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The small datasets for calculating the frequency of itemsets in a transaction database contain Accidents, Chess, Connection, Mushroom, PUSBM, and Retail [32] transaction datasets. There are 500, 1000, 2000, and 5000 transactions per dataset. The small datasets for calculating the utility of itemsets in a transaction database contain Accidents, Chess, Connection, Mushroom, PUSBM, and Retail [32] transaction datasets. There are 500, 1000, 2000, and 5000 transactions per dataset. The large datasets for calculating the frequency of itemsets in a transaction database contain Accidents, Connection, and PUSBM [32] transaction datasets. There are 10000, 20000, 30000, and 50000 transactions per dataset. The large datasets for calculating the utility of itemsets in a transaction database contain Accidents, Connection, and PUSBM [32] transaction datasets. There are 10000, 20000, 30000, and 50000 transactions per dataset. (ZIP)

  10. Basket Analysis (Association Rule Mining)

    • kaggle.com
    zip
    Updated Apr 25, 2023
    Cite
    vikram amin (2023). Basket Analysis (Association Rule Mining) [Dataset]. https://www.kaggle.com/datasets/vikramamin/basket-analysis-association-rule-mining
    Explore at:
    zip(345413 bytes)Available download formats
    Dataset updated
    Apr 25, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The basket dataset contains a list of items available for customers to purchase. These items can also be found in sets, for example milk and sugar.

    The analysis aims to show retailers which items or sets of items are purchased together. Sometimes the purchase of one item leads the customer to purchase another item as well, which is a kind of association between items. Finding such associations is called "Association Rule Mining".

    It shows which items appear together in a transaction or relation. It's mainly used by retailers, grocery stores, and online marketplaces that have a large transactional database.

    We wouldn't want to calculate all associations between every possible combination of products. Instead, we would want to select only potentially "relevant" rules from the set of all possible rules. Therefore, we use the measures support, confidence and lift to reduce the number of relationships we need to analyze.

    Support says how popular an item set is, as measured by the proportion of transactions in which the item set appears.

    Confidence says how likely item Y is purchased when item X is purchased. It is measured by the proportion of transactions with item X in which item Y also appears (support of the rule / support of the antecedent (LHS)).

    Lift says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is (confidence / support of the consequent (RHS)).
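One way to use the three measures as filters is sketched below for rules between single items. It is only an illustration under stated assumptions: the function name `pair_rules`, the threshold parameters and their values, and the sample baskets are made up for the example and do not come from the dataset.

```python
from itertools import permutations


def pair_rules(transactions, min_support, min_confidence, min_lift=1.0):
    """Return (x, y, support, confidence, lift) for rules x -> y that pass all thresholds."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    # Support of each single item.
    freq = {item: sum(item in t for t in transactions) / n for item in items}

    rules = []
    for x, y in permutations(items, 2):
        support = sum(x in t and y in t for t in transactions) / n
        if support < min_support:
            continue  # too rare to be worth analyzing
        confidence = support / freq[x]  # support / antecedent (LHS)
        lift = confidence / freq[y]     # confidence / consequent (RHS)
        if confidence >= min_confidence and lift >= min_lift:
            rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical baskets, made up for illustration only.
baskets = [
    {"milk", "sugar"},
    {"milk", "sugar", "bread"},
    {"bread"},
    {"milk", "sugar"},
    {"bread", "eggs"},
]
rules = pair_rules(baskets, min_support=0.4, min_confidence=0.8)
for x, y, s, c, l in rules:
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Of the twelve possible ordered pairs in this toy data, only milk -> sugar and sugar -> milk pass the thresholds, which is the reduction in relationships the measures are meant to provide.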


Cite
Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data

Real Market Data for Association Rules

Unveiling Retail Insights with Apriori and FP-Growth Algorithms

Explore at:
zip(3068 bytes)Available download formats
Dataset updated
Sep 15, 2023
Authors
Ruken Missonnier
Description

1. Introduction

Within the confines of this document, we embark on a comprehensive journey delving into the intricacies of a dataset meticulously curated for the purpose of association rules mining. This sophisticated data mining technique is a linchpin in the realms of market basket analysis. The dataset in question boasts an array of items commonly found in retail transactions, each meticulously encoded as a binary variable, with "1" denoting presence and "0" indicating absence in individual transactions.

2. Dataset Overview

Our dataset unfolds as an opulent tapestry of distinct columns, each dedicated to the representation of a specific item:

  • Bread
  • Honey
  • Bacon
  • Toothpaste
  • Banana
  • Apple
  • Hazelnut
  • Cheese
  • Meat
  • Carrot
  • Cucumber
  • Onion
  • Milk
  • Butter
  • ShavingFoam
  • Salt
  • Flour
  • HeavyCream
  • Egg
  • Olive
  • Shampoo
  • Sugar

3. Purpose of the Dataset

The raison d'être of this dataset is to serve as a catalyst for the discovery of intricate associations and patterns concealed within the labyrinthine network of customer transactions. Each row in this dataset mirrors a solitary transaction, while the values within each column serve as sentinels, indicating whether a particular item was welcomed into a transaction's embrace or relegated to the periphery.

4. Data Format

The data within this repository is rendered in a binary symphony, where the enigmatic "1" enunciates the acquisition of an item, and the stoic "0" signifies its conspicuous absence. This binary manifestation serves to distill the essence of the dataset, centering the focus on item presence, rather than the quantum thereof.
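To make the binary format concrete, the sketch below reads a few rows of 0/1 flags and turns each row into the set of items it contains. The file layout is an assumption: a CSV with a header row of item names is taken for granted here, and the three-column sample is a made-up excerpt, not actual rows from the dataset.

```python
import csv
import io

# Hypothetical excerpt: three of the item columns, four invented transactions.
sample_csv = """Bread,Milk,Butter
1,1,0
1,0,1
0,1,1
1,1,1
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Each row becomes the set of items whose flag is "1".
transactions = [
    {item for item, flag in row.items() if flag == "1"}
    for row in rows
]

# Support of each item: the share of rows in which its column is 1.
item_support = {
    item: sum(item in t for t in transactions) / len(transactions)
    for item in rows[0]
}
print(transactions[0])
print(item_support)
```

Because the columns record only presence, quantities cannot be recovered from this format; the resulting item sets are exactly the input that frequent itemset algorithms expect.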

5. Potential Applications

This dataset unfurls its wings to encompass an assortment of prospective applications, including but not limited to:

  • Market Basket Analysis: Discerning items that waltz together in shopping carts, thus bestowing enlightenment upon the orchestration of product placement and marketing strategies.
  • Recommender Systems: Crafting bespoke product recommendations, meticulously tailored to each customer's historical transactional symphony.
  • Inventory Management: Masterfully fine-tuning stock levels for items that find kinship in frequent co-acquisition, thereby orchestrating a harmonious reduction in carrying costs and stockouts.
  • Customer Behavior Analysis: Peering into the depths of customer proclivities and purchase patterns, paving the way for the sculpting of exquisite marketing campaigns.

6. Analysis Techniques

The treasure trove of this dataset beckons the deployment of quintessential techniques, among them the venerable Apriori and FP-Growth algorithms. These stalwart algorithms are proficient at ferreting out the elusive frequent itemsets and invaluable association rules, shedding light on the arcane symphony of customer behavior and item co-occurrence patterns.

7. Conclusion

In closing, the association rules dataset unfurled before you offers an alluring odyssey, replete with the promise of discovering priceless patterns and affiliations concealed within the tapestry of transactional data. Through the artistry of data mining algorithms, businesses and analysts stand poised to unearth hitherto latent insights capable of steering the helm of strategic decisions, elevating the pantheon of customer experiences, and orchestrating the symphony of operational optimization.
