100+ datasets found
  1. Real Market Data for Association Rules

    • kaggle.com
    zip
    Updated Sep 15, 2023
    Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data
    Available download formats: zip (3068 bytes)
    Authors
    Ruken Missonnier
    Description

    1. Introduction

    This document describes a dataset curated for association rule mining, a data mining technique central to market basket analysis. The dataset covers items commonly found in retail transactions, each encoded as a binary variable, with "1" denoting presence and "0" denoting absence in an individual transaction.

    2. Dataset Overview

    The dataset has one column per item:

    • Bread
    • Honey
    • Bacon
    • Toothpaste
    • Banana
    • Apple
    • Hazelnut
    • Cheese
    • Meat
    • Carrot
    • Cucumber
    • Onion
    • Milk
    • Butter
    • ShavingFoam
    • Salt
    • Flour
    • HeavyCream
    • Egg
    • Olive
    • Shampoo
    • Sugar

    3. Purpose of the Dataset

    The dataset is intended for discovering associations and patterns in customer transactions. Each row represents a single transaction, and the value in each column indicates whether that item was part of the transaction.

    4. Data Format

    The data is binary: "1" means the item was purchased and "0" means it was not. This encoding records only item presence, not quantity.
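
    As a minimal sketch of how this binary encoding can be used, the Python snippet below computes the support of an item and of an item pair. The four rows and the `support` helper are made up for illustration and are not taken from the dataset; only the column names come from the list above.

```python
from typing import Dict, List

# Hypothetical rows in the binary format (1 = bought, 0 = not bought).
rows: List[Dict[str, int]] = [
    {"Bread": 1, "Butter": 1, "Milk": 0},
    {"Bread": 1, "Butter": 0, "Milk": 1},
    {"Bread": 1, "Butter": 1, "Milk": 1},
    {"Bread": 0, "Butter": 0, "Milk": 1},
]

def support(rows: List[Dict[str, int]], items: List[str]) -> float:
    """Fraction of transactions in which every listed item is present."""
    hits = sum(1 for row in rows if all(row[item] == 1 for item in items))
    return hits / len(rows)

bread = support(rows, ["Bread"])                    # 3 of 4 rows
bread_butter = support(rows, ["Bread", "Butter"])   # 2 of 4 rows
print(bread, bread_butter)
```

    Because quantities are not recorded, support here always means "share of transactions containing the items", never units sold.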

    5. Potential Applications

    Potential applications include:

    • Market Basket Analysis: Identifying items that are frequently bought together, to inform product placement and marketing strategies.
    • Recommender Systems: Generating product recommendations based on each customer's transaction history.
    • Inventory Management: Tuning stock levels for items that are frequently bought together, to reduce carrying costs and stockouts.
    • Customer Behavior Analysis: Studying customer preferences and purchase patterns to design marketing campaigns.

    6. Analysis Techniques

    The dataset suits standard techniques such as the Apriori and FP-Growth algorithms, which find frequent itemsets and association rules and so reveal customer behavior and item co-occurrence patterns.
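
    To make the Apriori idea concrete, here is a small, unoptimized Python sketch of its level-wise search for frequent itemsets. The transactions and the 0.5 threshold are hypothetical, and the function is a teaching aid under those assumptions rather than the implementation any listed dataset was built with.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def supp(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if supp(c) >= min_support}

    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = supp(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if supp(c) >= min_support}
    return frequent

# Hypothetical transactions, each a set of purchased items.
baskets = [{"Bread", "Butter"}, {"Bread", "Milk"}, {"Bread", "Butter", "Milk"}, {"Milk"}]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

    The prune step is what distinguishes Apriori from brute force: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.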

    7. Conclusion

    In closing, this association rules dataset lets businesses and analysts use data mining algorithms to uncover patterns in transactional data that can inform strategic decisions, improve customer experience, and optimize operations.

  2. Datasets used for evaluating the customized version of Apriori algorithm.

    • plos.figshare.com
    zip
    Updated May 31, 2023
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Datasets used for evaluating the customized version of Apriori algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.s001
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A zip archive containing microbial abundance tables which were employed for deciphering association rules using the customised version of the Apriori algorithm. (ZIP)

  3. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business and offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
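
    The arithmetic of this example can be checked with a few lines of Python. The sketch uses the standard definitions, in which the confidence of a rule A => B divides the joint support by P(A), the antecedent, and lift divides confidence by P(B). The function name `rule_metrics` and its parameters are illustrative choices, not part of the dataset.

```python
from typing import Tuple

def rule_metrics(n_total: int, n_a: int, n_b: int, n_both: int) -> Tuple[float, float, float]:
    """Support, confidence and lift for the rule A => B, from raw counts."""
    support = n_both / n_total
    confidence = n_both / n_a              # P(B | A) = support / P(A)
    lift = confidence / (n_b / n_total)    # confidence / P(B)
    return support, confidence, lift

# A = bought a mouse (10 customers), B = bought a mouse mat (9), both = 8, out of 100.
support, confidence, lift = rule_metrics(n_total=100, n_a=10, n_b=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

    A lift well above 1, as here, means the two items appear together far more often than they would if purchases were independent.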

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Libraries in R

    First, we need to load the required libraries. Each is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

    Next, we clean the data frame by removing missing values.

    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
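
    The description above works in R with the arules package, and its own conversion code is not shown here. As a language-neutral illustration of the same grouping step, the Python sketch below collects one-row-per-product records into one basket per invoice. The column names BillNo and Itemname come from the dataset description; the sample values and the `to_transactions` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical (BillNo, Itemname) pairs, one row per purchased product.
rows: List[Tuple[str, str]] = [
    ("536365", "WHITE MUG"),
    ("536365", "RED LANTERN"),
    ("536366", "WHITE MUG"),
    ("536365", "WHITE MUG"),  # repeated line on the same invoice
]

def to_transactions(rows: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group item names by invoice number; repeated items collapse into one."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for bill_no, item in rows:
        baskets[bill_no].add(item)
    return dict(baskets)

transactions = to_transactions(rows)
for bill_no, items in transactions.items():
    print(bill_no, sorted(items))
```

    Using a set per invoice drops the Quantity information on purpose, since association rule mining only asks whether an item was present in the basket.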

  4. Number of association rules generated using the Apriori rule mining approach...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Number of association rules generated using the Apriori rule mining approach with various datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.t001
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summarised information pertaining to (a) the number of samples, (b) the number of generated association rules (total as well as rules that involve 3 or more genera), (c) the unique number of microbial genera involved in the identified association rules, (d) execution time, and (e) the number of rules generated using an alternative rule mining strategy (detailed in discussion section of the manuscript).

  5. Association rule mining data for census tract chemical exposure analysis

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Nov 12, 2020
    U.S. EPA Office of Research and Development (ORD) (2020). Association rule mining data for census tract chemical exposure analysis [Dataset]. https://catalog.data.gov/dataset/association-rule-mining-data-for-census-tract-chemical-exposure-analysis
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Chemical concentration, exposure, and health risk data for U.S. census tracts from National Scale Air Toxics Assessment (NATA). This dataset is associated with the following publication: Huang, H., R. Tornero-Velez, and T. Barzyk. Associations between socio-demographic characteristics and chemical concentrations contributing to cumulative exposures in the United States. Journal of Exposure Science and Environmental Epidemiology. Nature Publishing Group, London, UK, 27(6): 544-550, (2017).

  6. Datas of Disease Patterns

    • figshare.com
    zip
    Updated Jun 2, 2017
    Jichang Zhao (2017). Datas of Disease Patterns [Dataset]. http://doi.org/10.6084/m9.figshare.5035775.v3
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jichang Zhao
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. The "dingxiang_datas.xls" file contains all the original data crawled from the DingXiang forum, along with the word segmentation result for each medical record.
    2. The "pmi_new_words.txt" file lists the new medical words found by calculating mutual information.
    3. The "association_rules" folder contains the association rules mined from the dataset, with the h-confidence threshold set to 0.3 and the support threshold set to 0.0001.
    4. The "network_communities.csv" file describes the complication communities.

    P.S. A "d" marks a word as disease description vocabulary, and a "z" or "s" marks it as symptom description vocabulary.
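
    The description does not define the h-confidence measure behind the 0.3 threshold. Assuming the usual definition (also known as all-confidence), it is the support of an itemset divided by the largest support of any single item in it. The Python sketch below illustrates that definition on made-up records; the record contents and helper names are hypothetical and are not drawn from the dataset.

```python
from typing import FrozenSet, List, Set

def support(transactions: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def h_confidence(transactions: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """supp(itemset) divided by the largest single-item support within it."""
    max_single = max(support(transactions, frozenset([i])) for i in itemset)
    if max_single == 0:
        return 0.0  # none of the items ever occurs
    return support(transactions, itemset) / max_single

# Hypothetical records, each a set of terms found in one medical record.
records = [
    {"fever", "cough"},
    {"fever", "cough", "headache"},
    {"fever"},
    {"headache"},
]
pattern = frozenset({"fever", "cough"})
value = h_confidence(records, pattern)   # 0.5 / 0.75
print(round(value, 3), value >= 0.3)
```

    Under this definition a pattern passes the threshold only if every item in it implies the rest with at least that confidence, which filters out patterns that pair a very common term with a rare one.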

  7. Survey on Association Rule Mining Using "APRIORI" Algorithm

    • figshare.com
    • search.datacite.org
    pdf
    Updated Jan 19, 2016
    Yogesh Khaladkar; Pramod warale (2016). Survey on Association Rule Mining Using "APRIORI" Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.1393101.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yogesh Khaladkar; Pramod warale
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Association rule is an important technique in data mining.

  8. Modified Big SalesData

    • kaggle.com
    zip
    Updated Feb 26, 2025
    Shubhra Rana (2025). Modified Big SalesData [Dataset]. https://www.kaggle.com/datasets/shubhrarana/modified-big-salesdata
    Available download formats: zip (2360624 bytes)
    Authors
    Shubhra Rana
    License

    http://www.gnu.org/licenses/fdl-1.3.html

    Description

    Modified and Cleaned data set from https://www.kaggle.com/datasets/pigment/big-sales-data.

    This can be used for EDA, Data Analytics, Data Mining and Visualizations.

    Will be uploading two more versions shortly.

    1. Denormalized Version of the data to optimize the storage
    2. Consolidated Data from all months together to perform data mining tasks like ARM.
  9. Comparing Vocabulary Term Recommendations using Association Rules and...

    • demo-b2find.dkrz.de
    Updated Sep 21, 2025
    (2025). Comparing Vocabulary Term Recommendations using Association Rules and Learning To Rank: A User Study - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/0410d5d0-ccb4-548a-b8d4-bea5d404af8b
    Description

    The user-study evaluates a vocabulary term recommendation service that is based on how other data providers have used RDF classes and properties in the Linked Open Data cloud. The study compares the machine learning technique Learning to Rank (L2R), the classical data mining approach Association Rule mining (AR), and a baseline that does not provide any recommendations. This data collection comprises the raw results of this user-study in SPSS format.

  10. Market Basket Optimization

    • kaggle.com
    zip
    Updated Jan 28, 2025
    Aly El-badry (2025). Market Basket Optimization [Dataset]. https://www.kaggle.com/datasets/alyelbadry/market-basket-optimization/code
    Available download formats: zip (47991 bytes)
    Authors
    Aly El-badry
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains transactional data collected for market basket analysis. Each row represents a single transaction with items purchased together. It is ideal for implementing association rule mining techniques such as Apriori, FP-Growth, and other machine learning algorithms.

    Key Features:

    • Transactions: Lists of items purchased together in a single transaction.
    • Applications: Perfect for studying customer purchase patterns, building recommendation systems, and identifying frequent item sets.
    • Usage: Use this dataset to practice generating actionable insights for retailers and e-commerce platforms.

    Format:

    • Rows: Each row represents a transaction.
    • Columns: Each column corresponds to an item in the transaction.

    Examples of Potential Use Cases:

    • Find combinations of items frequently purchased together.
    • Predict the likelihood of items being bought together.
    • Build AI-powered marketing strategies based on association rules.

    Credits:

    • This dataset is formatted for educational and research purposes. Feel free to use it to explore and enhance your skills in data mining and machine learning!
  11. Data from: Cross-ontology multi-level association rule mining in the Gene...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 22, 2013
    Prashanti Manda; Seval Ozkan; Hui Wang; Fiona McCarthy; Susan M. Bridges (2013). Cross-ontology multi-level association rule mining in the Gene Ontology [Dataset]. http://doi.org/10.5061/dryad.nr353
    Dataset provided by
    Mississippi State University
    Authors
    Prashanti Manda; Seval Ozkan; Hui Wang; Fiona McCarthy; Susan M. Bridges
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Starkville, Mississippi
    Description

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method on publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross ontology rules from the two datasets respectively. We show that our approach discovers more and higher quality association rules from the GO as evaluated by biologists in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.

  12. Flaredown Checkin Data

    • kaggle.com
    zip
    Updated Sep 28, 2017
    Doug Hersak (2017). Flaredown Checkin Data [Dataset]. https://www.kaggle.com/doughersak/flaredown-checkin-data
    Available download formats: zip (14471245 bytes)
    Authors
    Doug Hersak
    Description

    Context

    There's a story behind every dataset and here's your opportunity to share yours.

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  13. Designing a more efficient, effective and safe Medical Emergency Team (MET)...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher (2023). Designing a more efficient, effective and safe Medical Emergency Team (MET) service using data analysis [Dataset]. http://doi.org/10.1371/journal.pone.0188688
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Hospitals have seen a rise in Medical Emergency Team (MET) reviews. We hypothesised that the commonest MET calls result in similar treatments. Our aim was to design a pre-emptive management algorithm that allowed direct institution of treatment to patients without having to wait for attendance of the MET team and to model its potential impact on MET call incidence and patient outcomes.

    Methods: Data was extracted for all MET calls from the hospital database. Association rule data mining techniques were used to identify the most common combinations of MET call causes, outcomes and therapies.

    Results: There were 13,656 MET calls during the 34-month study period in 7936 patients. The most common MET call was for hypotension [31%, (2459/7936)]. These MET calls were strongly associated with the immediate administration of intra-venous fluid (70% [1714/2459] v 13% [739/5477] p

  14. Data from: Mining significant crisp-fuzzy spatial association rules

    • tandf.figshare.com
    pdf
    Updated May 30, 2023
    Wenzhong Shi; Anshu Zhang; Geoffrey I. Webb (2023). Mining significant crisp-fuzzy spatial association rules [Dataset]. http://doi.org/10.6084/m9.figshare.5873139.v1
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Wenzhong Shi; Anshu Zhang; Geoffrey I. Webb
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability: the abundance of rules, control over the risk of spurious rules, and accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of resultant rules. The method firstly prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates RIMs of accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with improved design for spatial semantics. The proposed techniques were evaluated by both synthetic and real-world data. The synthetic data was generated with predesigned rules and RIM values, thus the reliability of SARM results could be confidently and quantitatively evaluated. The proposed techniques showed high efficacy in enhancing the reliability of SARM results in all three aspects. The abundance of resultant rules was improved by 50% or more compared with using conventional fuzzy SARM. Minimal risk of spurious rules was guaranteed by statistically sound tests. The probability that the entire result contained any spurious rules was below 1%. The RIM values also avoided large positive errors committed by crisp SARM, which typically exceeded 50% for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results, and demonstrates that such improvement is critical for practical spatial data analytics and decision support.

  15. Association Rules and Semantic Relatedness (ARSR) - Evaluation Data

    • data.niaid.nih.gov
    Updated Jun 20, 2022
    Leon Hutans (2022). Association Rules and Semantic Relatedness (ARSR) - Evaluation Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6655888
    Dataset provided by
    Friedrich Schiller University Jena
    Authors
    Leon Hutans
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    ARSR stands for "Association Rules and Semantic Relatedness", a recommender system that combines association rule mining and semantic relatedness to create recommendations for form fields.

    ARSR was evaluated with focus on recommending values for fields of metadata forms. The evaluation was performed with two sets of association rules (R1 and R2). R1 was primarily used to assess the recommendation performance. R2 served as an alternative and led to similar results.

    This dataset contains the raw data ("collected", JSON format) generated during the evaluation. It includes, among other things, the input (populated fields and target field), the expected output, and the top 40 generated recommendations for each test combination.

    In addition, the processed data ("analysed", CSV format) is provided. It is based on the raw data and is used to calculate metrics and plot the results.

    The source code for ARSR and the evaluation is available on GitLab (gitlab.com).

    Note: To perform the analysis on the raw data (collected) yourself, make sure to follow the setup instruction in the evaluation repository first. More specifically, install dependencies and unzip "data/cedar/test-instances.zip" so that URI mappings can be accessed. Then follow the instructions provided by the README file in the raw data archives (e.g. collected-R1.zip).

  16. Extensions to Mining Framework Annotation Rules

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, json, txt
    Updated Jul 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anonymous Author(s); Anonymous Author(s) (2023). Extensions to Mining Framework Annotation Rules [Dataset]. http://doi.org/10.5281/zenodo.8150033
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous Author(s); Anonymous Author(s)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Framework usage is challenging because the requirements for the correctness are often implicit. We focus on making such requirements more explicit by association rule mining on the data from client projects that use a framework. We present an extension to an existing baseline method that does this. In particular, we examine alternative rule quality measures used in the ranking of association rules mined, and alternatives in the selection of client projects. Such alternatives are novel and have not been explored in the context of the baseline method. We evaluate the alternatives by comparing their results to those produced by the baseline method. More concretely, we base the comparison on their ranking of incorrect rules, and on their measurements for the Area Under Curve metric. We conclude that some of the evaluated quality measures outperform the baseline for the ranking and selection of rules. We also show that the selection of secondary client projects, adding some clients that do not directly use the framework of interest, matters.

  17. Bakery Sales Dataset

    • kaggle.com
    zip
    Updated Sep 25, 2024
    Akashdeep Kuila (2024). Bakery Sales Dataset [Dataset]. https://www.kaggle.com/akashdeepkuila/bakery
    Available download formats: zip (219283 bytes)
    Authors
    Akashdeep Kuila
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    We live in the era of e-commerce and digital marketing. Even small-scale businesses are going online, as the opportunities are endless. Since a large share of the people who have access to the internet are switching to online shopping, large retailers are actively searching for ways to increase their profit. Market basket analysis is one of the key techniques used by large retailers to increase sales by understanding customers' purchasing behavior and patterns. Market basket analysis examines collections of items to find relationships between items that go together within the business context.

    Content

    The dataset belongs to "The Bread Basket", a bakery located in Edinburgh. The dataset provides the transaction details of customers who ordered different items from this bakery online during the period from 30-10-2016 to 09-04-2017. The dataset has 20507 entries, over 9000 transactions, and 4 columns.

    Variables

    • TransactionNo : unique identifier for every single transaction
    • Items : items purchased
    • DateTime : date and time stamp of the transactions
    • Daypart : part of the day when a transaction is made (morning, afternoon, evening, night)
    • DayType : classifies whether a transaction was made on a weekend or a weekday
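
    Since the dataset has more entries than transactions, each row presumably records one item of one transaction. A minimal sketch of grouping such rows into baskets follows; the sample rows and the `build_baskets` helper are hypothetical, and only the TransactionNo and Items columns are used.

```python
from collections import defaultdict

# Hypothetical sample rows as (TransactionNo, Items) pairs; the values are
# made up for illustration and are not taken from the actual file.
rows = [
    (1, "Bread"),
    (2, "Scandinavian"),
    (2, "Scandinavian"),
    (3, "Hot chocolate"),
    (3, "Jam"),
    (3, "Cookies"),
]

def build_baskets(rows):
    """Group item rows into one set of items per transaction number."""
    baskets = defaultdict(set)
    for transaction_no, item in rows:
        # A set drops repeated items, since only presence matters here.
        baskets[transaction_no].add(item)
    return dict(baskets)

baskets = build_baskets(rows)
```

    The resulting dictionary maps each transaction number to its set of items, which is the usual input shape for association rule mining.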

    Inspiration

    The dataset is ideal for anyone looking to practice association rule mining and understand the business context of data mining for better understanding of the buying pattern of customers.

  18. Data for: Mining multiple association rules in LTPP database: an analysis of...

    • data.mendeley.com
    Updated Oct 16, 2018
    Cite
    Peiwen Hao (2018). Data for: Mining multiple association rules in LTPP database: an analysis of asphalt pavement thermal cracking distress [Dataset]. http://doi.org/10.17632/w94jndtmpr.1
    Dataset updated
    Oct 16, 2018
    Authors
    Peiwen Hao
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    MATLAB Codes and original data for Apriori

  19. Electronic Medical Record Data-Mining

    • simtk.org
    Updated Sep 26, 2017
    Cite
    Jonathan Chen (2017). Electronic Medical Record Data-Mining [Dataset]. https://simtk.org/frs/?group_id=892
    Available download formats: data/images/video (5 MB), application/x-zip-compressed (1 MB), source code (1 MB)
    Dataset updated
    Sep 26, 2017
    Dataset provided by
    Stanford
    Authors
    Jonathan Chen
    Description

    EMR data-mining code, including association rules for order recommendations, outcome predictions, and order set evaluation.

    This project includes the following software/data packages:

    • Order Sets and Topic Models : Application code and support script to reproduce topic model and order set prediction evaluations as published in JAMIA 2016 manuscript.
    • ICU DNR : Data underlying paper: "Reversals and Limitations of High-Intensity, Life-Sustaining Treatments" regarding clinical factors associated with DNR and Comfort Care orders in the ICU
    • Item Association Code PSB 2016

  20. Groceries Market Basket Dataset

    • kaggle.com
    zip
    Updated Jul 16, 2019
    Close
    Cite
    Irfan Nasrullah (2019). Groceries Market Basket Dataset [Dataset]. https://www.kaggle.com/irfanasrullah/groceries
    Available download formats: zip (172098 bytes)
    Dataset updated
    Jul 16, 2019
    Authors
    Irfan Nasrullah
    Description

    Context

    The Groceries Market Basket Dataset contains 9835 transactions by customers shopping for groceries, covering 169 unique items.
    The data is suitable for data mining tasks such as market basket analysis, which involve multiple variables.

    Acknowledgement

    Thanks to https://github.com/shubhamjha97/association-rule-mining-apriori
    The data comes from the project "Association rules mining using Apriori algorithm", a course assignment for CS F415 (Data Mining) at BITS Pilani, Hyderabad Campus, done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.

    Pre-processing

    The csv file was read transaction by transaction and each transaction was saved as a list. A mapping was created from the unique items in the dataset to integers so that each item corresponded to a unique integer. The entire data was mapped to integers to reduce the storage and computational requirement. A reverse mapping was created from the integers to the item, so that the item names could be written in the final output file.
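
    The pre-processing described above can be sketched roughly as follows. This is an illustration, not the original assignment code: the `encode_transactions` function and the sample transactions are hypothetical, and reading the csv file is omitted.

```python
def encode_transactions(transactions):
    """Map each unique item to an integer and encode every transaction."""
    item_to_id = {}
    encoded = []
    for transaction in transactions:
        ids = []
        for item in transaction:
            # Assign the next unused integer the first time an item is seen.
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id)
            ids.append(item_to_id[item])
        encoded.append(ids)
    # Reverse mapping, used to write item names back into the final output.
    id_to_item = {i: item for item, i in item_to_id.items()}
    return encoded, item_to_id, id_to_item

# Hypothetical transactions, each saved as a list of item names.
transactions = [["whole milk", "rolls/buns"], ["rolls/buns", "soda"], ["whole milk"]]
encoded, item_to_id, id_to_item = encode_transactions(transactions)

# Decoding with the reverse mapping recovers the original item names.
decoded = [[id_to_item[i] for i in t] for t in encoded]
```

    Working on small integers instead of strings is what reduces storage and comparison cost during mining; the reverse mapping is only needed when results are written out.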

    Don't forget to upvote before you download :)

Cite
Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data

Real Market Data for Association Rules

Unveiling Retail Insights with Apriori and FP-Growth Algorithms

Available download formats: zip (3068 bytes)
Dataset updated
Sep 15, 2023
Authors
Ruken Missonnier
Description

1. Introduction

This document describes a dataset curated for association rule mining, a data mining technique central to market basket analysis. The dataset covers items commonly found in retail transactions, each encoded as a binary variable, with "1" denoting presence and "0" indicating absence in an individual transaction.

2. Dataset Overview

The dataset consists of distinct columns, each representing a specific item:

  • Bread
  • Honey
  • Bacon
  • Toothpaste
  • Banana
  • Apple
  • Hazelnut
  • Cheese
  • Meat
  • Carrot
  • Cucumber
  • Onion
  • Milk
  • Butter
  • ShavingFoam
  • Salt
  • Flour
  • HeavyCream
  • Egg
  • Olive
  • Shampoo
  • Sugar

3. Purpose of the Dataset

The purpose of this dataset is to support the discovery of associations and patterns in customer transactions. Each row represents a single transaction, and the value in each column indicates whether a particular item was part of that transaction.

4. Data Format

The data is in binary format: "1" means an item was purchased in a transaction and "0" means it was not. This encoding focuses on item presence rather than quantity.
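
With this layout, the support of an itemset is the fraction of rows in which all of its columns are 1. The sketch below illustrates that calculation; the three columns are a subset of the item list above, while the row values and the `support` helper are made up for the example.

```python
# Hypothetical rows in the binary layout; the values are illustrative only.
columns = ["Bread", "Milk", "Butter"]
rows = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
]

def support(rows, columns, items):
    """Fraction of transactions (rows) that contain every item in `items`."""
    idx = [columns.index(item) for item in items]
    hits = sum(1 for row in rows if all(row[j] == 1 for j in idx))
    return hits / len(rows)

bread_support = support(rows, columns, ["Bread"])               # 3 of 4 rows
bread_milk_support = support(rows, columns, ["Bread", "Milk"])  # 2 of 4 rows
```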

5. Potential Applications

This dataset supports a range of applications, including but not limited to:

  • Market Basket Analysis: Identifying items that are frequently bought together, to inform product placement and marketing strategies.
  • Recommender Systems: Building product recommendations tailored to each customer's transaction history.
  • Inventory Management: Tuning stock levels for items that are frequently bought together, to reduce carrying costs and stockouts.
  • Customer Behavior Analysis: Examining customer preferences and purchase patterns to inform marketing campaigns.

6. Analysis Techniques

The dataset is suited to standard techniques such as the Apriori and FP-Growth algorithms. These algorithms find frequent itemsets and association rules, revealing customer behavior and item co-occurrence patterns.
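
As a rough illustration of the level-wise idea behind Apriori, the sketch below finds frequent itemsets and computes the confidence of a rule. It is a simplified teaching version under stated assumptions, not an optimized implementation: the function names, the sample transactions, and the `min_support` value are all hypothetical, and FP-Growth is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = supp(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent from frequent-itemset supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical transactions using item names from the column list above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]
frequent = apriori(transactions, min_support=0.5)
butter_to_bread = confidence(frequent, ["Butter"], ["Bread"])
```

The prune step is what distinguishes this from brute-force enumeration: a candidate is only counted if all of its smaller subsets were already found frequent.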

7. Conclusion

In closing, this association rules dataset offers the opportunity to discover patterns and associations in transactional data. Using data mining algorithms, businesses and analysts can uncover latent insights that inform strategic decisions, improve customer experiences, and optimize operations.
