41 datasets found
  1. Survey on Association Rule Mining Using "APRIORI" Algorithm

    • figshare.com
    • search.datacite.org
    pdf
    Updated Jan 19, 2016
    Cite
    Yogesh Khaladkar; Pramod warale (2016). Survey on Association Rule Mining Using "APRIORI" Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.1393101.v1
    Available download formats
    pdf
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yogesh Khaladkar; Pramod warale
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Association rule mining is an important technique in data mining.

  2. Datasets used for evaluating the customized version of Apriori algorithm.

    • plos.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Datasets used for evaluating the customized version of Apriori algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.s001
    Available download formats
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A zip archive containing microbial abundance tables which were employed for deciphering association rules using the customised version of the Apriori algorithm. (ZIP)

  3. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats
    zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rules are most often used to find associations between different objects in a set. They work by finding frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow a retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.

    • Rule: bought computer mouse => bought mouse mat
    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat) = 0.80/0.09 = 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
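    The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using the standard definitions, in which confidence divides by the antecedent's count and lift divides confidence by the consequent's share of all transactions. The function name `rule_metrics` and its parameters are illustrative assumptions, not part of the dataset or its notebook.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    All arguments are transaction counts; the names are hypothetical.
    """
    support = n_both / n_total                    # P(antecedent and consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 mouse buyers, 9 mat buyers, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    Lift is symmetric in the two items, so the rule in the opposite direction has the same lift but a different confidence.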

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.


    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.


    Next, we clean the data frame by removing missing values.


    To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...

  4. Grocery Store dataset for data mining

    • kaggle.com
    zip
    Updated Mar 9, 2021
    Cite
    Honey Patel (2021). Grocery Store dataset for data mining [Dataset]. https://www.kaggle.com/honeypatel2158/grocery-store-dataset-for-data-mining
    Available download formats
    zip (7990 bytes)
    Dataset updated
    Mar 9, 2021
    Authors
    Honey Patel
    Description

    Dataset

    This dataset was created by Honey Patel

    Contents

  5. Characteristics that Favor Freq-Itemset Algorithms

    • kaggle.com
    Updated Oct 24, 2020
    Cite
    Jeff Heaton (2020). Characteristics that Favor Freq-Itemset Algorithms [Dataset]. https://www.kaggle.com/jeffheaton/characteristics-that-favor-freqitemset-algorithms
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 24, 2020
    Dataset provided by
    Kaggle
    Authors
    Jeff Heaton
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Source Paper

    This dataset is from my paper:

    Heaton, J. (2016, March). Comparing dataset characteristics that favor the Apriori, Eclat or FP-Growth frequent itemset mining algorithms. In SoutheastCon 2016 (pp. 1-7). IEEE.

    Frequent itemset mining is a popular data mining technique. Apriori, Eclat, and FP-Growth are among the most common algorithms for frequent itemset mining. Considerable research has been performed to compare the relative performance between these three algorithms, by evaluating the scalability of each algorithm as the dataset size increases. While scalability as data size increases is important, previous papers have not examined the performance impact of similarly sized datasets that contain different itemset characteristics. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and the maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into dataset characteristics that are conducive to each algorithm. The results of this paper's research demonstrate Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.

    Files Generated

    We generated two datasets that allow us to adjust two independent variables to create a total of 20 different transaction sets. We also provide the Python script that generated this data in a notebook. This Python script accepts the following parameters to specify the transaction set to produce:

    • Transaction/Basket count: 5 million default
    • Number of items: 50,000 default
    • Number of frequent sets: 100 default
    • Max transaction/basket size: independent variable, 5-100 range
    • Frequent set density: independent variable, 0.1 to 0.8 range

    Files contained in this dataset reside in two folders:

    • freq-items-pct - We vary the frequent set density in these transaction sets.
    • freq-items-tsz - We change the maximum number of items per basket in these transaction sets.

    While you can vary basket count, the number of frequent sets, and the number of items in the script, they will remain fixed at this paper's above values. We determined that the basket count only had a small positive correlation.

    File Content

    The following listing shows the type of data generated for this research. Here we present an example file created with ten baskets out of 100 items, two frequent itemsets, a maximum basket size of 10, and a density of 0.5.

    I36 I94 
    I71 I13 I91 I89 I34
    F6 F5 F3 F4 
    I86 
    I39 I16 I49 I62 I31 I54 I91 
    I22 I31 
    I70 I85 I78 I63 
    F4 F3 F1 F6 F0 I69 I44 
    I82 I50 I9 I31 I57 I20 
    F4 F3 F1 F6 F0 I87
    

    As you can see from the above file, the items are either prefixed with “I” or “F.” The “F” prefix indicates that this line contains one of the frequent itemsets. Items with the “I” prefix are not part of an intentional frequent itemset. Of course, “I” prefixed items might form frequent itemsets, as they are uniformly sampled from the available items to fill out the baskets. Each basket will have a random size chosen, up to the maximum basket size. The frequent itemset density specifies the probability of each line containing one of the intentional frequent itemsets. Because we used a density of 0.5, approximately half of the lines above include one of the two intentional frequent itemsets. A frequent itemset line may have additional random “I” prefixed items added to cause the line to reach the randomly chosen length for that line. If the frequent itemset selected does cause the generated sequence to exceed its randomly chosen length, no truncation will occur. The intentional frequent itemsets are all determined to be less than or equal to the maximum basket size.
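    The generation rules described above can be sketched in Python as follows. This is not the author's script, only a rough illustration of the described behavior. The function name `generate_baskets`, its parameters, the size of the "F" item pool, and the choice to avoid repeating an "I" item within a basket are all assumptions made for this sketch.

```python
import random


def generate_baskets(n_baskets, n_items, n_frequent_sets,
                     max_basket_size, density, seed=0):
    """Hypothetical generator following the rules described in the text."""
    rng = random.Random(seed)

    # Assumption: intentional frequent itemsets are drawn from a small pool of
    # "F"-prefixed items, and each set has at most max_basket_size members.
    f_pool = [f"F{i}" for i in range(max_basket_size)]
    frequent_sets = [
        rng.sample(f_pool, rng.randint(2, max_basket_size))
        for _ in range(n_frequent_sets)
    ]

    baskets = []
    for _ in range(n_baskets):
        # Each basket gets a random target size up to the maximum.
        target_len = rng.randint(1, max_basket_size)
        basket = []
        # With probability `density`, start from one intentional frequent itemset.
        # If it is longer than target_len, it is kept whole (no truncation).
        if rng.random() < density:
            basket.extend(rng.choice(frequent_sets))
        # Fill the remainder with uniformly sampled "I"-prefixed items.
        while len(basket) < target_len:
            item = f"I{rng.randrange(n_items)}"
            if item not in basket:
                basket.append(item)
        baskets.append(basket)
    return baskets


# Same settings as the example file: ten baskets, 100 items, two frequent
# itemsets, a maximum basket size of 10, and a density of 0.5.
for basket in generate_baskets(10, 100, 2, 10, 0.5):
    print(" ".join(basket))
```

    The exact lines printed depend on the seed, so they will not match the listing above item for item.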

  6. Real Market Data for Association Rules

    • kaggle.com
    zip
    Updated Sep 15, 2023
    Cite
    Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data
    Available download formats
    zip (3068 bytes)
    Dataset updated
    Sep 15, 2023
    Authors
    Ruken Missonnier
    Description

    1. Introduction

    Within the confines of this document, we embark on a comprehensive journey delving into the intricacies of a dataset meticulously curated for the purpose of association rules mining. This sophisticated data mining technique is a linchpin in the realms of market basket analysis. The dataset in question boasts an array of items commonly found in retail transactions, each meticulously encoded as a binary variable, with "1" denoting presence and "0" indicating absence in individual transactions.

    2. Dataset Overview

    Our dataset unfolds as an opulent tapestry of distinct columns, each dedicated to the representation of a specific item:

    • Bread
    • Honey
    • Bacon
    • Toothpaste
    • Banana
    • Apple
    • Hazelnut
    • Cheese
    • Meat
    • Carrot
    • Cucumber
    • Onion
    • Milk
    • Butter
    • ShavingFoam
    • Salt
    • Flour
    • HeavyCream
    • Egg
    • Olive
    • Shampoo
    • Sugar

    3. Purpose of the Dataset

    The raison d'être of this dataset is to serve as a catalyst for the discovery of intricate associations and patterns concealed within the labyrinthine network of customer transactions. Each row in this dataset mirrors a solitary transaction, while the values within each column serve as sentinels, indicating whether a particular item was welcomed into a transaction's embrace or relegated to the periphery.

    4. Data Format

    The data within this repository is rendered in a binary symphony, where the enigmatic "1" enunciates the acquisition of an item, and the stoic "0" signifies its conspicuous absence. This binary manifestation serves to distill the essence of the dataset, centering the focus on item presence, rather than the quantum thereof.

    5. Potential Applications

    This dataset unfurls its wings to encompass an assortment of prospective applications, including but not limited to:

    • Market Basket Analysis: Discerning items that waltz together in shopping carts, thus bestowing enlightenment upon the orchestration of product placement and marketing strategies.
    • Recommender Systems: Crafting bespoke product recommendations, meticulously tailored to each customer's historical transactional symphony.
    • Inventory Management: Masterfully fine-tuning stock levels for items that find kinship in frequent co-acquisition, thereby orchestrating a harmonious reduction in carrying costs and stockouts.
    • Customer Behavior Analysis: Peering into the depths of customer proclivities and purchase patterns, paving the way for the sculpting of exquisite marketing campaigns.

    6. Analysis Techniques

    The treasure trove of this dataset beckons the deployment of quintessential techniques, among them the venerable Apriori and FP-Growth algorithms. These stalwart algorithms are proficient at ferreting out the elusive frequent itemsets and invaluable association rules, shedding light on the arcane symphony of customer behavior and item co-occurrence patterns.

    7. Conclusion

    In closing, the association rules dataset unfurled before you offers an alluring odyssey, replete with the promise of discovering priceless patterns and affiliations concealed within the tapestry of transactional data. Through the artistry of data mining algorithms, businesses and analysts stand poised to unearth hitherto latent insights capable of steering the helm of strategic decisions, elevating the pantheon of customer experiences, and orchestrating the symphony of operational optimization.

  7. MOESM2 of Data mining combined to the multicriteria decision analysis for...

    • springernature.figshare.com
    • figshare.com
    application/cdfv2
    Updated May 30, 2023
    Cite
    Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar (2023). MOESM2 of Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France [Dataset]. http://doi.org/10.6084/m9.figshare.7660082.v1
    Available download formats
    application/cdfv2
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2. The integral table of transactions T.

  8. Groceries Purchase Analysis Dataset

    • kaggle.com
    zip
    Updated May 11, 2023
    Cite
    Jeel Gajera (2023). Groceries Purchase Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/earthian/grocery-dataset/code
    Available download formats
    zip (180519 bytes)
    Dataset updated
    May 11, 2023
    Authors
    Jeel Gajera
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains transactional data of grocery purchases. Each row represents a transaction where items purchased are listed. The items are categorized into columns, with each column representing a specific product. If an item is present in a transaction, it is denoted by a '1'; otherwise, it is denoted by '0'. The dataset is suitable for analyzing frequent itemsets using the Apriori algorithm, a popular method in market basket analysis and association rule mining.
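    As a small illustration of this 0/1 layout, the sketch below turns one-hot rows into item sets and measures how often an itemset occurs. The column names and rows are made-up placeholders, not values taken from the dataset, and `support` is a hypothetical helper.

```python
# Hypothetical product columns and 0/1 rows; the real file has its own columns.
columns = ["Bread", "Milk", "Butter"]
rows = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
]

# Each row becomes the set of products whose flag is 1.
transactions = [
    {name for name, flag in zip(columns, row) if flag == 1}
    for row in rows
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


print(transactions)
print(support({"Bread", "Milk"}))
```

    Representing each transaction as a set keeps the subset test (`<=`) short, which is the core operation behind support counting.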

  9. Apriori algorithm-based association rules.

    • plos.figshare.com
    bin
    Updated Aug 8, 2023
    Cite
    Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang (2023). Apriori algorithm-based association rules. [Dataset]. http://doi.org/10.1371/journal.pone.0289749.t001
    Available download formats
    bin
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, the prevalence of T2DM has been increasing annually, in particular, the personal and socioeconomic burden caused by multiple complications has become increasingly serious. This study aimed to screen out the high-risk complication combination of T2DM through various data mining methods, establish and evaluate a risk prediction model of the complication combination in patients with T2DM. Questionnaire surveys, physical examinations, and biochemical tests were conducted on 4,937 patients with T2DM, and 810 cases of sample data with complications were retained. The high-risk complication combination was screened by association rules based on the Apriori algorithm. Risk factors were screened using the LASSO regression model, random forest model, and support vector machine. A risk prediction model was established using logistic regression analysis, and a dynamic nomogram was constructed. Receiver operating characteristic (ROC) curves, harrell’s concordance index (C-Index), calibration curves, decision curve analysis (DCA), and internal validation were used to evaluate the differentiation, calibration, and clinical applicability of the models. This study found that patients with T2DM had a high-risk combination of lower extremity vasculopathy, diabetic foot, and diabetic retinopathy. Based on this, body mass index, diastolic blood pressure, total cholesterol, triglyceride, 2-hour postprandial blood glucose and blood urea nitrogen levels were screened and used for the modeling analysis. The area under the ROC curves of the internal and external validations were 0.768 (95% CI, 0.744−0.792) and 0.745 (95% CI, 0.669−0.820), respectively, and the C-index and AUC value were consistent. The calibration plots showed good calibration, and the risk threshold for DCA was 30–54%. 
    In this study, we developed and evaluated a predictive model for the development of a high-risk complication combination while uncovering the pattern of complications in patients with T2DM. This model has a practical guiding effect on the health management of patients with T2DM in community settings.

  10. MOESM3 of Data mining combined to the multicriteria decision analysis for...

    • springernature.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar (2023). MOESM3 of Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France [Dataset]. http://doi.org/10.6084/m9.figshare.7660091.v1
    Available download formats
    xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    France
    Description

    Additional file 3. The integral matrix of concordance indices.

  11. Number of association rules generated using the Apriori rule mining approach...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Number of association rules generated using the Apriori rule mining approach with various datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.t001
    Available download formats
    xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summarised information pertaining to (a) the number of samples, (b) the number of generated association rules (total as well as rules that involve 3 or more genera), (c) the unique number of microbial genera involved in the identified association rules, (d) execution time, and (e) the number of rules generated using an alternative rule mining strategy (detailed in discussion section of the manuscript).

  12. Groceries dataset

    • kaggle.com
    zip
    Updated Sep 17, 2020
    Cite
    Heeral Dedhia (2020). Groceries dataset [Dataset]. https://www.kaggle.com/heeraldedhia/groceries-dataset
    Available download formats
    zip (263057 bytes)
    Dataset updated
    Sep 17, 2020
    Authors
    Heeral Dedhia
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Association Rule Mining

    Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

    Association rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules in transaction data using measures of interestingness.

    Details of the dataset

    The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

    Apriori Algorithm

    Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
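    The level-by-level idea described above can be sketched in plain Python. This is a minimal, unoptimized illustration of Apriori-style frequent itemset mining, not the implementation used with this dataset; the function name `apriori`, the `min_support` parameter, and the toy transactions are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that appear often enough.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Toy transactions, made up for illustration.
baskets = [
    {"milk", "butter"},
    {"milk", "bread"},
    {"milk", "butter", "bread"},
    {"bread"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
for itemset, sup in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

    The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are, so candidates with an infrequent subset are discarded before their support is counted.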

    An example of Association Rules

    Assume there are 100 customers; 10 of them bought milk, 8 bought butter, and 6 bought both.

    • Rule: bought milk => bought butter
    • support = P(Milk & Butter) = 6/100 = 0.06
    • confidence = support/P(Milk) = 0.06/0.10 = 0.60
    • lift = confidence/P(Butter) = 0.60/0.08 = 7.5

    Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Some important terms:

    • Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

    • Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

    • Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
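    For reference, the three terms above are commonly written as the following formulas, where P(X) denotes the fraction of transactions containing X. These are the standard textbook definitions rather than anything specific to this dataset.

```latex
\[
\mathrm{support}(X \Rightarrow Y) = P(X \text{ and } Y), \qquad
\mathrm{confidence}(X \Rightarrow Y) = \frac{P(X \text{ and } Y)}{P(X)}, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{P(X \text{ and } Y)}{P(X)\,P(Y)}
\]
```

    A lift above 1 means X and Y appear together more often than would be expected if they were purchased independently.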

  13. Groceries Market Basket Dataset

    • kaggle.com
    zip
    Updated Jul 16, 2019
    Cite
    Irfan Nasrullah (2019). Groceries Market Basket Dataset [Dataset]. https://www.kaggle.com/irfanasrullah/groceries
    Available download formats
    zip (172098 bytes)
    Dataset updated
    Jul 16, 2019
    Authors
    Irfan Nasrullah
    Description

    Context

    The Groceries Market Basket Dataset contains 9835 transactions by customers shopping for groceries. The data contains 169 unique items.
    The data is suitable for market basket analysis, which involves multiple variables.

    Acknowledgement

    Thanks to https://github.com/shubhamjha97/association-rule-mining-apriori
    The data comes from the project "Association rules mining using Apriori algorithm", a course assignment for CS F415 - Data Mining @ BITS Pilani, Hyderabad Campus, done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.

    Pre-processing

    The csv file was read transaction by transaction and each transaction was saved as a list. A mapping was created from the unique items in the dataset to integers so that each item corresponded to a unique integer. The entire data was mapped to integers to reduce the storage and computational requirement. A reverse mapping was created from the integers to the item, so that the item names could be written in the final output file.
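    The forward and reverse mappings described above might look like the following in Python. This is a minimal sketch with made-up transactions; the variable names `item_to_id` and `id_to_item` are illustrative and not taken from the referenced repository.

```python
# Hypothetical transactions, each saved as a list of item names.
transactions = [
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk"],
]

# Forward mapping: each unique item gets the next unused integer.
item_to_id = {}
for transaction in transactions:
    for item in transaction:
        item_to_id.setdefault(item, len(item_to_id))

# Reverse mapping: integers back to item names for the final output.
id_to_item = {idx: item for item, idx in item_to_id.items()}

# Encode the whole dataset as integers, then decode it again.
encoded = [[item_to_id[item] for item in t] for t in transactions]
decoded = [[id_to_item[idx] for idx in t] for t in encoded]

print(encoded)
print(decoded)
```

    Working on small integers instead of strings reduces memory use and speeds up comparisons, and the reverse mapping restores readable names at the end.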

    Don't forget to upvote before you download :)

  14. Table1_Natural products for migraine: Data-mining analyses of Chinese...

    • frontiersin.figshare.com
    docx
    Updated Jun 13, 2023
    Cite
    Claire Shuiqing Zhang; Shaohua Lyu; Anthony Lin Zhang; Xinfeng Guo; Jingbo Sun; Chuanjian Lu; Xiaodong Luo; Charlie Changli Xue (2023). Table1_Natural products for migraine: Data-mining analyses of Chinese Medicine classical literature.DOCX [Dataset]. http://doi.org/10.3389/fphar.2022.995559.s001
    Available download formats
    docx
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Claire Shuiqing Zhang; Shaohua Lyu; Anthony Lin Zhang; Xinfeng Guo; Jingbo Sun; Chuanjian Lu; Xiaodong Luo; Charlie Changli Xue
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Treatment effect of current pharmacotherapies for migraine is unsatisfying. Discovering new anti-migraine natural products and nutraceuticals from large collections of Chinese medicine classical literature may assist to address this gap.

    Methods: We conducted a comprehensive search in the Encyclopedia of Traditional Chinese Medicine (version 5.0) to obtain migraine-related citations, then screened and scored these citations to identify clinical management of migraine using oral herbal medicine in history. Information of formulae, herbs and symptoms were further extracted. After standardisation, these data were analysed using frequency analysis and the Apriori algorithm. Anti-migraine effects and mechanisms of actions of the main herbs and formula were summarised.

    Results: Among 614 eligible citations, the most frequently used formula was chuan xiong cha tiao san (CXCTS), and the most frequently used herb was chuan xiong. Dietary medicinal herbs including gan cao, bai zhi, bo he, tian ma and sheng jiang were identified. Strong associations were constructed among the herb ingredients of CXCTS formula. Symptoms of chronic duration and unilateral headache were closely related with herbs of chuan xiong, gan cao, fang feng, qiang huo and cha. Symptoms of vomiting and nausea were specifically related to herbs of sheng jiang and ban xia.

    Conclusion: The herb ingredients of CXCTS which presented anti-migraine effects with reliable evidence of anti-migraine actions can be selected as potential drug discovery candidates, while dietary medicinal herbs including sheng jiang, bo he, cha, bai zhi, tian ma, and gan cao can be further explored as nutraceuticals for migraine.

  15. Market Basket Optimization

    • kaggle.com
    zip
    Updated Jan 28, 2025
    Cite
    Aly El-badry (2025). Market Basket Optimization [Dataset]. https://www.kaggle.com/datasets/alyelbadry/market-basket-optimization/code
    Available download formats
    zip (47991 bytes)
    Dataset updated
    Jan 28, 2025
    Authors
    Aly El-badry
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains transactional data collected for market basket analysis. Each row represents a single transaction with items purchased together. It is ideal for implementing association rule mining techniques such as Apriori, FP-Growth, and other machine learning algorithms.

    Key Features:

    • Transactions: Lists of items purchased together in a single transaction.
    • Applications: Perfect for studying customer purchase patterns, building recommendation systems, and identifying frequent item sets.
    • Usage: Use this dataset to practice generating actionable insights for retailers and e-commerce platforms.

    Format:

    • Rows: Each row represents a transaction.
    • Columns: Each column corresponds to an item in the transaction.
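    The row-and-column layout described above can be read into a list of item sets before any mining is done. The following is a minimal sketch, not the dataset author's code: the sample rows, the item names, and the assumption that shorter baskets leave trailing cells empty are all hypothetical.

```python
import csv
import io

# Hypothetical sample in the described layout: one transaction per row,
# one item per column. Assumption: shorter baskets leave trailing cells empty.
SAMPLE = """milk,bread,butter
bread,eggs,
milk,bread,
"""


def read_transactions(handle):
    """Return each non-empty row as a set of stripped item names."""
    transactions = []
    for row in csv.reader(handle):
        # Drop empty cells so padding does not show up as an item.
        items = {cell.strip() for cell in row if cell.strip()}
        if items:
            transactions.append(items)
    return transactions


transactions = read_transactions(io.StringIO(SAMPLE))
```

    For a real file, the same function could be called with an open file handle instead of the in-memory sample.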

    Examples of Potential Use Cases:

    • Find combinations of items frequently purchased together.
    • Predict the likelihood of items being bought together.
    • Build AI-powered marketing strategies based on association rules.
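    The first two use cases correspond to the standard support and confidence measures of association rule mining. A small sketch follows, assuming four made-up baskets; the function names `pair_support` and `confidence` are illustrative and do not come from the dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, for illustration only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


support = pair_support(transactions)
milk_to_bread = confidence(transactions, "milk", "bread")
```

    Here support answers "how often do these items appear together?", while confidence answers "given the first item, how often is the second one also bought?".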

    Credits:

    • This dataset is formatted for educational and research purposes. Feel free to use it to explore and enhance your skills in data mining and machine learning!

  16. DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 10, 2023
    Cite
    Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang (2023). DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based Formulas: A Nationwide Descriptive Study With Market Basket Analysis.docx [Dataset]. http://doi.org/10.3389/fphar.2021.641530.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: As time evolved, traditional Chinese medicine (TCM) became integrated into the global medical system as complementary treatments. Some essential TCM herbs started to play a limited role in clinical practices because of Western medication development. For example, Fuzi (Aconiti Lateralis Radix Praeparata) is a toxic but indispensable TCM herb. Fuzi was mainly used in poor circulation and life-threatening conditions by history records. However, with various Western medication options for treating critical conditions currently, how is Fuzi used clinically and its indications in modern TCM are unclear. This study aimed to evaluate Fuzi and Fuzi-based formulas in modern clinical practices using artificial intelligence and data mining methods. Methods: This nationwide descriptive study with market basket analysis used a cohort selected from the Taiwan National Health Insurance database that contained one million national representatives between 2003 and 2010 used for our analysis. Descriptive statistics were performed to demonstrate the modern clinical indications of Fuzi. Market basket analysis was calculated by the Apriori algorithm to discover the association rules between Fuzi and other TCM herbs. Results: A total of 104,281 patients using 405,837 prescriptions of Fuzi and Fuzi-based formulas were identified. TCM doctors were found to use Fuzi in pulmonary (21.5%), gastrointestinal (17.3%), and rheumatologic (11.0%) diseases, but not commonly in cardiovascular diseases (7.4%). Long-term users of Fuzi and Fuzi-based formulas often had the following comorbidities diagnosed by Western doctors: osteoarthritis (31.0%), peptic ulcers (29.5%), hypertension (19.9%), and COPD (19.7%). Patients also used concurrent medications such as H2-receptor antagonists, nonsteroidal anti-inflammatory drugs, β-blockers, calcium channel blockers, and aspirin. Through market basket analysis, for the first time, we noticed many practical Fuzi-related herbal pairs such as Fuzi–Hsihsin (Asari Radix et Rhizoma)–Dahuang (Rhei Radix et Rhizoma) for neurologic diseases and headache. Conclusion: For the first time, big data analysis was applied to uncover the modern clinical indications of Fuzi in addition to traditional use. We provided necessary evidence on the scientific use of Fuzi in current TCM practices, and the Fuzi-related herbal pairs discovered in this study are helpful to the development of new botanical drugs.

  17. The result comparison of the different D.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The result comparison of the different D. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The result comparison of the different D.

  18. Data for: Mining multiple association rules in LTPP database: an analysis of...

    • data.mendeley.com
    Updated Oct 16, 2018
    Cite
    Peiwen Hao (2018). Data for: Mining multiple association rules in LTPP database: an analysis of asphalt pavement thermal cracking distress [Dataset]. http://doi.org/10.17632/w94jndtmpr.1
    Explore at:
    Dataset updated
    Oct 16, 2018
    Authors
    Peiwen Hao
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    MATLAB Codes and original data for Apriori

  19. The SAR difference of different confidence degree thresholds in D = 3.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The SAR difference of different confidence degree thresholds in D = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t009
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The SAR difference of different confidence degree thresholds in D = 3.

  20. Grocery Store Data Set

    • kaggle.com
    zip
    Updated Nov 8, 2016
    Cite
    Shazad Udwadia (2016). Grocery Store Data Set [Dataset]. https://www.kaggle.com/shazadudwadia/supermarket
    Explore at:
    Available download formats: zip (323 bytes)
    Dataset updated
    Nov 8, 2016
    Authors
    Shazad Udwadia
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    For my Data Mining lab, where we had to execute algorithms like Apriori, it was very difficult to get a small data set with only a few transactions. It was infeasible to run the algorithm with datasets containing over 10000 transactions. This dataset contains 11 items: JAM, MAGGI, SUGAR, COFFEE, CHEESE, TEA, BOURNVITA, CORNFLAKES, BREAD, BISCUIT and MILK.
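    A data set this small is enough to trace the level-wise Apriori search by hand. The sketch below is one plain-Python version under stated assumptions: the five baskets are invented for illustration (only the item names come from the description above), and the function name `apriori` and its `min_count` parameter are hypothetical, not taken from any library.

```python
from itertools import combinations

# Hypothetical baskets built from item names in the description above.
# These are NOT the dataset's actual transactions.
baskets = [
    {"BREAD", "MILK", "BISCUIT"},
    {"BREAD", "MILK", "BISCUIT", "CORNFLAKES"},
    {"BREAD", "TEA", "BOURNVITA"},
    {"JAM", "MAGGI", "BREAD", "MILK"},
    {"MAGGI", "TEA", "BISCUIT"},
]


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets in at least min_count baskets."""
    items = sorted({item for basket in transactions for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for b in transactions if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets that differ in exactly one item.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


result = apriori(baskets, min_count=3)
```

    The prune step is what keeps the candidate list short: an itemset is only counted if all of its smaller subsets already passed the threshold.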

Cite
Yogesh Khaladkar; Pramod warale (2016). Survey on Association Rule Mining Using "APRIORI" Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.1393101.v1

Survey on Association Rule Mining Using "APRIORI" Algorithm

Explore at:
19 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: pdf
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yogesh Khaladkar; Pramod warale
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Association rule is an important technique in data mining.
