41 datasets found

Survey on Association Rule Mining Using "APRIORI" Algorithm
figshare.com
search.datacite.org
pdf
Updated Jan 19, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yogesh Khaladkar; Pramod warale (2016). Survey on Association Rule Mining Using "APRIORI" Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.1393101.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1393101.v1
Dataset updated
Jan 19, 2016
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Yogesh Khaladkar; Pramod warale
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Association rule is an important technique in data mining.
Datasets used for evaluating the customized version of Apriori algorithm.
plos.figshare.com
zip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Datasets used for evaluating the customized version of Apriori algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.s001
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0154493.s001
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A zip archive containing microbial abundance tables which were employed for deciphering association rules using the customised version of the Apriori algorithm. (ZIP)
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
zip(23875170 bytes)Available download formats
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
Grocery Store dataset for data mining
kaggle.com
zip
Updated Mar 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Honey Patel (2021). Grocery Store dataset for data mining [Dataset]. https://www.kaggle.com/honeypatel2158/grocery-store-dataset-for-data-mining
Explore at:
zip(7990 bytes)Available download formats
Dataset updated
Mar 9, 2021
Authors
Honey Patel
Description
Dataset

This dataset was created by Honey Patel

Contents
Characteristics that Favor Freq-Itemset Algorithms
kaggle.com
Updated Oct 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeff Heaton (2020). Characteristics that Favor Freq-Itemset Algorithms [Dataset]. https://www.kaggle.com/jeffheaton/characteristics-that-favor-freqitemset-algorithms
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 24, 2020
Dataset provided by
Kaggle
Authors
Jeff Heaton
License
http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html
Description
Source Paper

This dataset is from my paper:

Heaton, J. (2016, March). Comparing dataset characteristics that favor the Apriori, Eclat or FP-Growth frequent itemset mining algorithms. In SoutheastCon 2016 (pp. 1-7). IEEE.

Frequent itemset mining is a popular data mining technique. Apriori, Eclat, and FP-Growth are among the most common algorithms for frequent itemset mining. Considerable research has been performed to compare the relative performance between these three algorithms, by evaluating the scalability of each algorithm as the dataset size increases. While scalability as data size increases is important, previous papers have not examined the performance impact of similarly sized datasets that contain different itemset characteristics. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and the maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into dataset characteristics that are conducive to each algorithm. The results of this paper's research demonstrate Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.

Files Generated

We generated two datasets that allow us to adjust two independent variables to create a total of 20 different transaction sets. We also provide the Python script that generated this data in a notebook. This Python script accepts the following parameters to specify the transaction set to produce:

Transaction/Basket count: 5 million default

Number of items: 50,000 default

Number of frequent sets: 100 default

Max transaction/basket size: independent variable, 5-100 range

Frequent set density: independent variable, 0.1 to 0.8 range

Files contained in this dataset reside in two folders: * freq-items-pct - We vary the frequent set density in these transaction sets. * freq-items-tsz - We change the maximum number of items per basket in these transaction sets.

While you can vary basket count, the number of frequent sets, and the number of items in the script, they will remain fixed at this paper's above values. We determined that the basket count only had a small positive correlation.

File Content

The following listing shows the type of data generated for this research. Here we present an example file created with ten baskets out of 100 items, two frequent itemsets, a maximum basket size of 10, and a density of 0.5.

I36 I94 I71 I13 I91 I89 I34 F6 F5 F3 F4 I86 I39 I16 I49 I62 I31 I54 I91 I22 I31 I70 I85 I78 I63 F4 F3 F1 F6 F0 I69 I44 I82 I50 I9 I31 I57 I20 F4 F3 F1 F6 F0 I87

As you can see from the above file, the items are either prefixed with “I” or “F.” The “F” prefix indicates that this line contains one of the frequent itemsets. Items with the “I” prefix are not part of an intentional frequent itemset. Of course, “I” prefixed items might form frequent itemsets, as they are uniformly sampled from the number of things to fill out nonfrequent itemsets. Each basket will have a random size chosen, up to the maximum basket size. The frequent itsemset density specifies the probability of each line containing one of the intentional frequent itemsets. Because we used a density of 0.5, approximately half of the lines above include one of the two intentional frequent itemsets. A frequent itemset line may have additional random “I” prefixed items added to cause the line to reach the randomly chosen length for that line. If the frequent itemset selected does cause the generated sequence to exceed its randomly chosen length, no truncation will occur. The intentional frequent itemsets are all determined to be less than or equal to the maximum basket size.
Real Market Data for Association Rules
kaggle.com
zip
Updated Sep 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data
Explore at:
zip(3068 bytes)Available download formats
Dataset updated
Sep 15, 2023
Authors
Ruken Missonnier
Description
1. Introduction

Within the confines of this document, we embark on a comprehensive journey delving into the intricacies of a dataset meticulously curated for the purpose of association rules mining. This sophisticated data mining technique is a linchpin in the realms of market basket analysis. The dataset in question boasts an array of items commonly found in retail transactions, each meticulously encoded as a binary variable, with "1" denoting presence and "0" indicating absence in individual transactions.

2. Dataset Overview

Our dataset unfolds as an opulent tapestry of distinct columns, each dedicated to the representation of a specific item:

Bread

Honey

Bacon

Toothpaste

Banana

Apple

Hazelnut

Cheese

Meat

Carrot

Cucumber

Onion

Milk

Butter

ShavingFoam

Salt

Flour

HeavyCream

Egg

Olive

Shampoo

Sugar

3. Purpose of the Dataset

The raison d'être of this dataset is to serve as a catalyst for the discovery of intricate associations and patterns concealed within the labyrinthine network of customer transactions. Each row in this dataset mirrors a solitary transaction, while the values within each column serve as sentinels, indicating whether a particular item was welcomed into a transaction's embrace or relegated to the periphery.

4. Data Format

The data within this repository is rendered in a binary symphony, where the enigmatic "1" enunciates the acquisition of an item, and the stoic "0" signifies its conspicuous absence. This binary manifestation serves to distill the essence of the dataset, centering the focus on item presence, rather than the quantum thereof.

5. Potential Applications

This dataset unfurls its wings to encompass an assortment of prospective applications, including but not limited to:

Market Basket Analysis: Discerning items that waltz together in shopping carts, thus bestowing enlightenment upon the orchestration of product placement and marketing strategies.

Recommender Systems: Crafting bespoke product recommendations, meticulously tailored to each customer's historical transactional symphony.

Inventory Management: Masterfully fine-tuning stock levels for items that find kinship in frequent co-acquisition, thereby orchestrating a harmonious reduction in carrying costs and stockouts.

Customer Behavior Analysis: Peering into the depths of customer proclivities and purchase patterns, paving the way for the sculpting of exquisite marketing campaigns.

6. Analysis Techniques

The treasure trove of this dataset beckons the deployment of quintessential techniques, among them the venerable Apriori and FP-Growth algorithms. These stalwart algorithms are proficient at ferreting out the elusive frequent itemsets and invaluable association rules, shedding light on the arcane symphony of customer behavior and item co-occurrence patterns.

7. Conclusion

In closing, the association rules dataset unfurled before you offers an alluring odyssey, replete with the promise of discovering priceless patterns and affiliations concealed within the tapestry of transactional data. Through the artistry of data mining algorithms, businesses and analysts stand poised to unearth hitherto latent insights capable of steering the helm of strategic decisions, elevating the pantheon of customer experiences, and orchestrating the symphony of operational optimization.
MOESM2 of Data mining combined to the multicriteria decision analysis for...
springernature.figshare.com
figshare.com
application/cdfv2
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar (2023). MOESM2 of Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France [Dataset]. http://doi.org/10.6084/m9.figshare.7660082.v1
Explore at:
application/cdfv2Available download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7660082.v1
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2. The integral table of transactions T.
Groceries Purchase Analysis Dataset
kaggle.com
zip
Updated May 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeel Gajera (2023). Groceries Purchase Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/earthian/grocery-dataset/code
Explore at:
zip(180519 bytes)Available download formats
Dataset updated
May 11, 2023
Authors
Jeel Gajera
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains transactional data of grocery purchases. Each row represents a transaction where items purchased are listed. The items are categorized into columns, with each column representing a specific product. If an item is present in a transaction, it is denoted by a '1'; otherwise, it is denoted by '0'. The dataset is suitable for analyzing frequent itemsets using the Apriori algorithm, a popular method in market basket analysis and association rule mining.
Apriori algorithm-based association rules.
plos.figshare.com
bin
Updated Aug 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang (2023). Apriori algorithm-based association rules. [Dataset]. http://doi.org/10.1371/journal.pone.0289749.t001
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0289749.t001
Dataset updated
Aug 8, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In recent years, the prevalence of T2DM has been increasing annually, in particular, the personal and socioeconomic burden caused by multiple complications has become increasingly serious. This study aimed to screen out the high-risk complication combination of T2DM through various data mining methods, establish and evaluate a risk prediction model of the complication combination in patients with T2DM. Questionnaire surveys, physical examinations, and biochemical tests were conducted on 4,937 patients with T2DM, and 810 cases of sample data with complications were retained. The high-risk complication combination was screened by association rules based on the Apriori algorithm. Risk factors were screened using the LASSO regression model, random forest model, and support vector machine. A risk prediction model was established using logistic regression analysis, and a dynamic nomogram was constructed. Receiver operating characteristic (ROC) curves, harrell’s concordance index (C-Index), calibration curves, decision curve analysis (DCA), and internal validation were used to evaluate the differentiation, calibration, and clinical applicability of the models. This study found that patients with T2DM had a high-risk combination of lower extremity vasculopathy, diabetic foot, and diabetic retinopathy. Based on this, body mass index, diastolic blood pressure, total cholesterol, triglyceride, 2-hour postprandial blood glucose and blood urea nitrogen levels were screened and used for the modeling analysis. The area under the ROC curves of the internal and external validations were 0.768 (95% CI, 0.744−0.792) and 0.745 (95% CI, 0.669−0.820), respectively, and the C-index and AUC value were consistent. The calibration plots showed good calibration, and the risk threshold for DCA was 30–54%. In this study, we developed and evaluated a predictive model for the development of a high-risk complication combination while uncovering the pattern of complications in patients with T2DM. This model has a practical guiding effect on the health management of patients with T2DM in community settings.
MOESM3 of Data mining combined to the multicriteria decision analysis for...
springernature.figshare.com
xls
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar (2023). MOESM3 of Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France [Dataset]. http://doi.org/10.6084/m9.figshare.7660091.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7660091.v1
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
France
Description
Additional file 3. The integral matrix of concordance indices.
Number of association rules generated using the Apriori rule mining approach...
plos.figshare.com
figshare.com
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Number of association rules generated using the Apriori rule mining approach with various datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0154493.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summarised information pertaining to (a) the number of samples, (b) the number of generated association rules (total as well as rules that involve 3 or more genera), (c) the unique number of microbial genera involved in the identified association rules, (d) execution time, and (e) the number of rules generated using an alternative rule mining strategy (detailed in discussion section of the manuscript).
Groceries dataset
kaggle.com
zip
Updated Sep 17, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heeral Dedhia (2020). Groceries dataset [Dataset]. https://www.kaggle.com/heeraldedhia/groceries-dataset
Explore at:
zip(263057 bytes)Available download formats
Dataset updated
Sep 17, 2020
Authors
Heeral Dedhia
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.htmlhttp://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

Details of the dataset

The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.

An example of Association Rules

Assume there are 100 customers 10 of them bought milk, 8 bought butter and 6 bought both of them. bought milk => bought butter support = P(Milk & Butter) = 6/100 = 0.06 confidence = support/P(Butter) = 0.06/0.08 = 0.75 lift = confidence/P(Milk) = 0.75/0.10 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Some important terms:

Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
Groceries Market Basket Dataset
kaggle.com
zip
Updated Jul 16, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Irfan Nasrullah (2019). Groceries Market Basket Dataset [Dataset]. https://www.kaggle.com/irfanasrullah/groceries
Explore at:
zip(172098 bytes)Available download formats
Dataset updated
Jul 16, 2019
Authors
Irfan Nasrullah
Description
Context The Groceries Market Basket Dataset, which can be found here. The dataset contains 9835 transactions by customers shopping for groceries. The data contains 169 unique items.
The data is suitable to do data mining for market basket analysis which has multiple variables.

Acknowledgement Thanks to https://github.com/shubhamjha97/association-rule-mining-apriori
The data is under course Association rules mining using Apriori algorithm. Course Assignment for CS F415- Data Mining @ BITS Pilani, Hyderabad Campus. Done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.

Pre-processing

The csv file was read transaction by transaction and each transaction was saved as a list. A mapping was created from the unique items in the dataset to integers so that each item corresponded to a unique integer. The entire data was mapped to integers to reduce the storage and computational requirement. A reverse mapping was created from the integers to the item, so that the item names could be written in the final output file.

Don't forget to upvote before you download :)
Table1_Natural products for migraine: Data-mining analyses of Chinese...
frontiersin.figshare.com
docx
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Claire Shuiqing Zhang; Shaohua Lyu; Anthony Lin Zhang; Xinfeng Guo; Jingbo Sun; Chuanjian Lu; Xiaodong Luo; Charlie Changli Xue (2023). Table1_Natural products for migraine: Data-mining analyses of Chinese Medicine classical literature.DOCX [Dataset]. http://doi.org/10.3389/fphar.2022.995559.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fphar.2022.995559.s001
Dataset updated
Jun 13, 2023
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Claire Shuiqing Zhang; Shaohua Lyu; Anthony Lin Zhang; Xinfeng Guo; Jingbo Sun; Chuanjian Lu; Xiaodong Luo; Charlie Changli Xue
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Treatment effect of current pharmacotherapies for migraine is unsatisfying. Discovering new anti-migraine natural products and nutraceuticals from large collections of Chinese medicine classical literature may assist to address this gap.Methods: We conducted a comprehensive search in the Encyclopedia of Traditional Chinese Medicine (version 5.0) to obtain migraine-related citations, then screened and scored these citations to identify clinical management of migraine using oral herbal medicine in history. Information of formulae, herbs and symptoms were further extracted. After standardisation, these data were analysed using frequency analysis and the Apriori algorithm. Anti-migraine effects and mechanisms of actions of the main herbs and formula were summarised.Results: Among 614 eligible citations, the most frequently used formula was chuan xiong cha tiao san (CXCTS), and the most frequently used herb was chuan xiong. Dietary medicinal herbs including gan cao, bai zhi, bo he, tian ma and sheng jiang were identified. Strong associations were constructed among the herb ingredients of CXCTS formula. Symptoms of chronic duration and unilateral headache were closely related with herbs of chuan xiong, gan cao, fang feng, qiang huo and cha. Symptoms of vomiting and nausea were specifically related to herbs of sheng jiang and ban xia.Conclusion: The herb ingredients of CXCTS which presented anti-migraine effects with reliable evidence of anti-migraine actions can be selected as potential drug discovery candidates, while dietary medicinal herbs including sheng jiang, bo he, cha, bai zhi, tian ma, and gan cao can be further explored as nutraceuticals for migraine.
Market Basket Optimization
kaggle.com
zip
Updated Jan 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aly El-badry (2025). Market Basket Optimization [Dataset]. https://www.kaggle.com/datasets/alyelbadry/market-basket-optimization/code
Explore at:
zip(47991 bytes)Available download formats
Dataset updated
Jan 28, 2025
Authors
Aly El-badry
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains transactional data collected for market basket analysis. Each row represents a single transaction with items purchased together. It is ideal for implementing association rule mining techniques such as Apriori, FP-Growth, and other machine learning algorithms.

Key Features:

Transactions: Lists of items purchased together in a single transaction.

Applications: Perfect for studying customer purchase patterns, building recommendation systems, and identifying frequent item sets.

Usage: Use this dataset to practice generating actionable insights for retailers and e-commerce platforms.

Format:

Rows: Each row represents a transaction.

Columns: Each column corresponds to an item in the transaction.

Examples of Potential Use Cases:

Find combinations of items frequently purchased together.

Predict the likelihood of items being bought together.

Build AI-powered marketing strategies based on association rules.

Credits:

This dataset is formatted for educational and research purposes. Feel free to use it to explore and enhance your skills in data mining and machine learning!
DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang (2023). DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based Formulas: A Nationwide Descriptive Study With Market Basket Analysis.docx [Dataset]. http://doi.org/10.3389/fphar.2021.641530.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fphar.2021.641530.s001
Dataset updated
Jun 10, 2023
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: As time evolved, traditional Chinese medicine (TCM) became integrated into the global medical system as complementary treatments. Some essential TCM herbs started to play a limited role in clinical practices because of Western medication development. For example, Fuzi (Aconiti Lateralis Radix Praeparata) is a toxic but indispensable TCM herb. Fuzi was mainly used in poor circulation and life-threatening conditions by history records. However, with various Western medication options for treating critical conditions currently, how is Fuzi used clinically and its indications in modern TCM are unclear. This study aimed to evaluate Fuzi and Fuzi-based formulas in modern clinical practices using artificial intelligence and data mining methods.Methods: This nationwide descriptive study with market basket analysis used a cohort selected from the Taiwan National Health Insurance database that contained one million national representatives between 2003 and 2010 used for our analysis. Descriptive statistics were performed to demonstrate the modern clinical indications of Fuzi. Market basket analysis was calculated by the Apriori algorithm to discover the association rules between Fuzi and other TCM herbs.Results: A total of 104,281 patients using 405,837 prescriptions of Fuzi and Fuzi-based formulas were identified. TCM doctors were found to use Fuzi in pulmonary (21.5%), gastrointestinal (17.3%), and rheumatologic (11.0%) diseases, but not commonly in cardiovascular diseases (7.4%). Long-term users of Fuzi and Fuzi-based formulas often had the following comorbidities diagnosed by Western doctors: osteoarthritis (31.0%), peptic ulcers (29.5%), hypertension (19.9%), and COPD (19.7%). Patients also used concurrent medications such as H2-receptor antagonists, nonsteroidal anti-inflammatory drugs, β-blockers, calcium channel blockers, and aspirin. Through market basket analysis, for the first time, we noticed many practical Fuzi-related herbal pairs such as Fuzi–Hsihsin (Asari Radix et Rhizoma)–Dahuang (Rhei Radix et Rhizoma) for neurologic diseases and headache.Conclusion: For the first time, big data analysis was applied to uncover the modern clinical indications of Fuzi in addition to traditional use. We provided necessary evidence on the scientific use of Fuzi in current TCM practices, and the Fuzi-related herbal pairs discovered in this study are helpful to the development of new botanical drugs.
The result comparison of the different D.
plos.figshare.com
xls
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The result comparison of the different D. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t002
Dataset updated
May 30, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The result comparison of the different D.
m
Data for: Mining multiple association rules in LTPP database: an analysis of...
data.mendeley.com
Updated Oct 16, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peiwen Hao (2018). Data for: Mining multiple association rules in LTPP database: an analysis of asphalt pavement thermal cracking distress [Dataset]. http://doi.org/10.17632/w94jndtmpr.1
Explore at:
Unique identifier
https://doi.org/10.17632/w94jndtmpr.1
Dataset updated
Oct 16, 2018
Authors
Peiwen Hao
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
MATLAB Codes and original data for Apriori
The SAR difference of different confidence degree thresholds in D = 3.
plos.figshare.com
xls
Updated Jun 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The SAR difference of different confidence degree thresholds in D = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t009
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t009
Dataset updated
Jun 9, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SAR difference of different confidence degree thresholds in D = 3.
Grocery Store Data Set
kaggle.com
zip
Updated Nov 8, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shazad Udwadia (2016). Grocery Store Data Set [Dataset]. https://www.kaggle.com/shazadudwadia/supermarket
Explore at:
zip(323 bytes)Available download formats
Dataset updated
Nov 8, 2016
Authors
Shazad Udwadia
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
For my Data Mining lab where we had to execute algorithms like apriori, it was very difficult to get a small data set with only a few transactions. It was infeasible to run the algorithm with datasets containing over 10000 transactions. This dataset contains 11 items : JAM, MAGGI, SUGAR, COFFEE, CHEESE, TEA, BOURNVITA, CORNFLAKES, BREAD, BISCUIT and MILK.

Facebook

Twitter

Click to copy link

Link copied

Cite

Yogesh Khaladkar; Pramod warale (2016). Survey on Association Rule Mining Using "APRIORI" Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.1393101.v1

Survey on Association Rule Mining Using "APRIORI" Algorithm

Explore at:

19 scholarly articles cite this dataset (View in Google Scholar)

pdfAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.1393101.v1

Dataset updated

Jan 19, 2016

Dataset provided by

figshare
Figsharehttp://figshare.com/

Authors

Yogesh Khaladkar; Pramod warale

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Association rule is an important technique in data mining.

Clear search

Close search

Google apps

Main menu

Survey on Association Rule Mining Using "APRIORI" Algorithm

Datasets used for evaluating the customized version of Apriori algorithm.

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Grocery Store dataset for data mining

Dataset

Contents

Characteristics that Favor Freq-Itemset Algorithms

Source Paper

Files Generated

File Content

Real Market Data for Association Rules

1. Introduction

2. Dataset Overview

3. Purpose of the Dataset

4. Data Format

5. Potential Applications

6. Analysis Techniques

7. Conclusion

MOESM2 of Data mining combined to the multicriteria decision analysis for...

Groceries Purchase Analysis Dataset

Apriori algorithm-based association rules.

MOESM3 of Data mining combined to the multicriteria decision analysis for...

Number of association rules generated using the Apriori rule mining approach...

Groceries dataset

Association Rule Mining

Details of the dataset

Apriori Algorithm

An example of Association Rules

Some important terms:

Groceries Market Basket Dataset

Table1_Natural products for migraine: Data-mining analyses of Chinese...

Market Basket Optimization

Key Features:

Format:

Examples of Potential Use Cases:

Credits:

DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based...

The result comparison of the different D.

Data for: Mining multiple association rules in LTPP database: an analysis of...

The SAR difference of different confidence degree thresholds in D = 3.

Grocery Store Data Set

Survey on Association Rule Mining Using "APRIORI" Algorithm