8 datasets found

Dataset for Apriori Algorithm - Frequent Itemsets
kaggle.com
zip
Updated Feb 6, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Akalya Subramanian (2021). Dataset for Apriori Algorithm - Frequent Itemsets [Dataset]. https://www.kaggle.com/akalyasubramanian/dataset-for-apriori-algorithm-frequent-itemsets
Explore at:
zip(47959 bytes)Available download formats
Dataset updated
Feb 6, 2021
Authors
Akalya Subramanian
Description
Dataset

This dataset was created by Akalya Subramanian

Contents
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
zip(23875170 bytes)Available download formats
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
Groceries Purchase Analysis Dataset
kaggle.com
zip
Updated May 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeel Gajera (2023). Groceries Purchase Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/earthian/grocery-dataset/code
Explore at:
zip(180519 bytes)Available download formats
Dataset updated
May 11, 2023
Authors
Jeel Gajera
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains transactional data of grocery purchases. Each row represents a transaction where items purchased are listed. The items are categorized into columns, with each column representing a specific product. If an item is present in a transaction, it is denoted by a '1'; otherwise, it is denoted by '0'. The dataset is suitable for analyzing frequent itemsets using the Apriori algorithm, a popular method in market basket analysis and association rule mining.
Market Basket Analysis Data
kaggle.com
zip
Updated May 28, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AHMET BAŞ (2021). Market Basket Analysis Data [Dataset]. https://www.kaggle.com/ahmtcnbs/datasets-for-appiori
Explore at:
zip(8317 bytes)Available download formats
Dataset updated
May 28, 2021
Authors
AHMET BAŞ
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Market Basket Analysis Apriori algorithm is given by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for boolean association rule. Name of the algorithm is Apriori because it uses prior knowledge of frequent itemset properties. We apply an iterative approach or level-wise search where k-frequent itemsets are used to find k+1 itemsets.
Characteristics that Favor Freq-Itemset Algorithms
kaggle.com
Updated Oct 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeff Heaton (2020). Characteristics that Favor Freq-Itemset Algorithms [Dataset]. https://www.kaggle.com/jeffheaton/characteristics-that-favor-freqitemset-algorithms
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 24, 2020
Dataset provided by
Kaggle
Authors
Jeff Heaton
License
http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html
Description
Source Paper

This dataset is from my paper:

Heaton, J. (2016, March). Comparing dataset characteristics that favor the Apriori, Eclat or FP-Growth frequent itemset mining algorithms. In SoutheastCon 2016 (pp. 1-7). IEEE.

Frequent itemset mining is a popular data mining technique. Apriori, Eclat, and FP-Growth are among the most common algorithms for frequent itemset mining. Considerable research has been performed to compare the relative performance between these three algorithms, by evaluating the scalability of each algorithm as the dataset size increases. While scalability as data size increases is important, previous papers have not examined the performance impact of similarly sized datasets that contain different itemset characteristics. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and the maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into dataset characteristics that are conducive to each algorithm. The results of this paper's research demonstrate Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.

Files Generated

We generated two datasets that allow us to adjust two independent variables to create a total of 20 different transaction sets. We also provide the Python script that generated this data in a notebook. This Python script accepts the following parameters to specify the transaction set to produce:

Transaction/Basket count: 5 million default

Number of items: 50,000 default

Number of frequent sets: 100 default

Max transaction/basket size: independent variable, 5-100 range

Frequent set density: independent variable, 0.1 to 0.8 range

Files contained in this dataset reside in two folders: * freq-items-pct - We vary the frequent set density in these transaction sets. * freq-items-tsz - We change the maximum number of items per basket in these transaction sets.

While you can vary basket count, the number of frequent sets, and the number of items in the script, they will remain fixed at this paper's above values. We determined that the basket count only had a small positive correlation.

File Content

The following listing shows the type of data generated for this research. Here we present an example file created with ten baskets out of 100 items, two frequent itemsets, a maximum basket size of 10, and a density of 0.5.

I36 I94 I71 I13 I91 I89 I34 F6 F5 F3 F4 I86 I39 I16 I49 I62 I31 I54 I91 I22 I31 I70 I85 I78 I63 F4 F3 F1 F6 F0 I69 I44 I82 I50 I9 I31 I57 I20 F4 F3 F1 F6 F0 I87

As you can see from the above file, the items are either prefixed with “I” or “F.” The “F” prefix indicates that this line contains one of the frequent itemsets. Items with the “I” prefix are not part of an intentional frequent itemset. Of course, “I” prefixed items might form frequent itemsets, as they are uniformly sampled from the number of things to fill out nonfrequent itemsets. Each basket will have a random size chosen, up to the maximum basket size. The frequent itsemset density specifies the probability of each line containing one of the intentional frequent itemsets. Because we used a density of 0.5, approximately half of the lines above include one of the two intentional frequent itemsets. A frequent itemset line may have additional random “I” prefixed items added to cause the line to reach the randomly chosen length for that line. If the frequent itemset selected does cause the generated sequence to exceed its randomly chosen length, no truncation will occur. The intentional frequent itemsets are all determined to be less than or equal to the maximum basket size.
Groceries dataset
kaggle.com
zip
Updated Sep 17, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heeral Dedhia (2020). Groceries dataset [Dataset]. https://www.kaggle.com/heeraldedhia/groceries-dataset
Explore at:
zip(263057 bytes)Available download formats
Dataset updated
Sep 17, 2020
Authors
Heeral Dedhia
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.htmlhttp://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

Details of the dataset

The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.

An example of Association Rules

Assume there are 100 customers 10 of them bought milk, 8 bought butter and 6 bought both of them. bought milk => bought butter support = P(Milk & Butter) = 6/100 = 0.06 confidence = support/P(Butter) = 0.06/0.08 = 0.75 lift = confidence/P(Milk) = 0.75/0.10 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Some important terms:

Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
Frequent 3-itemset.
plos.figshare.com
xls
Updated Jul 29, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yongkang Ding (2025). Frequent 3-itemset. [Dataset]. http://doi.org/10.1371/journal.pone.0325925.t006
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0325925.t006
Dataset updated
Jul 29, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Yongkang Ding
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Physical fitness refers to the health of all body functions, including cardiorespiratory endurance, muscle strength, flexibility, stamina, and body composition, which can help individuals effectively cope with daily activities and sports challenges. This paper explores the physical characteristics of basketball players, aiming to improve training effects through unique physical evaluation indicators and provide a theoretical framework for improving college basketball performance and training standards. The study adopted the Apriori association rule algorithm in data mining. First, the physical data of basketball players were collected and preprocessed. Then, frequent item sets were extracted through the association rule mining algorithm, association rules were generated, and the key factors affecting the physical performance of athletes were analyzed. The article’s results revealed the potential relationship between different physical characteristics and emphasized the application prospects of association rule mining in the physical evaluation of basketball players.
Real Market Data for Association Rules
kaggle.com
zip
Updated Sep 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ruken Missonnier (2023). Real Market Data for Association Rules [Dataset]. https://www.kaggle.com/datasets/rukenmissonnier/real-market-data
Explore at:
zip(3068 bytes)Available download formats
Dataset updated
Sep 15, 2023
Authors
Ruken Missonnier
Description
1. Introduction

Within the confines of this document, we embark on a comprehensive journey delving into the intricacies of a dataset meticulously curated for the purpose of association rules mining. This sophisticated data mining technique is a linchpin in the realms of market basket analysis. The dataset in question boasts an array of items commonly found in retail transactions, each meticulously encoded as a binary variable, with "1" denoting presence and "0" indicating absence in individual transactions.

2. Dataset Overview

Our dataset unfolds as an opulent tapestry of distinct columns, each dedicated to the representation of a specific item:

Bread

Honey

Bacon

Toothpaste

Banana

Apple

Hazelnut

Cheese

Meat

Carrot

Cucumber

Onion

Milk

Butter

ShavingFoam

Salt

Flour

HeavyCream

Egg

Olive

Shampoo

Sugar

3. Purpose of the Dataset

The raison d'être of this dataset is to serve as a catalyst for the discovery of intricate associations and patterns concealed within the labyrinthine network of customer transactions. Each row in this dataset mirrors a solitary transaction, while the values within each column serve as sentinels, indicating whether a particular item was welcomed into a transaction's embrace or relegated to the periphery.

4. Data Format

The data within this repository is rendered in a binary symphony, where the enigmatic "1" enunciates the acquisition of an item, and the stoic "0" signifies its conspicuous absence. This binary manifestation serves to distill the essence of the dataset, centering the focus on item presence, rather than the quantum thereof.

5. Potential Applications

This dataset unfurls its wings to encompass an assortment of prospective applications, including but not limited to:

Market Basket Analysis: Discerning items that waltz together in shopping carts, thus bestowing enlightenment upon the orchestration of product placement and marketing strategies.

Recommender Systems: Crafting bespoke product recommendations, meticulously tailored to each customer's historical transactional symphony.

Inventory Management: Masterfully fine-tuning stock levels for items that find kinship in frequent co-acquisition, thereby orchestrating a harmonious reduction in carrying costs and stockouts.

Customer Behavior Analysis: Peering into the depths of customer proclivities and purchase patterns, paving the way for the sculpting of exquisite marketing campaigns.

6. Analysis Techniques

The treasure trove of this dataset beckons the deployment of quintessential techniques, among them the venerable Apriori and FP-Growth algorithms. These stalwart algorithms are proficient at ferreting out the elusive frequent itemsets and invaluable association rules, shedding light on the arcane symphony of customer behavior and item co-occurrence patterns.

7. Conclusion

In closing, the association rules dataset unfurled before you offers an alluring odyssey, replete with the promise of discovering priceless patterns and affiliations concealed within the tapestry of transactional data. Through the artistry of data mining algorithms, businesses and analysts stand poised to unearth hitherto latent insights capable of steering the helm of strategic decisions, elevating the pantheon of customer experiences, and orchestrating the symphony of operational optimization.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Akalya Subramanian (2021). Dataset for Apriori Algorithm - Frequent Itemsets [Dataset]. https://www.kaggle.com/akalyasubramanian/dataset-for-apriori-algorithm-frequent-itemsets

Dataset for Apriori Algorithm - Frequent Itemsets

Explore at:

zip(47959 bytes)Available download formats

Dataset updated

Feb 6, 2021

Authors

Akalya Subramanian

Description

Dataset

This dataset was created by Akalya Subramanian

Clear search

Close search

Google apps

Main menu

Dataset for Apriori Algorithm - Frequent Itemsets

Dataset

Contents

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Groceries Purchase Analysis Dataset

Market Basket Analysis Data

Characteristics that Favor Freq-Itemset Algorithms

Source Paper

Files Generated

File Content

Groceries dataset

Association Rule Mining

Details of the dataset

Apriori Algorithm

An example of Association Rules

Some important terms:

Frequent 3-itemset.

Real Market Data for Association Rules

1. Introduction

2. Dataset Overview

3. Purpose of the Dataset

4. Data Format

5. Potential Applications

6. Analysis Techniques

7. Conclusion

Dataset for Apriori Algorithm - Frequent ItemsetsSee More Versions

Dataset

Contents

Dataset for Apriori Algorithm - Frequent Itemsets