7 datasets found

Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemset that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rules are most often used to find associations between different objects in a set and to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought Computer Mouse, 9 bought Mat for Mouse, and 8 bought both of them.
- Rule: bought Computer Mouse => bought Mat for Mouse
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
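The three measures can be checked with a few lines of arithmetic. The following is a minimal Python sketch (the dataset author works in R; this is an illustration, not their code) that uses the standard definitions, in which confidence divides the joint frequency by the frequency of the antecedent (the mouse) and lift divides confidence by the frequency of the consequent (the mat). The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought the mouse,
# 9 bought the mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items are bought together far more often than independent purchases would suggest.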

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that it is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of the rules
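The "running" and "filtering" steps of this strategy can be illustrated on toy data. The sketch below is a simplified, Apriori-style level-wise search written in plain Python; it is not the arules implementation the author uses in R, and the function names, thresholds, and sample baskets are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns {itemset: support}."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent = {}
    k = 1
    while current:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence, lift)."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    yield antecedent, consequent, support, confidence, lift


# Hypothetical baskets, one set of item names per transaction.
baskets = [
    {"mouse", "mat"},
    {"mouse", "mat", "keyboard"},
    {"mouse"},
    {"keyboard"},
    {"mouse", "mat"},
]
found = frequent_itemsets(baskets, min_support=0.4)
for lhs, rhs, sup, conf, lift in rules(found, min_confidence=0.9):
    print(sorted(lhs), "=>", sorted(rhs), round(sup, 2), round(conf, 2), round(lift, 2))
```

Raising `min_support` or `min_confidence` corresponds to the filtering step: fewer, stronger rules survive.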

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

Libraries in R

First, we need to load the required libraries. Each library is described briefly below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

Next, we clean the data frame by removing missing values.
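As an illustration of this cleaning step, here is a minimal Python sketch that drops rows with missing values in chosen columns. The author's own R code is not shown in this excerpt, so the helper `drop_incomplete`, the choice of required columns, and the sample rows are assumptions; only the column names come from the dataset description.

```python
def drop_incomplete(rows, required):
    """Keep only rows where every required column has a non-empty value."""
    return [
        row for row in rows
        if all(row.get(col) not in (None, "") for col in required)
    ]


# Hypothetical rows using column names from the dataset description.
rows = [
    {"BillNo": "100001", "Itemname": "Computer Mouse", "CustomerID": "12345"},
    {"BillNo": "100001", "Itemname": None, "CustomerID": "12345"},
    {"BillNo": "100002", "Itemname": "Mat for Mouse", "CustomerID": ""},
]

# Only BillNo and Itemname are needed to build baskets, so a row with a
# missing CustomerID is kept here; this choice is an assumption.
clean = drop_incomplete(rows, required=["BillNo", "Itemname"])
print(len(clean))  # prints: 2
```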

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
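The conversion described here, from one row per purchased item to one basket per invoice, can be sketched as follows. This is a Python illustration under stated assumptions rather than the author's R code: the helper `to_transactions` and the sample rows are hypothetical, while `BillNo` and `Itemname` are the column names from the dataset description.

```python
from collections import defaultdict


def to_transactions(rows, id_col="BillNo", item_col="Itemname"):
    """Group item rows by invoice number into one set of items per invoice."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[id_col]].add(row[item_col])
    return dict(baskets)


# Hypothetical cleaned rows: one row per item on an invoice.
rows = [
    {"BillNo": "100001", "Itemname": "Computer Mouse"},
    {"BillNo": "100001", "Itemname": "Mat for Mouse"},
    {"BillNo": "100002", "Itemname": "Computer Mouse"},
]

transactions = to_transactions(rows)
for bill, items in transactions.items():
    print(bill, sorted(items))
```

Using a set per invoice means an item listed twice on the same bill counts once, which matches the usual presence/absence view of a basket.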
Association rule mining data for census tract chemical exposure analysis
catalog.data.gov
data.amerigeoss.org
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Association rule mining data for census tract chemical exposure analysis [Dataset]. https://catalog.data.gov/dataset/association-rule-mining-data-for-census-tract-chemical-exposure-analysis
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Chemical concentration, exposure, and health risk data for U.S. census tracts from National Scale Air Toxics Assessment (NATA). This dataset is associated with the following publication: Huang, H., R. Tornero-Velez, and T. Barzyk. Associations between socio-demographic characteristics and chemical concentrations contributing to cumulative exposures in the United States. Journal of Exposure Science and Environmental Epidemiology. Nature Publishing Group, London, UK, 27(6): 544-550, (2017).
Complete Data Set - For mining association rules in Indian Stock Market
figshare.com
docx
Updated Nov 3, 2024
Cite
Srinath Mitragotri (2024). Complete Data Set - For mining association rules in Indian Stock Market [Dataset]. http://doi.org/10.6084/m9.figshare.21399549.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.21399549.v1
Dataset updated
Nov 3, 2024
Dataset provided by
figshare
Authors
Srinath Mitragotri
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The data is contained in the winrar file - 'DataSet-AssociationMining-India.rar'

Once you open the above winrar file, you will see the below files & folders:

File: "IndiaData-ForAssociationMining.xlsx" is the primary data retrieved from 'Refinitiv-Datastream' which was used in the project.

Folder-1MetricsGT-NSE50
- This folder has MS-Excel macro files used to create return determinant data to be eventually used in the 'Final-Transaction-Table' from which associations would be mined.
- This folder also has computed returns for different holding periods for different stocks considered in this study. File: "0_nYrRtnGTNSE50.xlsm"
- This folder also has the 'Final-Sheet' used for mining of association rules.

Folder: 2Analysis-GTNSE50
- This folder has the R-program used to mine associations. It also has the final sheets used in association mining for different holding periods. And the output of the association rules mined is also stored here (File name: RulesRHS_1YrRtnGTNSE50.csv and so on)

Folder: 3Validation
- This folder has data related to the validation carried out in the project. It has 2 sub-folders:
  - 1-MetricsForValidation: This folder has excel-macro files to compute the metrics required in the Final-Table for validation of the association rules.
  - 2-BetaCalc-PortRtns: This folder has the Final transaction sheet which will be later used to compute portfolio beta and portfolio returns for each association rule. This also has the computation of portfolio beta & portfolio returns for each of the 10 association rules analyzed in this paper.

Folder: 4LogitRegression
- This folder has the 'R' program used to carry out Logit regression and different model consistency test. It also has the input file for the Logit regression (Filename: India-LogitRegression-csv.csv)
- The sub-folder 'Regression_OP' has the output of Logit regression for all association rules for different holding periods.
MOESM2 of OmicsARules: a R package for integration of multi-omics datasets...
springernature.figshare.com
xlsx
Updated Feb 16, 2024
Cite
Danze Chen; Fan Zhang; Qianqian Zhao; Jianzhen Xu (2024). MOESM2 of OmicsARules: a R package for integration of multi-omics datasets via association rules mining [Dataset]. http://doi.org/10.6084/m9.figshare.10278410.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.10278410.v1
Dataset updated
Feb 16, 2024
Dataset provided by
figshare
Authors
Danze Chen; Fan Zhang; Qianqian Zhao; Jianzhen Xu
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Additional file 2: Table S1. General information of three real datasets downloaded from TCGA. Table S2. Top 20 rules identified from BRCA mRNA dataset. Table S3. Top 20 rules identified from BRCA DNA methylation. Table S4. Top 20 rules identified from ESCA mRNA dataset. Table S5. Top 20 rules identified from ESCA DNA methylation dataset. Table S6. Top 20 rules identified from LUAD mRNA dataset. Table S7. Top 20 rules identified from LUAD DNA methylation dataset. Table S8. Top 20 rules identified from the combined BRCA mRNA and DNA methylation datasets. Table S9. Top 20 rules identified from the combined ESCA mRNA and DNA methylation datasets. Table S10. Top 20 rules identified from the combined LUAD mRNA and DNA methylation datasets.
Smith_ISL_NMF_V1.0.0
zenodo.org
zip
Updated Apr 24, 2025
Cite
Robert G. Smith; Robert G. Smith (2025). Smith_ISL_NMF_V1.0.0 [Dataset]. http://doi.org/10.5281/zenodo.10639554
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10639554
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Robert G. Smith; Robert G. Smith
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a dataset of Irish Sign Language (ISL) Non-Manual Feature data.

# Cite:

Robert G. Smith, (2023). Exploiting Association Rules Mining to Inform the Use of Non-Manual Features in Sign Language Processing. PhD Dissertation. Technological University Dublin. Dublin, Ireland.

Robert G. Smith. (2024). TUD-RSmith/PhD-Appendices: First release - Smith_NMF Dataset V1.0.0 (Smith_NMF_v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.10639554

[![DOI](https://zenodo.org/badge/560578153.svg)](https://zenodo.org/doi/10.5281/zenodo.10639533)

## About
This dataset was published in the appendix of a PhD Dissertation by Robert G. Smith robert.smith@tudublin.ie

Cite: Robert G. Smith, Exploiting Association Rules Mining to Inform the Use of Non-Manual Features in Sign Language Processing, PhD Dissertation, Technological University Dublin, Ireland, 2023.

The dataset is comprised of several smaller datasets:
### Appendix C
[Appendix C](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixC-most_frequent_lexical_items_in_the_SOI_corpus)
lexical frequency list (see: Smith, R. G. & Hofmann, M., (2020). A Lexical Frequency Analysis of Irish Sign Language. TEANGA, the Journal of the Irish Association for Applied Linguistics, 11, 18–47. https://doi.org/10.35903/teanga.v11i1.162)

### Appendix D
[Appendix D](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixD-all_association_rules)
Association rules. This was the main output of the PhD work. See the dissertation for method. (this dir includes filtered and unfiltered data)

### Appendix E
[Appendix E](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixE-Datasets)
Datasets used to generate association rules

### Appendix F
[Appendix F](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixF-Source_code)
Source code (R) used to generate rules listed in Appendix D

### Appendix G
[Appendix G](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixG-integrity_test)
Source code (R) used for integrity testing
Table2_East Asian Herbal Medicine to Reduce Primary Pain and Adverse Events...
frontiersin.figshare.com
docx
Updated Jun 3, 2023
Cite
Hee-Geun Jo; Jihye Seo; Seulki Choi; Donghun Lee (2023). Table2_East Asian Herbal Medicine to Reduce Primary Pain and Adverse Events in Cancer Patients : A Systematic Review and Meta-Analysis With Association Rule Mining to Identify Core Herb Combination.docx [Dataset]. http://doi.org/10.3389/fphar.2021.800571.s011
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fphar.2021.800571.s011
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Hee-Geun Jo; Jihye Seo; Seulki Choi; Donghun Lee
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Objective: Cancer pain is an important factor in cancer management that affects a patient’s quality of life and survival-related outcomes. The aim of this review was to systematically evaluate the efficacy and safety of oral administration of East Asian herbal medicine (EAHM) for primary cancer pain and to explore core herb patterns based on the collected data.

Methods: A comprehensive literature search was conducted in 11 electronic databases, namely, PubMed, Cochrane Library, Cumulative Index to Nursing & Allied Health Literature, EMBASE, Korean Studies Information Service System, Research Information Service System, Oriental Medicine Advanced Searching Integrated System, Korea Citation Index, Chinese National Knowledge Infrastructure Database (CNKI), Wanfang Data, and CiNii for randomized controlled trials from their inception until August 19, 2021. Statistical analysis was performed in R version 4.1.1 and R studio program using the default settings of the meta-package. When heterogeneity in studies was detected, the cause was identified through meta-regression and subgroup analysis. Methodological quality was independently assessed using the revised tool for risk of bias in randomized trials (Rob 2.0).

Results: A total of 38 trials with 3,434 cancer pain patients met the selection criteria. Meta-analysis favored EAHM-combined conventional medicine on response rate (risk ratio: 1.06; 95% CI: 1.04 to 1.09, p < 0.0001), continuous pain intensity (standardized mean difference: −1.74; 95% CI: −2.17 to −1.30, p < 0.0001), duration of pain relief (standardized mean difference: 0.96, 95% CI: 0.69 to 1.22, p < 0.0001), performance status (weighted mean difference: 10.71; 95% CI: 4.89 to 16.53, p = 0.0003), and opioid usage (weighted mean difference: −20.66 mg/day; 95% CI: −30.22 to −11.10, p < 0.0001). No significant difference was observed between EAHM and conventional medicine on response rate and other outcomes. Patients treated with EAHM had significantly reduced adverse event (AE) incidence rates. In addition, based on the ingredients of herb data in this meta-analysis, four combinations of herb pairs, which were frequently used together for cancer pain, were derived.

Conclusion: EAHM monotherapy can decrease adverse events associated with pain management in cancer patients. Additionally, EAHM-combined conventional medicine therapy may be beneficial for patients with cancer pain in increasing the response rate, relieving pain intensity, improving pain-related performance status, and regulating opioid usage. However, the efficacy and safety of EAHM monotherapy are difficult to conclude due to the lack of methodological quality and quantity of studies. More well-designed, multicenter, double-blind, and placebo-controlled randomized clinical trials are needed in the future. In terms of the core herb combination patterns derived from the present review, four combinations of herb pairs might be promising for cancer pain because they have been often distinctly used for cancer patients in East Asia. Thus, they are considered to be worth a follow-up study to elucidate their actions and effects.

Systematic Review Registration: https://www.crd.york.ac.uk/prospero/, identifier CRD42021265804
Table_1_Knowledge, attitude, and perception regarding COVID-19-related...
frontiersin.figshare.com
docx
Updated Jun 15, 2023
Cite
Thoa Le; Trang T. B. Le; Le Van Truong; Mai Ngoc Luu; Nguyen Tran Minh Duc; Abdelrahman M. Makram; Truong Van Dat; Nguyen Tien Huy (2023). Table_1_Knowledge, attitude, and perception regarding COVID-19-related prevention practice among residents in Vietnam: a cross-sectional study.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2023.1100335.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2023.1100335.s001
Dataset updated
Jun 15, 2023
Dataset provided by
Frontiers
Authors
Thoa Le; Trang T. B. Le; Le Van Truong; Mai Ngoc Luu; Nguyen Tran Minh Duc; Abdelrahman M. Makram; Truong Van Dat; Nguyen Tien Huy
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Vietnam
Description
Background: Vietnam was one of the countries pursuing the goal of “Zero-COVID” and had effectively achieved it in the first three waves of the pandemic. However, the spread of the Delta variant was outbreak first in Vietnam in late April 2021, in which Ho Chi Minh City was the worst affected. This study surveyed the public's knowledge, attitude, perception, and practice (KAPP) toward COVID-19 during the rapid rise course of the outbreak in Ho Chi Minh City.

Methods: This cross-sectional survey was conducted from 30th September to 16th November 2021, involving 963 residents across the city. We asked residents a series of 21 questions. The response rate was 76.6%. We set a priori level of significance at α = 0.05 for all statistical tests.

Results: The residents' KAPP scores were 68.67% ± 17.16, 77.33% ± 18.71, 74.7% ± 26.25, and 72.31% ± 31, respectively. KAPP scores of the medical staff were higher than the non-medical group. Our study showed positive, medium–strong Pearson correlations between knowledge and practice (r = 0.337), attitude and practice (r = 0.405), and perception and practice (r = 0.671; p < 0.05). We found 16 rules to estimate the conditional probabilities among KAPP scores via the association rule mining method. Mainly, 94% confident probability of participants had {Knowledge=Good, Attitude=Good, Perception=Good}, as well as {Practice=Good} (in rule 9 with support of 17.6%). In opposition to around 86% to 90% of the times, participants had levels of {Perception=Fair, Practice=Poor} given with either {Attitude=Fair} or {Knowledge=Fair} (according to rules 1, 2, and rules 15, 16 with a support of 7–8%).

Conclusion: In addition to the government's directives and policies, citizens' knowledge, attitude, perception, and practice are considered one of the critical preventive measures during the COVID-19 pandemic. The results affirmed the good internal relationship among K, A, P, and P scores creating a hierarchy of healthcare educational goals and health behavior among residents.