7 datasets found
  1. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most often used when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both. For the rule "bought Computer Mouse => bought Mat for Mouse":

    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
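As a rough check of the arithmetic, the three measures can be computed from the counts in the example. This is a minimal sketch in Python rather than the R used by the dataset author; the function name `rule_metrics` is hypothetical, and it assumes the standard definitions: confidence divides the joint support by the probability of the antecedent, and lift divides confidence by the probability of the consequent.

```python
from fractions import Fraction

def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = Fraction(n_both, n_total)            # P(A and B)
    confidence = Fraction(n_both, n_antecedent)    # P(A and B) / P(A)
    lift = confidence / Fraction(n_consequent, n_total)  # confidence / P(B)
    return support, confidence, lift

# Counts from the example: 100 customers, 10 bought the mouse,
# 9 bought the mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(float(support), float(confidence), round(float(lift), 2))  # 0.08 0.8 8.89
```

A lift well above 1 suggests the two items are bought together far more often than independence would predict.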

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules
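The running, exploring, and filtering steps can be illustrated with a small sketch. It is written in Python instead of R, uses brute-force enumeration of item pairs rather than the Apriori implementation in 'arules', and works on made-up baskets; the names `baskets`, `support`, and `mine_pair_rules` and both thresholds are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical toy baskets; not taken from the dataset.
baskets = [
    {"mouse", "mat"},
    {"mouse", "mat", "keyboard"},
    {"mouse", "keyboard"},
    {"mat"},
    {"mouse", "mat"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def mine_pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Enumerate rules {a} => {b} over item pairs; keep those passing both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, baskets)
            lift = confidence / support({rhs}, baskets)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence, lift))
    # Explore the strongest rules first: sort by lift, highest first.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)

rules = mine_pair_rules(baskets)
for lhs, rhs, s, c, lift_value in rules:
    print(f"{{{lhs}}} => {{{rhs}}}  support={s:.2f} confidence={c:.2f} lift={lift_value:.2f}")
```

Raising `min_support` or `min_confidence` is the filtering step: fewer rules survive, and the ones that do are backed by more transactions.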

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science. The package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call or expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean our data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
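The cleaning step is only shown as a screenshot, so the following is a hedged sketch of the general idea rather than the author's code. It is written in Python, the rows and the helper names `is_missing` and `drop_missing` are hypothetical, and the choice of which columns must be present (here BillNo and Itemname) is an assumption.

```python
# Hypothetical rows mirroring the dataset's column names; values are made up.
rows = [
    {"BillNo": "100001", "Itemname": "item A", "CustomerID": "10001"},
    {"BillNo": "100001", "Itemname": None, "CustomerID": "10001"},
    {"BillNo": "100002", "Itemname": "item B", "CustomerID": None},
    {"BillNo": "100003", "Itemname": "  ", "CustomerID": "10002"},
]

def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())

def drop_missing(rows, required):
    """Keep only rows in which every required column has a value."""
    return [r for r in rows if not any(is_missing(r.get(col)) for col in required)]

# Assumption: only the columns needed for rule mining must be present.
clean = drop_missing(rows, required=("BillNo", "Itemname"))
print(len(rows), "->", len(clean))  # 4 -> 2
```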

    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
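The conversion to transaction data can be pictured as grouping line items by invoice. This is a minimal Python sketch, not the R code from the original notebook; it assumes that BillNo identifies the invoice, and the sample rows and the function name `to_transactions` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical line items; one row per product on a bill, as in the dataset layout.
rows = [
    {"BillNo": "100001", "Itemname": "item A"},
    {"BillNo": "100001", "Itemname": "item B"},
    {"BillNo": "100002", "Itemname": "item A"},
    {"BillNo": "100001", "Itemname": "item A"},  # repeated line on the same bill
]

def to_transactions(rows):
    """Group line items by BillNo into one set of item names per bill."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["BillNo"]].add(row["Itemname"])
    return dict(baskets)

transactions = to_transactions(rows)
for bill, items in transactions.items():
    print(bill, sorted(items))
```

Using a set per bill drops repeated lines for the same product, which matches the usual basket view where only the presence of an item matters.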

  2. Association rule mining data for census tract chemical exposure analysis

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Association rule mining data for census tract chemical exposure analysis [Dataset]. https://catalog.data.gov/dataset/association-rule-mining-data-for-census-tract-chemical-exposure-analysis
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Chemical concentration, exposure, and health risk data for U.S. census tracts from National Scale Air Toxics Assessment (NATA). This dataset is associated with the following publication: Huang, H., R. Tornero-Velez, and T. Barzyk. Associations between socio-demographic characteristics and chemical concentrations contributing to cumulative exposures in the United States. Journal of Exposure Science and Environmental Epidemiology. Nature Publishing Group, London, UK, 27(6): 544-550, (2017).

  3. Complete Data Set - For mining association rules in Indian Stock Market

    • figshare.com
    docx
    Updated Nov 3, 2024
    Cite
    Srinath Mitragotri (2024). Complete Data Set - For mining association rules in Indian Stock Market [Dataset]. http://doi.org/10.6084/m9.figshare.21399549.v1
    Dataset updated
    Nov 3, 2024
    Dataset provided by
    figshare
    Authors
    Srinath Mitragotri
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The data is contained in the winrar file - 'DataSet-AssociationMining-India.rar'

    Once you open the above winrar file, you will see the below files & folders:

    • File: "IndiaData-ForAssociationMining.xlsx" is the primary data retrieved from 'Refinitiv-Datastream' which was used in the project.

    • Folder: 1MetricsGT-NSE50
      o This folder has MS-Excel macro files used to create return determinant data to be eventually used in the 'Final-Transaction-Table' from which associations would be mined.
      o This folder also has computed returns for different holding periods for different stocks considered in this study. File: "0_nYrRtnGTNSE50.xlsm"
      o This folder also has the 'Final-Sheet' used for mining of association rules.

    • Folder: 2Analysis-GTNSE50
      o This folder has the R program used to mine associations. It also has the final sheets used in association mining for different holding periods. The output of the association rules mined is also stored here (File name: RulesRHS_1YrRtnGTNSE50.csv and so on).

    • Folder: 3Validation
      o This folder has data related to the validation carried out in the project. It has 2 sub-folders:
        § 1-MetricsForValidation: This folder has Excel macro files to compute the metrics required in the Final-Table for validation of the association rules.
        § 2-BetaCalc-PortRtns: This folder has the Final transaction sheet which will be later used to compute portfolio beta and portfolio returns for each association rule. This also has the computation of portfolio beta & portfolio returns for each of the 10 association rules analyzed in this paper.

    • Folder: 4LogitRegression
      o This folder has the 'R' program used to carry out Logit regression and different model consistency tests. It also has the input file for the Logit regression (Filename: India-LogitRegression-csv.csv).
      o The sub-folder 'Regression_OP' has the output of Logit regression for all association rules for different holding periods.

  4. MOESM2 of OmicsARules: a R package for integration of multi-omics datasets...

    • springernature.figshare.com
    xlsx
    Updated Feb 16, 2024
    Cite
    Danze Chen; Fan Zhang; Qianqian Zhao; Jianzhen Xu (2024). MOESM2 of OmicsARules: a R package for integration of multi-omics datasets via association rules mining [Dataset]. http://doi.org/10.6084/m9.figshare.10278410.v1
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    figshare
    Authors
    Danze Chen; Fan Zhang; Qianqian Zhao; Jianzhen Xu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Additional file 2:

    • Table S1. General information of three real datasets downloaded from TCGA.
    • Table S2. Top 20 rules identified from BRCA mRNA dataset.
    • Table S3. Top 20 rules identified from BRCA DNA methylation.
    • Table S4. Top 20 rules identified from ESCA mRNA dataset.
    • Table S5. Top 20 rules identified from ESCA DNA methylation dataset.
    • Table S6. Top 20 rules identified from LUAD mRNA dataset.
    • Table S7. Top 20 rules identified from LUAD DNA methylation dataset.
    • Table S8. Top 20 rules identified from the combined BRCA mRNA and DNA methylation datasets.
    • Table S9. Top 20 rules identified from the combined ESCA mRNA and DNA methylation datasets.
    • Table S10. Top 20 rules identified from the combined LUAD mRNA and DNA methylation datasets.

  5. Smith_ISL_NMF_V1.0.0

    • zenodo.org
    zip
    Updated Apr 24, 2025
    Cite
    Robert G. Smith; Robert G. Smith (2025). Smith_ISL_NMF_V1.0.0 [Dataset]. http://doi.org/10.5281/zenodo.10639554
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert G. Smith; Robert G. Smith
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a dataset of Irish Sign Language (ISL) Non-Manual Feature data.

    # Cite:

    Robert G. Smith, (2023). Exploiting Association Rules Mining to Inform the Use of Non-Manual Features in Sign Language Processing. PhD Dissertation. Technological University Dublin. Dublin, Ireland.

    Robert G. Smith. (2024). TUD-RSmith/PhD-Appendices: First release - Smith_NMF Dataset V1.0.0 (Smith_NMF_v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.10639554

    [![DOI](https://zenodo.org/badge/560578153.svg)](https://zenodo.org/doi/10.5281/zenodo.10639533)

    ## About
    This dataset was published in the appendix of a PhD Dissertation by Robert G. Smith robert.smith@tudublin.ie

    Cite: Robert G. Smith, Exploiting Association Rules Mining to Inform the Use of Non-Manual Features in Sign Language Processing, PhD Dissertation, Technological University Dublin, Ireland, 2023.

    The dataset is comprised of several smaller datasets:
    ### Appendix C
    [Appendix C](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixC-most_frequent_lexical_items_in_the_SOI_corpus)
    lexical frequency list (see: Smith, R. G. & Hofmann, M., (2020). A Lexical Frequency Analysis of Irish Sign Language. TEANGA, the Journal of the Irish Association for Applied Linguistics, 11, 18–47. https://doi.org/10.35903/teanga.v11i1.162)

    ### Appendix D
    [Appendix D](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixD-all_association_rules)
    Association rules. This was the main output of the PhD work. See the dissertation for method. (this dir includes filtered and unfiltered data)

    ### Appendix E
    [Appendix E](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixE-Datasets)
    Datasets used to generate association rules

    ### Appendix F
    [Appendix F](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixF-Source_code)
    Source code (R) used to generate rules listed in Appendix D

    ### Appendix G
    [Appendix G](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixG-integrity_test)
    Source code (R) used for integrity testing

  6. Table2_East Asian Herbal Medicine to Reduce Primary Pain and Adverse Events...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    Cite
    Hee-Geun Jo; Jihye Seo; Seulki Choi; Donghun Lee (2023). Table2_East Asian Herbal Medicine to Reduce Primary Pain and Adverse Events in Cancer Patients : A Systematic Review and Meta-Analysis With Association Rule Mining to Identify Core Herb Combination.docx [Dataset]. http://doi.org/10.3389/fphar.2021.800571.s011
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Hee-Geun Jo; Jihye Seo; Seulki Choi; Donghun Lee
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Objective: Cancer pain is an important factor in cancer management that affects a patient’s quality of life and survival-related outcomes. The aim of this review was to systematically evaluate the efficacy and safety of oral administration of East Asian herbal medicine (EAHM) for primary cancer pain and to explore core herb patterns based on the collected data.

    Methods: A comprehensive literature search was conducted in 11 electronic databases, namely, PubMed, Cochrane Library, Cumulative Index to Nursing & Allied Health Literature, EMBASE, Korean Studies Information Service System, Research Information Service System, Oriental Medicine Advanced Searching Integrated System, Korea Citation Index, Chinese National Knowledge Infrastructure Database (CNKI), Wanfang Data, and CiNii for randomized controlled trials from their inception until August 19, 2021. Statistical analysis was performed in R version 4.1.1 and R studio program using the default settings of the meta-package. When heterogeneity in studies was detected, the cause was identified through meta-regression and subgroup analysis. Methodological quality was independently assessed using the revised tool for risk of bias in randomized trials (Rob 2.0).

    Results: A total of 38 trials with 3,434 cancer pain patients met the selection criteria. Meta-analysis favored EAHM-combined conventional medicine on response rate (risk ratio: 1.06; 95% CI: 1.04 to 1.09, p < 0.0001), continuous pain intensity (standardized mean difference: −1.74; 95% CI: −2.17 to −1.30, p < 0.0001), duration of pain relief (standardized mean difference: 0.96, 95% CI: 0.69 to 1.22, p < 0.0001), performance status (weighted mean difference: 10.71; 95% CI: 4.89 to 16.53, p = 0.0003), and opioid usage (weighted mean difference: −20.66 mg/day; 95% CI: −30.22 to −11.10, p < 0.0001). No significant difference was observed between EAHM and conventional medicine on response rate and other outcomes. Patients treated with EAHM had significantly reduced adverse event (AE) incidence rates. In addition, based on the ingredients of herb data in this meta-analysis, four combinations of herb pairs, which were frequently used together for cancer pain, were derived.

    Conclusion: EAHM monotherapy can decrease adverse events associated with pain management in cancer patients. Additionally, EAHM-combined conventional medicine therapy may be beneficial for patients with cancer pain in increasing the response rate, relieving pain intensity, improving pain-related performance status, and regulating opioid usage. However, the efficacy and safety of EAHM monotherapy are difficult to conclude due to the lack of methodological quality and quantity of studies. More well-designed, multicenter, double-blind, and placebo-controlled randomized clinical trials are needed in the future. In terms of the core herb combination patterns derived from the present review, four combinations of herb pairs might be promising for cancer pain because they have been often distinctly used for cancer patients in East Asia. Thus, they are considered to be worth a follow-up study to elucidate their actions and effects.

    Systematic Review Registration: https://www.crd.york.ac.uk/prospero/, identifier CRD42021265804

  7. Table_1_Knowledge, attitude, and perception regarding COVID-19-related...

    • frontiersin.figshare.com
    docx
    Updated Jun 15, 2023
    Cite
    Thoa Le; Trang T. B. Le; Le Van Truong; Mai Ngoc Luu; Nguyen Tran Minh Duc; Abdelrahman M. Makram; Truong Van Dat; Nguyen Tien Huy (2023). Table_1_Knowledge, attitude, and perception regarding COVID-19-related prevention practice among residents in Vietnam: a cross-sectional study.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2023.1100335.s001
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Thoa Le; Trang T. B. Le; Le Van Truong; Mai Ngoc Luu; Nguyen Tran Minh Duc; Abdelrahman M. Makram; Truong Van Dat; Nguyen Tien Huy
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Vietnam
    Description

    Background: Vietnam was one of the countries pursuing the goal of “Zero-COVID” and had effectively achieved it in the first three waves of the pandemic. However, the spread of the Delta variant was outbreak first in Vietnam in late April 2021, in which Ho Chi Minh City was the worst affected. This study surveyed the public's knowledge, attitude, perception, and practice (KAPP) toward COVID-19 during the rapid rise course of the outbreak in Ho Chi Minh City.

    Methods: This cross-sectional survey was conducted from 30th September to 16th November 2021, involving 963 residents across the city. We asked residents a series of 21 questions. The response rate was 76.6%. We set a priori level of significance at α = 0.05 for all statistical tests.

    Results: The residents' KAPP scores were 68.67% ± 17.16, 77.33% ± 18.71, 74.7% ± 26.25, and 72.31% ± 31, respectively. KAPP scores of the medical staff were higher than the non-medical group. Our study showed positive, medium–strong Pearson correlations between knowledge and practice (r = 0.337), attitude and practice (r = 0.405), and perception and practice (r = 0.671; p < 0.05). We found 16 rules to estimate the conditional probabilities among KAPP scores via the association rule mining method. Mainly, 94% confident probability of participants had {Knowledge=Good, Attitude=Good, Perception=Good}, as well as {Practice=Good} (in rule 9 with support of 17.6%). In opposition to around 86% to 90% of the times, participants had levels of {Perception=Fair, Practice=Poor} given with either {Attitude=Fair} or {Knowledge=Fair} (according to rules 1, 2, and rules 15, 16 with a support of 7–8%).

    Conclusion: In addition to the government's directives and policies, citizens' knowledge, attitude, perception, and practice are considered one of the critical preventive measures during the COVID-19 pandemic. The results affirmed the good internal relationship among K, A, P, and P scores creating a hierarchy of healthcare educational goals and health behavior among residents.

