Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most often used to find associations between different objects in a set, for example frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.
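As an illustration only, the search for frequent patterns can be sketched in Python (the analysis described here was done in R, and its code is not reproduced on this page). The function name `apriori`, the toy baskets, and the 0.5 support threshold are hypothetical and are not taken from the retailer's dataset.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical toy baskets, for illustration only.
baskets = [
    {"mouse", "mouse mat"},
    {"mouse", "mouse mat", "keyboard"},
    {"mouse", "keyboard"},
    {"keyboard"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
```

With these toy baskets, the pair of mouse mat and keyboard appears in only one of four transactions, so it is dropped, and no three-item candidate survives the pruning step.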
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
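The three measures can be checked with a few lines of Python. This is a minimal sketch using the standard definitions for a rule A => B, where confidence divides by the frequency of the antecedent A and lift divides the confidence by the frequency of the consequent B. The helper name `rule_metrics` is hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence, and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift

# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 suggests the two items are bought together far more often than they would be if the purchases were independent.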
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each of them.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we clean the data frame by removing missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
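The cleaning and conversion steps can be sketched as follows. This is a Python illustration, not the R code used in the analysis. The column names `invoice` and `item`, the helper `to_transactions`, and the sample rows are all hypothetical, since the dataset's actual attribute names are not listed here.

```python
from collections import defaultdict

def to_transactions(rows, invoice_key="invoice", item_key="item"):
    """Group line items by invoice; rows with a missing invoice or item are skipped."""
    baskets = defaultdict(set)
    for row in rows:
        invoice, item = row.get(invoice_key), row.get(item_key)
        if invoice is None or item is None:
            continue  # corresponds to the missing-value removal step
        baskets[invoice].add(item)
    return dict(baskets)

# Hypothetical line-item rows, one per purchased item.
rows = [
    {"invoice": "A1", "item": "mouse"},
    {"invoice": "A1", "item": "mouse mat"},
    {"invoice": "A2", "item": "keyboard"},
    {"invoice": None, "item": "monitor"},  # dropped: no invoice number
]
transactions = to_transactions(rows)
```

Each value in `transactions` is the set of items on one invoice, which is the shape that frequent-itemset mining expects as input.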
Chemical concentration, exposure, and health risk data for U.S. census tracts from National Scale Air Toxics Assessment (NATA). This dataset is associated with the following publication: Huang, H., R. Tornero-Velez, and T. Barzyk. Associations between socio-demographic characteristics and chemical concentrations contributing to cumulative exposures in the United States. Journal of Exposure Science and Environmental Epidemiology. Nature Publishing Group, London, UK, 27(6): 544-550, (2017).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is contained in the WinRAR archive 'DataSet-AssociationMining-India.rar'.
Once you open the archive, you will see the following files and folders:
File: "IndiaData-ForAssociationMining.xlsx" is the primary data retrieved from 'Refinitiv-Datastream' that was used in the project.
Folder: 1MetricsGT-NSE50
- This folder has the MS Excel macro files used to create the return determinant data that is eventually used in the 'Final-Transaction-Table' from which associations are mined.
- This folder also has the computed returns for different holding periods for the different stocks considered in this study. File: "0_nYrRtnGTNSE50.xlsm"
- This folder also has the 'Final-Sheet' used for mining association rules.
Folder: 2Analysis-GTNSE50
- This folder has the R program used to mine associations. It also has the final sheets used in association mining for different holding periods. The output of the mined association rules is also stored here (file name: RulesRHS_1YrRtnGTNSE50.csv and so on).
Folder: 3Validation
- This folder has data related to the validation carried out in the project. It has 2 sub-folders:
  - 1-MetricsForValidation: This folder has Excel macro files to compute the metrics required in the Final-Table for validation of the association rules.
  - 2-BetaCalc-PortRtns: This folder has the final transaction sheet, which is later used to compute portfolio beta and portfolio returns for each association rule. It also has the computation of portfolio beta and portfolio returns for each of the 10 association rules analyzed in this paper.
Folder: 4LogitRegression
- This folder has the R program used to carry out the Logit regression and the different model consistency tests. It also has the input file for the Logit regression (file name: India-LogitRegression-csv.csv).
- The sub-folder 'Regression_OP' has the output of the Logit regression for all association rules for different holding periods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2: Table S1. General information of three real datasets downloaded from TCGA. Table S2. Top 20 rules identified from BRCA mRNA dataset. Table S3. Top 20 rules identified from BRCA DNA methylation. Table S4. Top 20 rules identified from ESCA mRNA dataset. Table S5. Top 20 rules identified from ESCA DNA methylation dataset. Table S6. Top 20 rules identified from LUAD mRNA dataset. Table S7. Top 20 rules identified from LUAD DNA methylation dataset. Table S8. Top 20 rules identified from the combined BRCA mRNA and DNA methylation datasets. Table S9. Top 20 rules identified from the combined ESCA mRNA and DNA methylation datasets. Table S10. Top 20 rules identified from the combined LUAD mRNA and DNA methylation datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Robert G. Smith, (2023). Exploiting Association Rules Mining to Inform the Use of Non-Manual Features in Sign Language Processing. PhD Dissertation. Technological University Dublin. Dublin, Ireland.
Robert G. Smith. (2024). TUD-RSmith/PhD-Appendices: First release - Smith_NMF Dataset V1.0.0 (Smith_NMF_v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.10639554
DOI: https://zenodo.org/doi/10.5281/zenodo.10639533
## About
This dataset was published in the appendix of a PhD dissertation by Robert G. Smith (robert.smith@tudublin.ie).
Cite: Robert G. Smith, Exploiting Association Rules Mining to Inform the Use of Non-Manual Features in Sign Language Processing, PhD Dissertation, Technological University Dublin, Ireland, 2023.
The dataset is comprised of several smaller datasets:
### Appendix C
[Appendix C](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixC-most_frequent_lexical_items_in_the_SOI_corpus)
Lexical frequency list (see: Smith, R. G. & Hofmann, M. (2020). A Lexical Frequency Analysis of Irish Sign Language. TEANGA, the Journal of the Irish Association for Applied Linguistics, 11, 18–47. https://doi.org/10.35903/teanga.v11i1.162)
### Appendix D
[Appendix D](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixD-all_association_rules)
Association rules. This was the main output of the PhD work; see the dissertation for the method. This directory includes both filtered and unfiltered data.
### Appendix E
[Appendix E](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixE-Datasets)
Datasets used to generate association rules
### Appendix F
[Appendix F](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixF-Source_code)
Source code (R) used to generate rules listed in Appendix D
### Appendix G
[Appendix G](https://github.com/TUD-RSmith/PhD-Appendices/tree/main/AppendixG-integrity_test)
Source code (R) used for integrity testing
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: Cancer pain is an important factor in cancer management that affects a patient’s quality of life and survival-related outcomes. The aim of this review was to systematically evaluate the efficacy and safety of oral administration of East Asian herbal medicine (EAHM) for primary cancer pain and to explore core herb patterns based on the collected data.
Methods: A comprehensive literature search was conducted in 11 electronic databases, namely, PubMed, Cochrane Library, Cumulative Index to Nursing & Allied Health Literature, EMBASE, Korean Studies Information Service System, Research Information Service System, Oriental Medicine Advanced Searching Integrated System, Korea Citation Index, Chinese National Knowledge Infrastructure Database (CNKI), Wanfang Data, and CiNii for randomized controlled trials from their inception until August 19, 2021. Statistical analysis was performed in R version 4.1.1 and RStudio using the default settings of the meta package. When heterogeneity in studies was detected, the cause was identified through meta-regression and subgroup analysis. Methodological quality was independently assessed using the revised tool for risk of bias in randomized trials (RoB 2.0).
Results: A total of 38 trials with 3,434 cancer pain patients met the selection criteria. Meta-analysis favored EAHM-combined conventional medicine on response rate (risk ratio: 1.06; 95% CI: 1.04 to 1.09, p < 0.0001), continuous pain intensity (standardized mean difference: −1.74; 95% CI: −2.17 to −1.30, p < 0.0001), duration of pain relief (standardized mean difference: 0.96, 95% CI: 0.69 to 1.22, p < 0.0001), performance status (weighted mean difference: 10.71; 95% CI: 4.89 to 16.53, p = 0.0003), and opioid usage (weighted mean difference: −20.66 mg/day; 95% CI: −30.22 to −11.10, p < 0.0001). No significant difference was observed between EAHM and conventional medicine on response rate and other outcomes. Patients treated with EAHM had significantly reduced adverse event (AE) incidence rates. In addition, based on the ingredients of herb data in this meta-analysis, four combinations of herb pairs, which were frequently used together for cancer pain, were derived.
Conclusion: EAHM monotherapy can decrease adverse events associated with pain management in cancer patients. Additionally, EAHM-combined conventional medicine therapy may be beneficial for patients with cancer pain in increasing the response rate, relieving pain intensity, improving pain-related performance status, and regulating opioid usage. However, the efficacy and safety of EAHM monotherapy are difficult to conclude due to the lack of methodological quality and quantity of studies. More well-designed, multicenter, double-blind, and placebo-controlled randomized clinical trials are needed in the future. In terms of the core herb combination patterns derived from the present review, four combinations of herb pairs might be promising for cancer pain because they have been often distinctly used for cancer patients in East Asia. Thus, they are considered to be worth a follow-up study to elucidate their actions and effects.
Systematic Review Registration: https://www.crd.york.ac.uk/prospero/, identifier CRD42021265804
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Vietnam was one of the countries pursuing the goal of “Zero-COVID” and had effectively achieved it in the first three waves of the pandemic. However, the Delta variant first broke out in Vietnam in late April 2021, and Ho Chi Minh City was the worst affected. This study surveyed the public's knowledge, attitude, perception, and practice (KAPP) toward COVID-19 during the rapid rise of the outbreak in Ho Chi Minh City.
Methods: This cross-sectional survey was conducted from 30th September to 16th November 2021, involving 963 residents across the city. We asked residents a series of 21 questions. The response rate was 76.6%. We set an a priori level of significance at α = 0.05 for all statistical tests.
Results: The residents' KAPP scores were 68.67% ± 17.16, 77.33% ± 18.71, 74.7% ± 26.25, and 72.31% ± 31, respectively. KAPP scores of the medical staff were higher than those of the non-medical group. Our study showed positive, medium–strong Pearson correlations between knowledge and practice (r = 0.337), attitude and practice (r = 0.405), and perception and practice (r = 0.671; p < 0.05). We found 16 rules to estimate the conditional probabilities among KAPP scores via the association rule mining method. Notably, with 94% confidence, participants who had {Knowledge=Good, Attitude=Good, Perception=Good} also had {Practice=Good} (rule 9, with a support of 17.6%). In contrast, around 86% to 90% of the time, participants had {Perception=Fair, Practice=Poor} given either {Attitude=Fair} or {Knowledge=Fair} (according to rules 1, 2, 15, and 16, with a support of 7–8%).
Conclusion: In addition to the government's directives and policies, citizens' knowledge, attitude, perception, and practice are considered among the critical preventive measures during the COVID-19 pandemic. The results affirmed the good internal relationship among the K, A, P, and P scores, creating a hierarchy of healthcare educational goals and health behavior among residents.