39 datasets found

Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Dataset updated
Dec 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rule mining is most often used to find associations between different objects in a set, that is, frequent patterns in a transaction database. It tells you which items customers frequently buy together, and it lets the retailer identify relationships between items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.

- Rule: bought computer mouse => bought mouse mat
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat) = 0.80/0.09 = 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
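The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, not code from the dataset author: the function name rule_metrics and its parameter names are made up for illustration, and it assumes the usual definitions, with confidence divided by the antecedent's count and lift divided by the consequent's share.

```python
def rule_metrics(n_total, n_x, n_y, n_both):
    """Support, confidence and lift for the rule X => Y, from raw counts."""
    support = n_both / n_total           # share of all baskets with X and Y
    confidence = n_both / n_x            # share of X-baskets that also hold Y
    lift = confidence / (n_y / n_total)  # confidence relative to Y's base rate
    return support, confidence, lift


# Counts from the example: 100 customers, 10 mouse, 9 mat, 8 both.
support, confidence, lift = rule_metrics(n_total=100, n_x=10, n_y=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 suggests the two items are bought together far more often than they would be if purchases were independent.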

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that it is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of the rules

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Data Pre-processing

Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

Next, we clean the data frame by removing missing values.

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
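The conversion to transaction data amounts to grouping rows by invoice. The source does this step in R; the sketch below is an illustrative Python equivalent using only the standard library. The function name to_transactions and the sample rows are hypothetical, and only the BillNo and Itemname columns from the dataset description are assumed.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (BillNo, Itemname) rows into one set of items per invoice."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        # Skip rows with a missing invoice number or item name.
        if bill_no is None or item is None:
            continue
        baskets[bill_no].add(item)
    return dict(baskets)


# Made-up rows for illustration; not taken from the real file.
rows = [
    ("100001", "computer mouse"),
    ("100001", "mouse mat"),
    ("100002", "keyboard"),
    ("100002", None),
    ("100003", "computer mouse"),
]
baskets = to_transactions(rows)
for bill_no, items in baskets.items():
    print(bill_no, sorted(items))
```

Using a set per invoice means a product listed twice on the same bill counts once, which matches how support is normally computed.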
‘Market Basket Analysis Data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Market Basket Analysis Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-market-basket-analysis-data-1c3c/915d4e47/?iid=006-454&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Market Basket Analysis Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ahmtcnbs/datasets-for-appiori on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Market Basket Analysis: the Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It applies an iterative, level-wise search in which frequent k-itemsets are used to find frequent (k+1)-itemsets.
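The level-wise search can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the dataset author's code or an optimized implementation: the function name apriori, the min_support parameter, and the toy transactions are all made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 1
    while current:
        k += 1
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (the "prior knowledge"): every (k-1)-subset of a
        # frequent itemset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
    return frequent


# Toy transactions, made up for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
frequent = apriori(transactions, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With these toy baskets, every single item and every pair reaches the 0.6 threshold, while the three-item set appears in only two of five baskets and is dropped.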

--- Original source retains full ownership of the source dataset ---
DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based...
frontiersin.figshare.com
docx
Updated Jun 10, 2023
Cite
Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang (2023). DataSheet1_Uncovering Modern Clinical Applications of Fuzi and Fuzi-Based Formulas: A Nationwide Descriptive Study With Market Basket Analysis.docx [Dataset]. http://doi.org/10.3389/fphar.2021.641530.s001
Unique identifier
https://doi.org/10.3389/fphar.2021.641530.s001
Dataset updated
Jun 10, 2023
Dataset provided by
Frontiers
Authors
Chi-Jung Tai; Mohamed El-Shazly; Yi-Hong Tsai; Dezső Csupor; Judit Hohmann; Yang-Chang Wu; Tzyy-Guey Tseng; Fang-Rong Chang; Hui-Chun Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: As time evolved, traditional Chinese medicine (TCM) became integrated into the global medical system as complementary treatments. Some essential TCM herbs started to play a limited role in clinical practices because of Western medication development. For example, Fuzi (Aconiti Lateralis Radix Praeparata) is a toxic but indispensable TCM herb. Fuzi was mainly used in poor circulation and life-threatening conditions by history records. However, with various Western medication options for treating critical conditions currently, how is Fuzi used clinically and its indications in modern TCM are unclear. This study aimed to evaluate Fuzi and Fuzi-based formulas in modern clinical practices using artificial intelligence and data mining methods.

Methods: This nationwide descriptive study with market basket analysis used a cohort selected from the Taiwan National Health Insurance database that contained one million national representatives between 2003 and 2010 used for our analysis. Descriptive statistics were performed to demonstrate the modern clinical indications of Fuzi. Market basket analysis was calculated by the Apriori algorithm to discover the association rules between Fuzi and other TCM herbs.

Results: A total of 104,281 patients using 405,837 prescriptions of Fuzi and Fuzi-based formulas were identified. TCM doctors were found to use Fuzi in pulmonary (21.5%), gastrointestinal (17.3%), and rheumatologic (11.0%) diseases, but not commonly in cardiovascular diseases (7.4%). Long-term users of Fuzi and Fuzi-based formulas often had the following comorbidities diagnosed by Western doctors: osteoarthritis (31.0%), peptic ulcers (29.5%), hypertension (19.9%), and COPD (19.7%). Patients also used concurrent medications such as H2-receptor antagonists, nonsteroidal anti-inflammatory drugs, β-blockers, calcium channel blockers, and aspirin. Through market basket analysis, for the first time, we noticed many practical Fuzi-related herbal pairs such as Fuzi–Hsihsin (Asari Radix et Rhizoma)–Dahuang (Rhei Radix et Rhizoma) for neurologic diseases and headache.

Conclusion: For the first time, big data analysis was applied to uncover the modern clinical indications of Fuzi in addition to traditional use. We provided necessary evidence on the scientific use of Fuzi in current TCM practices, and the Fuzi-related herbal pairs discovered in this study are helpful to the development of new botanical drugs.
‘Groceries dataset ’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 15, 2015
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Groceries dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-b6be/136ba9af/?iid=001-023&v=presentation
Dataset updated
Aug 15, 2015
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Groceries dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

Details of the dataset

The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
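The step from frequent itemsets to association rules can be sketched as follows. This is an illustration under assumptions, not the source's code: the function name generate_rules, the min_confidence parameter, and the tea and sugar support values are hypothetical. The sketch assumes every subset of a frequent itemset is present in the supports table, which Apriori guarantees.

```python
from itertools import combinations


def generate_rules(supports, min_confidence):
    """Derive rules X => Y from a table of frequent itemset supports."""
    rules = []
    for itemset, s in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent X.
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = s / supports[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / supports[consequent]
                    rules.append((antecedent, consequent, s, confidence, lift))
    return rules


# Hypothetical supports, made up for illustration.
supports = {
    frozenset({"tea"}): 0.5,
    frozenset({"sugar"}): 0.4,
    frozenset({"tea", "sugar"}): 0.3,
}
rules = generate_rules(supports, min_confidence=0.7)
for antecedent, consequent, s, confidence, lift in rules:
    print(sorted(antecedent), "=>", sorted(consequent),
          f"support={s:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here only sugar => tea passes the 0.7 confidence threshold; tea => sugar has the same support and lift but a confidence of 0.6, which shows that confidence depends on the direction of the rule.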

An example of Association Rules

Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both.

- Rule: bought milk => bought butter
- support = P(Milk & Butter) = 6/100 = 0.06
- confidence = support/P(Milk) = 0.06/0.10 = 0.60
- lift = confidence/P(Butter) = 0.60/0.08 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Some important terms:

Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.

--- Original source retains full ownership of the source dataset ---
Apriori algorithm-based association rules.
plos.figshare.com
bin
Updated Aug 8, 2023
Cite
Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang (2023). Apriori algorithm-based association rules. [Dataset]. http://doi.org/10.1371/journal.pone.0289749.t001
Unique identifier
https://doi.org/10.1371/journal.pone.0289749.t001
Dataset updated
Aug 8, 2023
Dataset provided by
PLOS ONE
Authors
Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In recent years, the prevalence of T2DM has been increasing annually, in particular, the personal and socioeconomic burden caused by multiple complications has become increasingly serious. This study aimed to screen out the high-risk complication combination of T2DM through various data mining methods, establish and evaluate a risk prediction model of the complication combination in patients with T2DM. Questionnaire surveys, physical examinations, and biochemical tests were conducted on 4,937 patients with T2DM, and 810 cases of sample data with complications were retained. The high-risk complication combination was screened by association rules based on the Apriori algorithm. Risk factors were screened using the LASSO regression model, random forest model, and support vector machine. A risk prediction model was established using logistic regression analysis, and a dynamic nomogram was constructed. Receiver operating characteristic (ROC) curves, harrell’s concordance index (C-Index), calibration curves, decision curve analysis (DCA), and internal validation were used to evaluate the differentiation, calibration, and clinical applicability of the models. This study found that patients with T2DM had a high-risk combination of lower extremity vasculopathy, diabetic foot, and diabetic retinopathy. Based on this, body mass index, diastolic blood pressure, total cholesterol, triglyceride, 2-hour postprandial blood glucose and blood urea nitrogen levels were screened and used for the modeling analysis. The area under the ROC curves of the internal and external validations were 0.768 (95% CI, 0.744−0.792) and 0.745 (95% CI, 0.669−0.820), respectively, and the C-index and AUC value were consistent. The calibration plots showed good calibration, and the risk threshold for DCA was 30–54%. In this study, we developed and evaluated a predictive model for the development of a high-risk complication combination while uncovering the pattern of complications in patients with T2DM. 
This model has a practical guiding effect on the health management of patients with T2DM in community settings.
‘Groceries dataset for Market Basket Analysis(MBA)’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Groceries dataset for Market Basket Analysis(MBA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-for-market-basket-analysis-mba-d4c7/a0d6998a/?iid=009-334&v=presentation
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Groceries dataset for Market Basket Analysis(MBA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rashikrahmanpritom/groceries-dataset-for-market-basket-analysismba on 13 November 2021.

--- Dataset description provided by original source is as follows ---

The initial dataset was collected from the Groceries dataset. The data was then modified and split into two datasets for ease of MBA implementation. "groceries data.csv" contains groceries transaction data on which you can do EDA and pre-processing before feeding it into the Apriori algorithm. I have also added pre-processed data as "basket.csv": you only need to replace the NaN values and encode it using TransactionEncoder, after which you can feed the encoded data into the Apriori algorithm.
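The "replace NaN and encode" step can be illustrated without any third-party library. The sketch below is a standard-library stand-in for the kind of one-hot encoding a transaction encoder produces; it is not the TransactionEncoder API itself. The function names is_missing and one_hot_encode and the sample rows are hypothetical, and it assumes each row of "basket.csv" is a basket padded with NaN.

```python
import math


def is_missing(value):
    """True for None, empty strings and float NaN padding values."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def one_hot_encode(rows):
    """Drop padding, then build a boolean basket-by-item matrix."""
    baskets = [[v for v in row if not is_missing(v)] for row in rows]
    columns = sorted({item for basket in baskets for item in basket})
    matrix = [[item in basket for item in columns] for basket in baskets]
    return columns, matrix


nan = float("nan")
# Made-up rows shaped like padded baskets; not taken from the real file.
rows = [
    ["whole milk", "pastry", nan],
    ["sausage", "whole milk", "rolls/buns"],
    ["soda", nan, nan],
]
columns, matrix = one_hot_encode(rows)
print(columns)
for encoded in matrix:
    print(encoded)
```

Each row of the matrix is one basket and each column one item, which is the boolean layout that Apriori implementations typically expect as input.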

--- Original source retains full ownership of the source dataset ---
CAPTCHA Dataset Tablet
figshare.com
xlsx
Updated Dec 29, 2018
Cite
Darko Brodic; Alessia Amelio (2018). CAPTCHA Dataset Tablet [Dataset]. http://doi.org/10.6084/m9.figshare.7531463.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.7531463.v1
Dataset updated
Dec 29, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Darko Brodic; Alessia Amelio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset of the paper: "Exploring the Usability of the Text-based CAPTCHA on Tablet Computers", by Darko Brodic and Alessia Amelio
Data from: Variable description.
figshare.com
plos.figshare.com
xls
Updated Jun 9, 2023
+ more versions
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). Variable description. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t001
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t001
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Variable description.
Carbon Footprint Optimisation of dishes (HashMap AIA and Apriori Version)
ieee-dataport.org
Updated Sep 12, 2023
Cite
Andres Frederic (2023). Carbon Footprint Optimisation of dishes (HashMap AIA and Apriori Version) [Dataset]. https://ieee-dataport.org/documents/carbon-footprint-optimisation-dishes-hashmap-aia-and-apriori-version
Dataset updated
Sep 12, 2023
Authors
Andres Frederic
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset serves as a versatile resource for both performance analysis and environmental impact assessment. The unique attribute of this dataset lies in its ability to calculate representative values of carbon footprint optimization through multiple algorithmic implementations.
The result comparison of the different D.
plos.figshare.com
figshare.com
xls
Updated May 30, 2023
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The result comparison of the different D. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t002
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t002
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The result comparison of the different D.
Data for: Mining multiple association rules in LTPP database: an analysis of...
data.mendeley.com
Updated Oct 16, 2018
Cite
Peiwen Hao (2018). Data for: Mining multiple association rules in LTPP database: an analysis of asphalt pavement thermal cracking distress [Dataset]. http://doi.org/10.17632/w94jndtmpr.1
Unique identifier
https://doi.org/10.17632/w94jndtmpr.1
Dataset updated
Oct 16, 2018
Authors
Peiwen Hao
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
MATLAB Codes and original data for Apriori
The SAR difference of different confidence degree thresholds in D = 3.
plos.figshare.com
xls
Updated Jun 9, 2023
+ more versions
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The SAR difference of different confidence degree thresholds in D = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t009
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t009
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SAR difference of different confidence degree thresholds in D = 3.
Cost Estimating and Quoting Software for Manufacturing Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 17, 2025
+ more versions
Cite
Archive Market Research (2025). Cost Estimating and Quoting Software for Manufacturing Report [Dataset]. https://www.archivemarketresearch.com/reports/cost-estimating-and-quoting-software-for-manufacturing-557245
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global market for Cost Estimating and Quoting Software for Manufacturing is experiencing robust growth, driven by increasing demand for efficient production planning and optimized resource allocation within manufacturing businesses. This software streamlines the quoting process, reduces errors, and improves overall profitability. While precise market size data is unavailable, based on industry trends and comparable software markets, a reasonable estimate for the 2025 market size could be around $2.5 billion USD. Assuming a conservative Compound Annual Growth Rate (CAGR) of 10% over the forecast period (2025-2033), the market is projected to reach approximately $6.5 billion by 2033. This growth is fueled by several key factors. The rising adoption of Industry 4.0 technologies, including advanced analytics and cloud-based solutions, is significantly impacting the sector. Furthermore, the increasing complexity of manufacturing processes and the need for accurate cost estimations are creating a strong impetus for businesses to adopt sophisticated estimating and quoting software. Several key trends are shaping the market's trajectory. The integration of artificial intelligence (AI) and machine learning (ML) into cost estimating and quoting software is enhancing accuracy and prediction capabilities. Moreover, the shift towards subscription-based software models and the increasing availability of cloud-based solutions are making the technology more accessible and cost-effective for businesses of all sizes. While some restraints exist, such as the initial investment cost and the need for employee training, the overall benefits of improved efficiency and profitability are driving widespread adoption. The market is segmented by software type (cloud-based, on-premise), industry vertical (automotive, aerospace, etc.), and enterprise size (small, medium, large). 
Key players, including Jobscope, MTI Systems, SOLIDWORKS, and others, are competing to capture market share through innovation and strategic partnerships.
The one-item SAR of D = 3.
plos.figshare.com
xls
Updated Jun 9, 2023
+ more versions
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The one-item SAR of D = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t005
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t005
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The one-item SAR of D = 3.
MOESM2 of Data mining combined to the multicriteria decision analysis for...
springernature.figshare.com
figshare.com
application/cdfv2
Updated May 30, 2023
Cite
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar (2023). MOESM2 of Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France [Dataset]. http://doi.org/10.6084/m9.figshare.7660082.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.7660082.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
France
Description
Additional file 2. The integral table of transactions T.
The discretization results of D = 3.
plos.figshare.com
xls
Updated Jun 9, 2023
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The discretization results of D = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t003
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t003
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The discretization results of D = 3.
MOESM3 of Data mining combined to the multicriteria decision analysis for...
figshare.com
springernature.figshare.com
xls
Updated May 30, 2023
Cite
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar (2023). MOESM3 of Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of France [Dataset]. http://doi.org/10.6084/m9.figshare.7660091.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.7660091.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
figshare
Authors
Fatima El Mazouri; Mohammed Chaouki Abounaima; Khalid Zenkouar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
France
Description
Additional file 3. The integral matrix of concordance indices.
The two-item SAR of D = 3.
plos.figshare.com
xls
Updated Jun 8, 2023
Cite
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han (2023). The two-item SAR of D = 3. [Dataset]. http://doi.org/10.1371/journal.pone.0255684.t006
Unique identifier
https://doi.org/10.1371/journal.pone.0255684.t006
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Xin Liu; Xuefeng Sang; Jiaxuan Chang; Yang Zheng; Yuping Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The two-item SAR of D = 3.
Table_1_Delayed Comparison and Apriori Algorithm (DCAA): A Tool for...
frontiersin.figshare.com
xlsx
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lianhong Ding; Shaoshuai Xie; Shucui Zhang; Hangyu Shen; Huaqiang Zhong; Daoyuan Li; Peng Shi; Lianli Chi; Qunye Zhang (2023). Table_1_Delayed Comparison and Apriori Algorithm (DCAA): A Tool for Discovering Protein–Protein Interactions From Time-Series Phosphoproteomic Data.XLSX [Dataset]. http://doi.org/10.3389/fmolb.2020.606570.s001
Unique identifier
https://doi.org/10.3389/fmolb.2020.606570.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Lianhong Ding; Shaoshuai Xie; Shucui Zhang; Hangyu Shen; Huaqiang Zhong; Daoyuan Li; Peng Shi; Lianli Chi; Qunye Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of high-throughput omics data is one of the most important approaches for obtaining information regarding interactions between proteins/genes. Time-series omics data are a series of omics data points indexed in time order and normally contain more abundant information about the interactions between biological macromolecules than static omics data. In addition, phosphorylation is a key posttranslational modification (PTM) that is indicative of possible protein function changes in cellular processes. Analysis of time-series phosphoproteomic data should provide more meaningful information about protein interactions. However, although many algorithms, databases, and websites have been developed to analyze omics data, the tools dedicated to discovering molecular interactions from time-series omics data, especially from time-series phosphoproteomic data, are still scarce. Moreover, most reported tools ignore the lag between functional alterations and the corresponding changes in protein synthesis/PTM and are highly dependent on previous knowledge, resulting in high false-positive rates and difficulties in finding newly discovered protein–protein interactions (PPIs). Therefore, in the present study, we developed a new method to discover protein–protein interactions with the delayed comparison and Apriori algorithm (DCAA) to address the aforementioned problems. DCAA is based on the idea that there is a lag between functional alterations and the corresponding changes in protein synthesis/PTM. The Apriori algorithm was used to mine association rules from the relationships between items in a dataset and find PPIs based on time-series phosphoproteomic data. The advantage of DCAA is that it does not rely on previous knowledge and the PPI database. The analysis of actual time-series phosphoproteomic data showed that more than 68% of the protein interactions/regulatory relationships predicted by DCAA were accurate. 
As an analytical tool for PPIs that does not rely on a priori knowledge, DCAA should be useful to predict PPIs from time-series omics data, and this approach is not limited to phosphoproteomic data.
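To make the Apriori step concrete, here is a minimal sketch of level-wise frequent itemset mining in plain Python. It is not the DCAA implementation: the item labels, the toy transactions, and the `min_count` threshold are all hypothetical, and the delayed-comparison step that would produce the transactions from time-series data is not shown.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset seen in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        c = count(frozenset([item]))
        if c >= min_count:
            current[frozenset([item])] = c

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            cand for cand in candidates
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1))
        }
        current = {}
        for cand in candidates:
            c = count(cand)
            if c >= min_count:
                current[cand] = c
    return frequent


# Hypothetical toy transactions; "A", "B", "C" stand in for arbitrary items.
toy_transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent = apriori(toy_transactions, min_count=3)
for itemset, c in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

With this toy input, every single item and every pair reaches the threshold, while the triple {A, B, C} appears in only two transactions and is dropped.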
DataSheet2_An Interaction-Based Method for Refining Results From Gene Set...
frontiersin.figshare.com
xlsx
Updated Jun 3, 2023
Cite
Yishen Wang; Yiwen Hong; Shudi Mao; Yukang Jiang; Yamei Cui; Jianying Pan; Yan Luo (2023). DataSheet2_An Interaction-Based Method for Refining Results From Gene Set Enrichment Analysis.xlsx [Dataset]. http://doi.org/10.3389/fgene.2022.890672.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fgene.2022.890672.s002
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Yishen Wang; Yiwen Hong; Shudi Mao; Yukang Jiang; Yamei Cui; Jianying Pan; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose: To demonstrate an interaction-based method for the refinement of Gene Set Enrichment Analysis (GSEA) results.
Method: Intravitreal injection of miR-124-3p antagomir was used to knockdown the expression of miR-124-3p in mouse retina at postnatal day 3 (P3). Whole retinal RNA was extracted for mRNA transcriptome sequencing at P9. After preprocessing the dataset, GSEA was performed, and the leading-edge subsets were obtained. The Apriori algorithm was used to identify the frequent genes or gene sets from the union of the leading-edge subsets. A new statistic d was introduced to evaluate the frequent genes or gene sets. Reverse transcription quantitative PCR (RT-qPCR) was performed to validate the expression trend of candidate genes after the knockdown of miR-124-3p.
Results: A total of 115,140 assembled transcript sequences were obtained from the clean data. With GSEA, the NOD-like receptor signaling pathway, C-type-like lectin receptor signaling pathway, phagosome, necroptosis, JAK-STAT signaling pathway, Toll-like receptor signaling pathway, leukocyte transendothelial migration, chemokine signaling pathway, NF-kappa B signaling pathway and RIG-I-like signaling pathway were identified as the top 10 enriched pathways, and their leading-edge subsets were obtained. After being refined by the Apriori algorithm and sorted by the value of the modulus of d, Prkcd, Irf9, Stat3, Cxcl12, Stat1, Stat2, Isg15, Eif2ak2, Il6st, Pdgfra, Socs4 and Csf2ra had the significant number of interactions and the greatest value of d to downstream genes among all frequent transactions. Results of RT-qPCR validation for the expression of candidate genes after the knockdown of miR-124-3p showed a similar trend to the RNA-Seq results.
Conclusion: This study indicated that using the Apriori algorithm and defining the statistic d was a novel way to refine the GSEA results.
We hope to convey the intricacies from the computational results to the low-throughput experiments, and to plan experimental investigations specifically.

Cite

Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Dataset updated

Dec 9, 2021

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Aslan Ahmedov

Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions for itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rules are most useful when you want to find associations between different objects in a set, or frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support / P(Mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(Mat) = 0.80/0.09 = 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
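The arithmetic for the mouse and mat example can be checked with a few lines of plain Python. This is a minimal sketch using the standard definitions for a rule A => B, where confidence is taken relative to the antecedent A and lift divides confidence by the overall frequency of the consequent B. The function name `rule_metrics` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction


def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = Fraction(n_both, n_total)          # P(A and B)
    confidence = Fraction(n_both, n_antecedent)  # P(B | A) = P(A and B) / P(A)
    lift = confidence / Fraction(n_consequent, n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# 100 customers, 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(float(support), float(confidence), round(float(lift), 2))
```

A lift well above 1, as here, means the two items are bought together far more often than they would be if the purchases were independent.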

Strategy

Data Import
Data Understanding and Exploration
Transformation of the data, so that it is ready to be consumed by the association rules algorithm
Running association rules
Exploring the rules generated
Filtering the generated rules
Visualization of the rules

Dataset Description

File name: Assignment-1_Data
List name: retaildata
File format: .xlsx
Number of rows: 522065
Number of Attributes: 7
- BillNo: 6-digit number assigned to each transaction. Nominal.
- Itemname: Product name. Nominal.
- Quantity: The quantities of each product per transaction. Numeric.
- Date: The day and time when each transaction was generated. Numeric.
- Price: Product price. Numeric.
- CustomerID: 5-digit number assigned to each customer. Nominal.
- Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
tidyverse - An opinionated collection of R packages designed for data science. The package makes it easy to install and load multiple 'tidyverse' packages in a single step.
readxl - Read Excel files in R.
plyr - Tools for splitting, applying and combining data.
ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
knitr - Dynamic report generation in R.
magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.
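The original performs this step in R and shows it only as a screenshot. As a language-neutral illustration, here is a minimal Python sketch of the same idea. It assumes rows are represented as dictionaries keyed by the column names from the dataset description, and that only BillNo and Itemname must be present. Both the choice of required columns and the sample rows are assumptions, not taken from the source.

```python
def drop_missing(rows, required=("BillNo", "Itemname")):
    """Keep only rows where every required column has a non-empty value."""

    def present(value):
        # Treat None and blank strings as missing.
        return value is not None and str(value).strip() != ""

    return [row for row in rows if all(present(row.get(col)) for col in required)]


# Hypothetical sample rows; item names are placeholders.
rows = [
    {"BillNo": "100001", "Itemname": "ITEM A"},
    {"BillNo": "100001", "Itemname": None},      # missing item name
    {"BillNo": "", "Itemname": "ITEM B"},        # missing bill number
    {"BillNo": "100002", "Itemname": "ITEM C"},
]
clean_rows = drop_missing(rows)
print(len(rows), "rows before,", len(clean_rows), "rows after")
```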

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
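The conversion from one row per purchased item to one basket per invoice can be sketched as follows. This is a plain Python illustration, not the R code used in the original. It assumes the BillNo and Itemname columns from the dataset description, and the sample rows are hypothetical.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group item names by bill number: one set of items per invoice."""
    baskets = defaultdict(set)
    for row in rows:
        # A set removes repeated lines for the same item on one invoice.
        baskets[row["BillNo"]].add(row["Itemname"])
    return dict(baskets)


# Hypothetical sample rows; item names are placeholders.
rows = [
    {"BillNo": "100001", "Itemname": "ITEM A"},
    {"BillNo": "100001", "Itemname": "ITEM B"},
    {"BillNo": "100001", "Itemname": "ITEM A"},
    {"BillNo": "100002", "Itemname": "ITEM C"},
]
baskets = to_transactions(rows)
for bill_no, items in sorted(baskets.items()):
    print(bill_no, sorted(items))
```

Each resulting basket is one transaction, which is the input shape that association rule mining expects.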
