5 datasets found

A
‘Groceries dataset ’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 15, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Groceries dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-b6be/136ba9af/?iid=001-023&v=presentation
Explore at:
Dataset updated
Aug 15, 2015
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Groceries dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

Details of the dataset

The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.

An example of Association Rules

Assume there are 100 customers 10 of them bought milk, 8 bought butter and 6 bought both of them. bought milk => bought butter support = P(Milk & Butter) = 6/100 = 0.06 confidence = support/P(Butter) = 0.06/0.08 = 0.75 lift = confidence/P(Milk) = 0.75/0.10 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Some important terms:

Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.

--- Original source retains full ownership of the source dataset ---
f
Number of association rules generated from the HMP (full) dataset with...
plos.figshare.com
xls
Updated Jun 3, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Number of association rules generated from the HMP (full) dataset with various run-time thresholds. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0154493.t004
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Number of association rules generated using the Apriori rule mining approach on the HMP (full) dataset at various values of support count and confidence thresholds. Table also depicts variations in number of rules due to adoption of various strategies that define the minimum abundance threshold for individual taxa to be considered for rule mining.
f
Number of association rules generated from the prebiotics dataset with...
plos.figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Number of association rules generated from the prebiotics dataset with various run-time thresholds. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0154493.t002
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Number of association rules generated using the Apriori rule mining approach on the prebiotics dataset at various values of support count and confidence thresholds. Table also depicts variations in number of rules due to adoption of various strategies that define the minimum abundance threshold for individual taxa to be considered for rule mining.
f
Apriori algorithm code.
plos.figshare.com
txt
Updated May 23, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fangyuan Li; Xia Wang; Zenglei Feng; Jian Wang; Mengdi Li; Kun JIANG; Changli ZHAO (2024). Apriori algorithm code. [Dataset]. http://doi.org/10.1371/journal.pone.0302216.s003
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0302216.s003
Dataset updated
May 23, 2024
Dataset provided by
PLOS ONE
Authors
Fangyuan Li; Xia Wang; Zenglei Feng; Jian Wang; Mengdi Li; Kun JIANG; Changli ZHAO
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The real-time monitoring on the risk status of the vehicle and its driver can provide the assistance for the early detection and blocking control of single-vehicle accidents. However, complex risk coupling relationship is one of the main features of single-vehicle accidents with high mortality rate. On the basis of investigating the coupling effect among multi-risk factors and establishing a safety management database throughout the life cycle of vehicles, single-vehicle driving risk network (SVDRN) with a three-level threshold was developed, and its topology features were analyzed to assessment the importance of nodes. To avoid the one-sidedness of single indicator, the multi-attribute comprehensive evaluation model was applied to measure the comprehensive effect of characteristic indicators for nodes importance. A algorithm for real-time monitoring of vehicle driving risk status was proposed to identify key risk chains. The result revealed that improper operation, speeding, loss of vehicle control and inefficient driver management were the sequence of top four risk factors in the comprehensive evaluation result of nodes importance (mean value = 0.185, SD = 0.119). There were minor differences of 0.017 in the node importance among environmental factors, among which non-standard road alignment had the larger value. The improper operation and non-standard road alignment were the highest combination correlation of factors affecting road safety, with the support of 51.81% and the confidence of 69.35%. This identification algorithm of key risk chains that combines node importance and its risk state threshold can effectively determine the high-frequency risk transmission paths and risk factors through multi-vehicle test, providing a basis for centralization management of transport enterprises.
f
Student’s t test of DGAARM and Apriori.
plos.figshare.com
xls
Updated Mar 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xiaoxuan Wu; Qiang Wen; Jun Zhu (2024). Student’s t test of DGAARM and Apriori. [Dataset]. http://doi.org/10.1371/journal.pone.0299865.t010
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0299865.t010
Dataset updated
Mar 4, 2024
Dataset provided by
PLOS ONE
Authors
Xiaoxuan Wu; Qiang Wen; Jun Zhu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Understanding air quality requires a comprehensive understanding of its various factors. Most of the association rule techniques focuses on high frequency terms, ignoring the potential importance of low- frequency terms and causing unnecessary storage space waste. Therefore, a dynamic genetic association rule mining algorithm is proposed in this paper, which combines the improved dynamic genetic algorithm with the association rule mining algorithm to realize the importance mining of low- frequency terms. Firstly, in the chromosome coding phase of genetic algorithm, an innovative multi-information coding strategy is proposed, which selectively stores similar values of different levels in one storage unit. It avoids storing all the values at once and facilitates efficient mining of valid rules later. Secondly, by weighting the evaluation indicators such as support, confidence and promotion in association rule mining, a new evaluation index is formed, avoiding the need to set a minimum threshold for high-interest rules. Finally, in order to improve the mining performance of the rules, the dynamic crossover rate and mutation rate are set to improve the search efficiency of the algorithm. In the experimental stage, this paper adopts the 2016 annual air quality data set of Beijing to verify the effectiveness of the unit point multi-information coding strategy in reducing the rule storage air, the effectiveness of mining the rules formed by the low frequency item set, and the effectiveness of combining the rule mining algorithm with the swarm intelligence optimization algorithm in terms of search time and convergence. In the experimental stage, this paper adopts the 2016 annual air quality data set of Beijing to verify the effectiveness of the above three aspects. The unit point multi-information coding strategy reduced the rule space storage consumption by 50%, the new evaluation index can mine more interesting rules whose interest level can be up to 90%, while mining the rules formed by the lower frequency terms, and in terms of search time, we reduced it about 20% compared with some meta-heuristic algorithms, while improving convergence.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Groceries dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-b6be/136ba9af/?iid=001-023&v=presentation

‘Groceries dataset ’ analyzed by Analyst-2

Explore at:

Dataset updated

Aug 15, 2015

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Analysis of ‘Groceries dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Association Rule Mining

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

Details of the dataset

The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.

An example of Association Rules

Assume there are 100 customers 10 of them bought milk, 8 bought butter and 6 bought both of them. bought milk => bought butter support = P(Milk & Butter) = 6/100 = 0.06 confidence = support/P(Butter) = 0.06/0.08 = 0.75 lift = confidence/P(Milk) = 0.75/0.10 = 7.5

Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Some important terms:

Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.
Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.
Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.

--- Original source retains full ownership of the source dataset ---

Clear search

Close search

Google apps

Main menu

‘Groceries dataset ’ analyzed by Analyst-2

Association Rule Mining

Details of the dataset

Apriori Algorithm

An example of Association Rules

Some important terms:

Number of association rules generated from the HMP (full) dataset with...

Number of association rules generated from the prebiotics dataset with...

Apriori algorithm code.

Student’s t test of DGAARM and Apriori.

‘Groceries dataset ’ analyzed by Analyst-2

Association Rule Mining

Details of the dataset

Apriori Algorithm

An example of Association Rules

Some important terms: