25 datasets found
  1. A sample transaction database.

    • plos.figshare.com
    xls
    Updated Feb 3, 2025
    Cite
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). A sample transaction database. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Privacy is a critical issue in the age of data. Organizations and corporations that publicly share their data always have a major concern that their sensitive information may be leaked or extracted by rivals or attackers using data miners. High-utility itemset mining (HUIM) is an extension of frequent itemset mining (FIM) that deals with business data in the form of transaction databases, data that is also in danger of being stolen. To deal with this, a number of privacy-preserving data mining (PPDM) techniques have been introduced. An important topic in PPDM in recent years is privacy-preserving utility mining (PPUM). The goal of PPUM is to protect sensitive information, such as sensitive high-utility itemsets, in transaction databases and make it undiscoverable by data mining techniques. However, available PPUM methods do not consider the generalization of items in databases (categories, classes, groups, etc.). These algorithms only consider the items at a specialized level, leaving the item combinations at a higher level vulnerable to attacks. The insights gained from higher abstraction levels are somewhat more valuable than those from lower levels since they contain the outlines of the data. To address this issue, this work suggests two PPUM algorithms, namely MLHProtector and FMLHProtector, to operate at all abstraction levels in a transaction database to protect them from data mining algorithms. Empirical experiments showed that both algorithms successfully protect the itemsets from being compromised by attackers.
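
    To make the notion of a high-utility itemset concrete, here is a minimal Python sketch. It assumes the usual HUIM setup, in which each transaction records a purchased quantity per item and each item has an external utility such as unit profit. The toy data, the function names, and the threshold are hypothetical, and this is not the MLHProtector or FMLHProtector algorithm.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps an item to its purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]

# Hypothetical external utilities (for example, unit profit) per item.
external_utility: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset: Iterable[str],
                    db: List[Dict[str, int]],
                    profit: Dict[str, int]) -> int:
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    items = set(itemset)
    total = 0
    for transaction in db:
        if items.issubset(transaction):  # iterating a dict yields its item keys
            total += sum(transaction[i] * profit[i] for i in items)
    return total


def is_high_utility(itemset: Iterable[str],
                    db: List[Dict[str, int]],
                    profit: Dict[str, int],
                    min_util: int) -> bool:
    """An itemset is high-utility when its total utility reaches the threshold."""
    return itemset_utility(itemset, db, profit) >= min_util


print(itemset_utility({"a", "b"}, transactions, external_utility))
```

    A sanitization method in the PPUM sense would then modify the database until the sensitive itemsets fall below the chosen threshold; that step is not sketched here.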

  2. As mentioned in the experiment results section, we divide the data in small...

    • plos.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). As mentioned in the experiment results section, we divide the data in small and large datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The small datasets for calculating the frequency of itemsets in a transaction database contain Accidents, Chess, Connection, Mushroom, PUSBM, and Retail [32] transaction datasets. There are 500, 1000, 2000, and 5000 transactions per dataset. The small datasets for calculating the utility of itemsets in a transaction database contain Accidents, Chess, Connection, Mushroom, PUSBM, and Retail [32] transaction datasets. There are 500, 1000, 2000, and 5000 transactions per dataset. The large datasets for calculating the frequency of itemsets in a transaction database contain Accidents, Connection, and PUSBM [32] datasets. There are 10000, 20000, 30000, and 50000 transactions per dataset. The large datasets for calculating the utility of itemsets in a transaction database contain Accidents, Connection, and PUSBM [32] transaction datasets. There are 10000, 20000, 30000, and 50000 transactions per dataset. (ZIP)

  3. Less frequent itemsets (min. support < 0.40).

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Less frequent itemsets (min. support < 0.40). [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Less frequent itemsets (min. support < 0.40).

  4. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most often used to find associations between different objects in a set, such as frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows a retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought Computer Mouse, 9 bought Mat for Mouse, and 8 bought both. For the rule "bought Computer Mouse => bought Mat for Mouse":

    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
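
    The arithmetic for this 100-customer example can be checked with a few lines of Python. This is only a sketch under the standard definitions, where confidence divides the joint count by the count of the antecedent (the mouse) and lift divides confidence by the share of the consequent (the mat). The function name rule_metrics is hypothetical.

```python
def rule_metrics(n_total: int, n_x: int, n_y: int, n_both: int):
    """Return (support, confidence, lift) for the rule X => Y from raw counts."""
    support = n_both / n_total           # P(X and Y)
    confidence = n_both / n_x            # P(Y | X)
    lift = confidence / (n_y / n_total)  # P(Y | X) / P(Y)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 mouse buyers, 9 mat buyers, 8 bought both.
support, confidence, lift = rule_metrics(n_total=100, n_x=10, n_y=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift well above 1 suggests that the two items are bought together far more often than independent purchases would predict.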

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    [Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    [Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
    [Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

    Next, we clean the data frame by removing missing values.

    [Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
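
    The original write-up performs this conversion in R. As a rough illustration in Python (a swapped-in language, not the author's code), the sketch below groups line-item rows by bill number so that each invoice becomes one basket of items. The row layout (BillNo, Itemname), the sample values, and the function name rows_to_baskets are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical rows shaped like (BillNo, Itemname); None marks a missing value.
rows = [
    ("536365", "MUG"),
    ("536365", "LANTERN"),
    ("536366", "HAND WARMER"),
    ("536366", None),
    ("536367", "MUG"),
]


def rows_to_baskets(
    records: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Dict[str, Set[str]]:
    """Group item names by bill number, skipping rows with missing values."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for bill_no, item in records:
        if bill_no is None or item is None:
            continue  # mirrors the missing-value removal step
        baskets[bill_no].add(item)
    return dict(baskets)


for bill_no, items in rows_to_baskets(rows).items():
    print(bill_no, sorted(items))
```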

  5. Frequent itemsets (min. support ≥ 0.40).

    • plos.figshare.com
    xls
    Updated Jun 18, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Frequent itemsets (min. support ≥ 0.40). [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 18, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Frequent itemsets (min. support ≥ 0.40).

  6. Groceries dataset

    • kaggle.com
    zip
    Updated Sep 17, 2020
    Cite
    Heeral Dedhia (2020). Groceries dataset [Dataset]. https://www.kaggle.com/heeraldedhia/groceries-dataset
    Explore at:
    Available download formats: zip (263057 bytes)
    Dataset updated
    Sep 17, 2020
    Authors
    Heeral Dedhia
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Association Rule Mining

    Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

    Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

    Details of the dataset

    The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

    Apriori Algorithm

    Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
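
    The level-wise idea described above can be sketched in plain Python. This is a minimal, unoptimized illustration under stated assumptions, not a reference implementation: the function name apriori, the toy baskets, and the 0.5 support threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    baskets: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every individual item.
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Count how many baskets contain each candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [
    {"milk", "butter", "bread"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"butter"},
    {"milk", "butter", "bread"},
]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```

    The prune step is what keeps the search tractable: an itemset cannot be frequent unless all of its subsets are frequent, so most larger candidates are discarded without counting them.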

    An example of Association Rules

    Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both. For the rule "bought milk => bought butter":

    • support = P(Milk & Butter) = 6/100 = 0.06
    • confidence = support/P(Milk) = 0.06/0.10 = 0.60
    • lift = confidence/P(Butter) = 0.60/0.08 = 7.5

    Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Some important terms:

    • Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

    • Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

    • Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
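
    The three terms above can also be computed directly from a list of baskets. The sketch below rebuilds the 100-customer milk and butter scenario as explicit transactions and applies the definitions as stated: confidence is measured over the transactions containing X, and lift divides that by the popularity of Y. The function names and the way the baskets are constructed are assumptions for illustration.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Proportion of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str], transactions: List[Set[str]]) -> float:
    """Proportion of the transactions containing X that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x: Iterable[str], y: Iterable[str], transactions: List[Set[str]]) -> float:
    """Confidence of X => Y, controlled for how popular Y is overall."""
    return confidence(x, y, transactions) / support(y, transactions)


# 100 customers: 6 bought both, 4 milk only, 2 butter only, 88 bought neither.
baskets: List[Set[str]] = (
    [{"milk", "butter"}] * 6 + [{"milk"}] * 4 + [{"butter"}] * 2 + [set()] * 88
)

print(round(support({"milk", "butter"}, baskets), 2))
print(round(confidence({"milk"}, {"butter"}, baskets), 2))
print(round(lift({"milk"}, {"butter"}, baskets), 2))
```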

  7. High frequency high utility (HFHU) itemsets.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). High frequency high utility (HFHU) itemsets. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High frequency high utility (HFHU) itemsets.

  8. Datasets associated with "Mining of Consumer Product and Purchasing Data to...

    • datasets.ai
    • catalog.data.gov
    53
    Updated Jul 26, 2021
    Cite
    U.S. Environmental Protection Agency (2021). Datasets associated with "Mining of Consumer Product and Purchasing Data to Identify Potential Chemical Co-exposures" [Dataset]. https://datasets.ai/datasets/datasets-associated-with-mining-of-consumer-product-and-purchasing-data-to-identify-potent
    Explore at:
    53 Available download formats
    Dataset updated
    Jul 26, 2021
    Dataset authored and provided by
    U.S. Environmental Protection Agency
    Description

    Background: Chemicals in consumer products are a major contributor to human chemical co-exposures. Consumers purchase and use a wide variety of products containing potentially thousands of chemicals. There is a need to identify potential real-world chemical co-exposures in order to prioritize in vitro toxicity screening. However, due to the vast number of potential chemical combinations, this has been a major challenge.

    Objectives: We aim to develop and implement a data-driven procedure for identifying prevalent chemical combinations to which humans are exposed through purchase and use of consumer products.

    Methods: We applied frequent itemset mining on an integrated dataset linking consumer product chemical ingredient data with product purchasing data from sixty thousand households to identify chemical combinations resulting from co-use of consumer products.

    Results: We identified co-occurrence patterns of chemicals over all households as well as those specific to demographic groups based on race/ethnicity, income, education, and family composition. We also identified chemicals with the highest potential for aggregate exposure by identifying chemicals occurring in multiple products used by the same household. Lastly, a case study of chemicals active in estrogen and androgen receptor in silico models revealed priority chemical combinations co-targeting receptors involved in important biological signaling pathways.

    Discussion: Integration and comprehensive analysis of household purchasing data and product-chemical information provided a means to assess human near-field exposure and inform selection of chemical combinations for high-throughput screening in in vitro assays.

    This dataset is associated with the following publication: Stanfield, Z., C. Addington, K. Dionisio, D. Lyons, R. Tornero-Velez, K. Phillips, T. Buckley, and K. Isaacs. Mining of consumer product and purchasing data to identify potential chemical co-exposures. ENVIRONMENTAL HEALTH PERSPECTIVES. National Institute of Environmental Health Sciences (NIEHS), Research Triangle Park, NC, USA, 129(6): N/A, (2021).

  9. Number of rules for sample database 1.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Number of rules for sample database 1. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t014
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of rules for sample database 1.

  10. Transaction database with profit values.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Transaction database with profit values. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transaction database with profit values.

  11. High frequency low utility (HFLU) itemsets.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). High frequency low utility (HFLU) itemsets. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High frequency low utility (HFLU) itemsets.

  12. Number of rules for all experiments.

    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Number of rules for all experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t016
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of rules for all experiments.

  13. An efficient method for mining high average-utility itemsets based on...

    • dona.pwr.edu.pl
    Updated 2025
    Cite
    Nam N Pham; Huy Minh Huynh; Zuzana K Oplatková; Ngoc Thanh Nguyen; Bay Vo (2025). An efficient method for mining high average-utility itemsets based on particle swarm optimization with multiple minimum thresholds [Dataset]. http://doi.org/10.1016/j.asoc.2025.114046
    Explore at:
    Dataset updated
    2025
    Authors
    Nam N Pham; Huy Minh Huynh; Zuzana K Oplatková; Ngoc Thanh Nguyen; Bay Vo
    Description

    Library of Wroclaw University of Science and Technology scientific output (DONA database)

  14. Profit table.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Profit table. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Profit table.

  15. Association rules for LFLU → HFLU.

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Association rules for LFLU → HFLU. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t013
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Association rules for LFLU → HFLU.

  16. Low Frequency Low Utility (LFLU) itemsets.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe (2023). Low Frequency Low Utility (LFLU) itemsets. [Dataset]. http://doi.org/10.1371/journal.pone.0198066.t009
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jimmy Ming-Tai Wu; Justin Zhan; Sanket Chobe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Low Frequency Low Utility (LFLU) itemsets.

  17. Database after computing the utility values of each item in transactions.

    • plos.figshare.com
    xls
    Updated Feb 3, 2025
    Cite
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). Database after computing the utility values of each item in transactions. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database after computing the utility values of each item in transactions.

  18. External utilities of items from Table 1.

    • plos.figshare.com
    xls
    Updated Feb 3, 2025
    Cite
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). External utilities of items from Table 1. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Privacy is a critical issue in the age of data. Organizations and corporations that publicly share their data always have a major concern that their sensitive information may be leaked or extracted by rivals or attackers using data miners. High-utility itemset mining (HUIM) is an extension of frequent itemset mining (FIM) that deals with business data in the form of transaction databases, data that is also in danger of being stolen. To deal with this, a number of privacy-preserving data mining (PPDM) techniques have been introduced. An important topic in PPDM in recent years is privacy-preserving utility mining (PPUM). The goal of PPUM is to protect sensitive information, such as sensitive high-utility itemsets, in transaction databases and make it undiscoverable by data mining techniques. However, available PPUM methods do not consider the generalization of items in databases (categories, classes, groups, etc.). These algorithms only consider the items at a specialized level, leaving the item combinations at a higher level vulnerable to attacks. The insights gained from higher abstraction levels are somewhat more valuable than those from lower levels since they contain the outlines of the data. To address this issue, this work suggests two PPUM algorithms, namely MLHProtector and FMLHProtector, to operate at all abstraction levels in a transaction database to protect them from data mining algorithms. Empirical experiments showed that both algorithms successfully protect the itemsets from being compromised by attackers.

  19. Set of SML-HUIs.

    • plos.figshare.com
    xls
    Updated Feb 3, 2025
    Cite
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). Set of SML-HUIs. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Privacy is a critical issue in the age of data. Organizations and corporations that publicly share their data always have a major concern that their sensitive information may be leaked or extracted by rivals or attackers using data miners. High-utility itemset mining (HUIM) is an extension of frequent itemset mining (FIM) that deals with business data in the form of transaction databases, data that is also in danger of being stolen. To deal with this, a number of privacy-preserving data mining (PPDM) techniques have been introduced. An important topic in PPDM in recent years is privacy-preserving utility mining (PPUM). The goal of PPUM is to protect sensitive information, such as sensitive high-utility itemsets, in transaction databases and make it undiscoverable by data mining techniques. However, available PPUM methods do not consider the generalization of items in databases (categories, classes, groups, etc.). These algorithms only consider the items at a specialized level, leaving the item combinations at a higher level vulnerable to attacks. The insights gained from higher abstraction levels are somewhat more valuable than those from lower levels since they contain the outlines of the data. To address this issue, this work suggests two PPUM algorithms, namely MLHProtector and FMLHProtector, to operate at all abstraction levels in a transaction database to protect them from data mining algorithms. Empirical experiments showed that both algorithms successfully protect the itemsets from being compromised by attackers.

  20. Sanitized databases using MLHProtector algorithm.

    • plos.figshare.com
    xls
    Updated Feb 3, 2025
    Cite
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). Sanitized databases using MLHProtector algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Privacy is a critical issue in the age of data. Organizations and corporations that publicly share their data always have a major concern that their sensitive information may be leaked or extracted by rivals or attackers using data miners. High-utility itemset mining (HUIM) is an extension of frequent itemset mining (FIM) that deals with business data in the form of transaction databases, data that is also in danger of being stolen. To deal with this, a number of privacy-preserving data mining (PPDM) techniques have been introduced. An important topic in PPDM in recent years is privacy-preserving utility mining (PPUM). The goal of PPUM is to protect sensitive information, such as sensitive high-utility itemsets, in transaction databases and make it undiscoverable by data mining techniques. However, available PPUM methods do not consider the generalization of items in databases (categories, classes, groups, etc.). These algorithms only consider the items at a specialized level, leaving the item combinations at a higher level vulnerable to attacks. The insights gained from higher abstraction levels are somewhat more valuable than those from lower levels since they contain the outlines of the data. To address this issue, this work suggests two PPUM algorithms, namely MLHProtector and FMLHProtector, to operate at all abstraction levels in a transaction database to protect them from data mining algorithms. Empirical experiments showed that both algorithms successfully protect the itemsets from being compromised by attackers.

Cite
Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). A sample transaction database. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t001

A sample transaction database.

Related Article
Explore at:
61 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: xls
Dataset updated
Feb 3, 2025
Dataset provided by
PLOS ONE
Authors
Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Privacy is a critical issue in the age of data. Organizations and corporations that publicly share their data always have a major concern that their sensitive information may be leaked or extracted by rivals or attackers using data miners. High-utility itemset mining (HUIM) is an extension of frequent itemset mining (FIM) that deals with business data in the form of transaction databases, data that is also in danger of being stolen. To deal with this, a number of privacy-preserving data mining (PPDM) techniques have been introduced. An important topic in PPDM in recent years is privacy-preserving utility mining (PPUM). The goal of PPUM is to protect sensitive information, such as sensitive high-utility itemsets, in transaction databases and make it undiscoverable by data mining techniques. However, available PPUM methods do not consider the generalization of items in databases (categories, classes, groups, etc.). These algorithms only consider the items at a specialized level, leaving the item combinations at a higher level vulnerable to attacks. The insights gained from higher abstraction levels are somewhat more valuable than those from lower levels since they contain the outlines of the data. To address this issue, this work suggests two PPUM algorithms, namely MLHProtector and FMLHProtector, to operate at all abstraction levels in a transaction database to protect them from data mining algorithms. Empirical experiments showed that both algorithms successfully protect the itemsets from being compromised by attackers.
