6 datasets found
  1. Medical dataset in 3-diversity model.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). Medical dataset in 3-diversity model. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Big data and its applications have recently grown sharply in fields such as IoT, bioinformatics, e-commerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems, so the scientific and industrial communities need large-scale, robust computing systems. Because value is one of the characteristics of big data, data should be published so that analysts can extract useful patterns from it. However, data publishing may disclose individuals’ private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability through resilient distributed datasets (RDDs). Owing to its in-memory computation, it can be up to 100 times faster than Hadoop. Apache Spark is therefore one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. The computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. It supports partition-based data clustering algorithms that preserve the λ-diversity privacy model using transformations and actions on RDDs. The authors therefore investigate a Spark-based implementation that preserves the λ-diversity privacy model using two purpose-designed distance functions, City block and Pearson. The results provide a comprehensive guideline that allows researchers to apply Apache Spark in their own research.

  2. S1 Data -

    • plos.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0285212.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  3. A sample medical dataset.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). A sample medical dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  4. Full list of indices available on SPARK. Each index can be extracted from...

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    Cite
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Full list of indices available on SPARK. Each index can be extracted from the weighted directed, weighted undirected, unweighted directed and unweighted undirected version of the underlying network depending on the scenario to deal with. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Full list of indices available on SPARK. Each index can be extracted from the weighted directed, weighted undirected, unweighted directed and unweighted undirected version of the underlying network depending on the scenario to deal with.

  5. Synthetic data generating parameters. The table summarizes the generating...

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    Cite
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Synthetic data generating parameters. The table summarizes the generating parameters for synthetic networks showing the corresponding symbol, name and range after the application of the constraints in Section e.2. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic data generating parameters. The table summarizes the generating parameters for synthetic networks showing the corresponding symbol, name and range after the application of the constraints in Section e.2.

  6. ANOVA results for the distributions in Fig 8. Four different one-way ANOVA...

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    + more versions
    Cite
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). ANOVA results for the distributions in Fig 8. Four different one-way ANOVA for each combination of and in the toy example #2. The corresponding p and F values are shown in this table. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ANOVA results for the distributions in Fig 8. Four different one-way ANOVA for each combination of and in the toy example #2. The corresponding p and F values are shown in this table.


Farough Ashkouti; Keyhan Khamforoosh (2023). Medical dataset in 3-diversity model. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t003

Medical dataset in 3-diversity model.

Related Article
Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: xls
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Farough Ashkouti; Keyhan Khamforoosh
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Big data and its applications have recently grown sharply in fields such as IoT, bioinformatics, e-commerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems, so the scientific and industrial communities need large-scale, robust computing systems. Because value is one of the characteristics of big data, data should be published so that analysts can extract useful patterns from it. However, data publishing may disclose individuals’ private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability through resilient distributed datasets (RDDs). Owing to its in-memory computation, it can be up to 100 times faster than Hadoop. Apache Spark is therefore one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. The computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. It supports partition-based data clustering algorithms that preserve the λ-diversity privacy model using transformations and actions on RDDs. The authors therefore investigate a Spark-based implementation that preserves the λ-diversity privacy model using two purpose-designed distance functions, City block and Pearson. The results provide a comprehensive guideline that allows researchers to apply Apache Spark in their own research.
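
The description above names the λ-diversity privacy model and two distance functions, City block and Pearson, without showing how they are computed. The following is a minimal single-machine sketch in plain Python, not the authors' Spark implementation. It assumes the "distinct" form of the diversity check (each group must contain at least λ different sensitive values) and the common convention that Pearson distance is 1 minus the Pearson correlation; the paper may define either differently. All function names and the sample records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def city_block_distance(a, b):
    """City block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """Pearson distance, taken here as 1 - r (an assumed, common convention)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return 1.0 - cov / sqrt(var_a * var_b)


def is_distinct_l_diverse(records, l):
    """Check distinct l-diversity (assumed definition).

    records is an iterable of (group_key, sensitive_value) pairs, where
    group_key identifies the cluster or equivalence class a record was
    assigned to. Every group must hold at least l distinct sensitive values.
    """
    values_by_group = defaultdict(set)
    for group_key, sensitive_value in records:
        values_by_group[group_key].add(sensitive_value)
    return all(len(values) >= l for values in values_by_group.values())


# Hypothetical sample: two groups of three records each.
sample_records = [
    ("g1", "flu"), ("g1", "cancer"), ("g1", "asthma"),
    ("g2", "flu"), ("g2", "flu"), ("g2", "hiv"),
]
print(city_block_distance([1, 2, 3], [4, 0, 3]))
print(is_distinct_l_diverse(sample_records, 3))
```

In a Spark setting the same per-group check would typically run after records are keyed by cluster, but that wiring is left out here because the description does not specify it.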
