10 datasets found

Medical dataset in 3-diversity model.
plos.figshare.com
xls
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). Medical dataset in 3-diversity model. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recently big data and its applications had sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data incurred enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the compelling need of the scientific and industrial community is large-scale and robust computing systems. Since one of the characteristics of big data is value, data should be published for analysts to extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast and in-memory computing framework for large-scale data processing that provides high scalability by introducing the resilient distributed dataset (RDDs). In terms of performance, Due to in-memory computations, it is 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks to implement distributed methods for privacy-preserving in big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three-phase of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformation and actions on RDDs. Therefore, the authors have investigated Spark-based implementation for preserving the λ-diversity privacy model by two designed City block and Pearson distance functions. The results of the paper provide a comprehensive guideline allowing the researchers to apply Apache Spark in their own researches.
S1 Data -
plos.figshare.com
xlsx
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0285212.s001
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.s001
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recently big data and its applications had sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data incurred enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the compelling need of the scientific and industrial community is large-scale and robust computing systems. Since one of the characteristics of big data is value, data should be published for analysts to extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast and in-memory computing framework for large-scale data processing that provides high scalability by introducing the resilient distributed dataset (RDDs). In terms of performance, Due to in-memory computations, it is 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks to implement distributed methods for privacy-preserving in big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three-phase of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformation and actions on RDDs. Therefore, the authors have investigated Spark-based implementation for preserving the λ-diversity privacy model by two designed City block and Pearson distance functions. The results of the paper provide a comprehensive guideline allowing the researchers to apply Apache Spark in their own researches.
The partitions variables used in this paper.
plos.figshare.com
xls
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eman Khashan; Ali Eldesouky; Sally Elghamrawy (2023). The partitions variables used in this paper. [Dataset]. http://doi.org/10.1371/journal.pone.0255562.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0255562.t002
Dataset updated
May 30, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Eman Khashan; Ali Eldesouky; Sally Elghamrawy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The partitions variables used in this paper.
pone.0285212.t004 - A distributed computing model for big data anonymization...
plos.figshare.com
xls
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). pone.0285212.t004 - A distributed computing model for big data anonymization in the networks [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t004
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
pone.0285212.t004 - A distributed computing model for big data anonymization in the networks
Synthetic data generating parameters. The table summarizes the generating...
plos.figshare.com
xls
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Synthetic data generating parameters. The table summarizes the generating parameters for synthetic networks showing the corresponding symbol, name and range after the application of the constraints in Section e.2. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0319031.t002
Dataset updated
Jun 5, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Synthetic data generating parameters. The table summarizes the generating parameters for synthetic networks showing the corresponding symbol, name and range after the application of the constraints in Section e.2.
f
Classification performances for the Breast Cancer Wisconsin dataset....
plos.figshare.com
xls
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Classification performances for the Breast Cancer Wisconsin dataset. Performance indices are extracted by the confusion matrix obtained comparing the original ground truth and the labels of the Fiedler vector. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t005
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0319031.t005
Dataset updated
Jun 5, 2025
Dataset provided by
PLOS ONE
Authors
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Classification performances for the Breast Cancer Wisconsin dataset. Performance indices are extracted by the confusion matrix obtained comparing the original ground truth and the labels of the Fiedler vector.
f
ANOVA results for the distributions in Fig 8. Four different one-way ANOVA...
plos.figshare.com
xls
Updated Jun 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). ANOVA results for the distributions in Fig 8. Four different one-way ANOVA for each combination of and in the toy example #2. The corresponding p and F values are shown in this table. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0319031.t004
Dataset updated
Jun 5, 2025
Dataset provided by
PLOS ONE
Authors
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ANOVA results for the distributions in Fig 8. Four different one-way ANOVA for each combination of and in the toy example #2. The corresponding p and F values are shown in this table.
Classification performances for the rice dataset. Performance indices are...
plos.figshare.com
xls
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Classification performances for the rice dataset. Performance indices are extracted by the confusion matrix obtained comparing the original ground truth and the labels of the Fiedler vector. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t006
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0319031.t006
Dataset updated
Jun 5, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Classification performances for the rice dataset. Performance indices are extracted by the confusion matrix obtained comparing the original ground truth and the labels of the Fiedler vector.
F1-measure criterion on the anonymous poker hand for λ = 4, ḱ = 3 and...
plos.figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). F1-measure criterion on the anonymous poker hand for λ = 4, ḱ = 3 and different values of k. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t009
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t009
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
F1-measure criterion on the anonymous poker hand for λ = 4, ḱ = 3 and different values of k.
Accuracy criterion on the anonymous poker hand for λ = 4, ḱ = 3 and...
plos.figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). Accuracy criterion on the anonymous poker hand for λ = 4, ḱ = 3 and different values of k. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t008
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t008
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Accuracy criterion on the anonymous poker hand for λ = 4, ḱ = 3 and different values of k.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Farough Ashkouti; Keyhan Khamforoosh (2023). Medical dataset in 3-diversity model. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t003

Medical dataset in 3-diversity model.

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

xlsAvailable download formats

Unique identifier

https://doi.org/10.1371/journal.pone.0285212.t003

Dataset updated

May 31, 2023

Dataset provided by

PLOShttp://plos.org/

Authors

Farough Ashkouti; Keyhan Khamforoosh

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Recently big data and its applications had sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data incurred enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the compelling need of the scientific and industrial community is large-scale and robust computing systems. Since one of the characteristics of big data is value, data should be published for analysts to extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast and in-memory computing framework for large-scale data processing that provides high scalability by introducing the resilient distributed dataset (RDDs). In terms of performance, Due to in-memory computations, it is 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks to implement distributed methods for privacy-preserving in big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three-phase of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformation and actions on RDDs. Therefore, the authors have investigated Spark-based implementation for preserving the λ-diversity privacy model by two designed City block and Pearson distance functions. The results of the paper provide a comprehensive guideline allowing the researchers to apply Apache Spark in their own researches.

Clear search

Close search

Google apps

Main menu

Medical dataset in 3-diversity model.

S1 Data -

The partitions variables used in this paper.

pone.0285212.t004 - A distributed computing model for big data anonymization...

Synthetic data generating parameters. The table summarizes the generating...

Classification performances for the Breast Cancer Wisconsin dataset....

ANOVA results for the distributions in Fig 8. Four different one-way ANOVA...

Classification performances for the rice dataset. Performance indices are...

F1-measure criterion on the anonymous poker hand for λ = 4, ḱ = 3 and...

Accuracy criterion on the anonymous poker hand for λ = 4, ḱ = 3 and...

Medical dataset in 3-diversity model.See More Versions

Medical dataset in 3-diversity model.