14 datasets found

Supplementary data.
plos.figshare.com
zip
Updated Feb 3, 2025
Cite
David Pau; Camille Bachot; Charles Monteil; Laetitia Vinet; Mathieu Boucher; Nadir Sella; Romain Jegou (2025). Supplementary data. [Dataset]. http://doi.org/10.1371/journal.pdig.0000735.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pdig.0000735.s001
Dataset updated
Feb 3, 2025
Dataset provided by
PLOS Digital Health
Authors
David Pau; Camille Bachot; Charles Monteil; Laetitia Vinet; Mathieu Boucher; Nadir Sella; Romain Jegou
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Anonymization opens up innovative ways of using secondary data outside the requirements of the GDPR, as anonymized data no longer affects the privacy of data subjects. Anonymization requires data alteration, and this project aims to compare the ability of such privacy protection methods to maintain the reliability and utility of scientific data for secondary research purposes.
Methods: The French data protection authority (CNIL) defines anonymization as a processing activity that consists of using methods to make any identification of people impossible, by any means and in an irreversible manner. To meet the project's objective, a series of analyses was performed on a cohort and reproduced on four sets of anonymized data for comparison. Four assessment levels were used to evaluate the impact of anonymization: level 1 referred to the replication of statistical outputs, level 2 referred to the accuracy of statistical results, level 3 assessed data alteration (using Hellinger distances) and level 4 assessed privacy risks (using WP29 criteria).
Results: 87 items were produced on the raw cohort data and then reproduced on each of the four anonymized data sets. The overall level 1 replication score ranged from 67% to 100% depending on the anonymization solution. The most difficult analyses to replicate were regression models (sub-score ranging from 78% to 100%) and survival analysis (sub-score ranging from 0% to 100%). The overall level 2 accuracy score ranged from 22% to 79% depending on the anonymization solution. For level 3, three methods had some variables with different probability distributions (Hellinger distance = 1). For level 4, all methods had reduced the privacy risk of singling out, with relative risk reductions ranging from 41% to 65%.
Conclusion: None of the anonymization methods reproduced all outputs and results. A trade-off has to be found between context risk and the usefulness of data to answer the research question.
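The level 3 assessment above relies on Hellinger distances. As a minimal sketch, assuming each variable is summarized as category counts (the counts below are hypothetical and not taken from the study), the distance between a raw and an anonymized distribution could be computed like this. A value of 0 means identical distributions and 1 means the two distributions share no category.

```python
import math


def to_distribution(counts):
    """Turn raw category counts into probabilities that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are equal-length sequences of probabilities over the
    same ordered categories. The result lies between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same categories")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


# Hypothetical counts for one categorical variable (illustration only).
raw = to_distribution([40, 35, 25])
anonymized = to_distribution([38, 37, 25])

print(hellinger(raw, anonymized))        # small value: mild alteration
print(hellinger([1.0, 0.0], [0.0, 1.0])) # disjoint support
```

How the study binned continuous variables before computing the distance is not stated in the description, so any binning step would be an additional assumption.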
Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure...
heidata.uni-heidelberg.de
pdf, tsv, txt
Updated Nov 20, 2024
Cite
Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich (2024). Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data] [Dataset]. http://doi.org/10.11588/DATA/MXM0Q2
Available download formats: txt(3421), tsv(191831), tsv(106632), tsv(286102), tsv(107100), tsv(190296), tsv(197975), pdf(640128)
Unique identifier
https://doi.org/10.11588/DATA/MXM0Q2
Dataset updated
Nov 20, 2024
Dataset provided by
heiDATA
Authors
Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich
License
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2
Description
In the publication [1] we implemented anonymization and synthetization techniques for a structured data set, which was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4] individually and in combination. We evaluated the utility and shortcomings of the different approaches by statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches and a combination thereof introduce only minimal deviations from the original data set over all features. While data synthesis techniques produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.
[1] Johann TI, Otte K, Prasser F, Dieterich C: Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024;. doi://10.1093/ehjdh/ztae083
[2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.
[3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.
[4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.
[5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.
[6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.

Global Video Anonymization Market Research Report: By Technology (Software,...

wiseguyreports.com

Updated Aug 10, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Video Anonymization Market Research Report: By Technology (Software, Hardware, Cloud-based), By Deployment (On-premises, Cloud), By End User (Media and entertainment, Healthcare, Financial services, Government), By Anonymization Technique (Face blurring, Object redaction, Voice modulation, Background replacement) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/video-anonymization-market

Dataset updated

Aug 10, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 8, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	617.59(USD Billion)
MARKET SIZE 2024	706.71(USD Billion)
MARKET SIZE 2032	2077.2(USD Billion)
SEGMENTS COVERED	Technology, Deployment, End User, Anonymization Technique, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	1. Growing demand for data privacy; 2. Advancements in AI and facial recognition; 3. Increase in video surveillance; 4. Regulatory compliance; 5. Expansion of cloud-based video anonymization solutions
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Microsoft, Fourmilab, Proofpoint, LogRhythm, SAS Institute, F-Secure, Intermedia, One Identity, BeenVerified, Oracle, Image Scrubber, IBM, Splunk, Axzon, Digital Shadows
MARKET FORECAST PERIOD	2025 - 2032
KEY MARKET OPPORTUNITIES	1. Growing adoption of video surveillance systems; 2. Increasing demand from law enforcement and security agencies; 3. Rising concerns over data privacy and security; 4. Government regulations and compliance requirements; 5. Advancements in AI and machine learning technologies
COMPOUND ANNUAL GROWTH RATE (CAGR)	14.43% (2025 - 2032)
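
The listed growth rate can be cross-checked against the two market-size rows in the table. The sketch below assumes the 2024 figure is the base and that growth compounds over the eight years to 2032; that reading of the table is an assumption, not something the listing states. Under it, the result comes out very close to the listed 14.43%.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values, as a fraction."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures copied from the table above (USD Billion, as listed).
size_2024 = 706.71
size_2032 = 2077.2

# Assumption: 2024 is the base year, so 2032 - 2024 = 8 compounding periods.
rate = cagr(size_2024, size_2032, 2032 - 2024)
print(f"{rate:.2%}")
```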

S1 Data -
plos.figshare.com
xlsx
Updated May 31, 2023
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0285212.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.s001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Big data and its applications have recently grown sharply in fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. The scientific and industrial communities therefore need large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from it. However, data publishing may lead to the disclosure of individuals’ private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing resilient distributed datasets (RDDs). In terms of performance, due to in-memory computations, it is 100 times faster than Hadoop. Apache Spark is therefore one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms that preserve the λ-diversity privacy model by using transformations and actions on RDDs. The authors have therefore investigated a Spark-based implementation for preserving the λ-diversity privacy model with two purpose-designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.
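To make the privacy model in this abstract concrete, here is a minimal single-machine sketch of a diversity check and a City block distance. It uses plain Python rather than Spark RDDs. The record layout, the column names, and the "distinct values" reading of the diversity requirement are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict


def city_block(a, b):
    """City block (Manhattan) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_l_diverse(records, quasi_ids, sensitive, l):
    """Check distinct l-diversity.

    Records sharing the same quasi-identifier values form one group;
    every group must contain at least l distinct sensitive values.
    """
    groups = defaultdict(set)
    for record in records:
        key = tuple(record[q] for q in quasi_ids)
        groups[key].add(record[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already generalized records (illustration only).
records = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "asthma"},
    {"age": "30-39", "zip": "476**", "disease": "diabetes"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "cancer"},
]

print(is_l_diverse(records, ["age", "zip"], "disease", 2))
print(is_l_diverse(records, ["age", "zip"], "disease", 3))
print(city_block((1, 2), (4, 6)))
```

In a clustering-based anonymizer, a distance such as `city_block` would decide which records are grouped together, and a check such as `is_l_diverse` would confirm that the resulting groups meet the privacy requirement.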
Medical dataset in 3-diversity model.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). Medical dataset in 3-diversity model. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Consensual videos of potentially re-identifiable individuals recorded at the...
zenodo.org
data.niaid.nih.gov
pdf, zip
Updated Jul 11, 2024
Cite
Vivien Geenen; Till Riedel (2024). Consensual videos of potentially re-identifiable individuals recorded at the Autonomous Driving Test Area Baden-Württemberg (raw images recorded daytime). [Dataset]. http://doi.org/10.5281/zenodo.10020644
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.10020644
Dataset updated
Jul 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vivien Geenen; Till Riedel
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
Mar 30, 2023
Area covered
Baden-Württemberg
Description
For the purpose of research on data intermediaries and data anonymisation, it is necessary to test these processes with realistic video data containing personal data. For this purpose, the TreuMoDa project, funded by the German Federal Ministry of Education and Research (BMBF), has created a dataset of different traffic scenes containing identifiable persons.
This video data was collected at the Autonomous Driving Test Area Baden-Württemberg. On the one hand, it should be possible to recognise people in traffic, including their line of sight. On the other hand, it should be usable for the demonstration and evaluation of anonymisation techniques.
The legal basis for the publication of this data set is the consent given by the participants, as documented in the file Consent.pdf (all purposes), in accordance with Art. 6(1)(a) and Art. 9(2)(a) GDPR. Any further processing is subject to the GDPR.
We make this dataset available for non-commercial purposes such as teaching, research and scientific communication. Please note that this licence is limited by the provisions of the GDPR. Anyone downloading this data will become an independent controller of the data. This data has been collected with the consent of the identifiable individuals depicted.
Any consensual use must take into account the purposes mentioned in the uploaded consent forms and in the privacy terms and conditions provided to the participants (see Consent.pdf). All participants consented to all three purposes, and no consent was withdrawn at the time of publication. KIT is unable to provide you with contact details for any of the participants, as we have removed all links to personal data other than that contained in the published images.
Supplementary Material for "Investigating Software Development Teams...
explore.openaire.eu
Updated Jul 26, 2024
Cite
Edna Dias CANEDO; Fabiano Damasceno Sousa FALCAO (2024). Supplementary Material for "Investigating Software Development Teams Members' Perceptions of Data Privacy in the Use of Large Language Models (LLMs)" [Dataset]. http://doi.org/10.5281/zenodo.13138862
Unique identifier
https://doi.org/10.5281/zenodo.13138862
Dataset updated
Jul 26, 2024
Authors
Edna Dias CANEDO; Fabiano Damasceno Sousa FALCAO
Description
ABSTRACT: Context: Large Language Models (LLMs) have revolutionized natural language generation and understanding. However, they raise significant data privacy concerns, especially when sensitive data is processed and stored by third parties. Goal: This paper investigates the perception of software development team members regarding data privacy when using LLMs in their professional activities. Additionally, we examine the challenges faced and the practices adopted by these practitioners. Method: We conducted a survey with 78 ICT practitioners from five regions of the country. Results: Software development team members have basic knowledge about data privacy and LGPD, but most have never received formal training on LLMs and possess only basic knowledge about them. Their main concerns include the leakage of sensitive data and the misuse of personal data. To mitigate risks, they avoid using sensitive data and implement anonymization techniques. The primary challenges practitioners face are ensuring transparency in the use of LLMs and minimizing data collection. Software development team members consider current legislation inadequate for protecting data privacy in the context of LLM use. Conclusions: The results reveal a need to improve knowledge and practices related to data privacy in the context of LLM use. According to software development team members, organizations need to invest in training, develop new tools, and adopt more robust policies to protect user data privacy. They advocate for a multifaceted approach that combines education, technology, and regulation to ensure the safe and responsible use of LLMs.
A sample medical dataset.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). A sample medical dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Value creation stories anonymized open data set (Immunization Agenda 2030...
data.niaid.nih.gov
Updated Mar 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sadki, Reda (2024). Value creation stories anonymized open data set (Immunization Agenda 2030 Full Learning Cycle, 7 March - 20 June 2022) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7763921
Dataset updated
Mar 26, 2024
Dataset provided by
Charlotte Mbuh
Alan Brooks
Ana Paula Szylovec
Sadki, Reda
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Title

Immunization Agenda 2030 (IA2030) 1st Movement Full Learning Cycle (FLC 2022) – “How are you doing?” Value Creation Stories Survey (Version 1.0)

Research audience

Education researchers interested in the application of the “value creation stories” (VCS) conceptual framework elaborated by Etienne Wenger et al. in the study of communities of practice and other types of digital communities.

Credits

Author

The Geneva Learning Foundation 18 avenue Louis Casaï CH-1209 Geneva, Switzerland research@learning.foundation

Principal Investigator and corresponding author

Reda Sadki, The Geneva Learning Foundation (TGLF) reda@learning.foundation

Project partners

Bridges to Development; University of South Australia Centre for Change and Complexity in Learning (C3L)

Roles and responsibilities

Design: The Geneva Learning Foundation

Implementation (sample collection): The Geneva Learning Foundation

Processing: The Geneva Learning Foundation, Bridges to Development, Centre for Change and Complexity in Learning (C3L)

Anonymization: The Geneva Learning Foundation and Bridges to Development

Data cleaning: Bridges to Development

Submission: The Geneva Learning Foundation

Funding sources or sponsorship that supported the data collection

Wellcome, Bill & Melinda Gates Foundation (BMGF)

Recommended citation

The Geneva Learning Foundation, 2023. Value Creation Stories (VCS) weekly feedback survey, 2022 Full Learning Cycle (FLC) of the Movement for Immunization Agenda 2030 (IA2030) (Version 1.0). [Data Set]. The Geneva Learning Foundation. DOI: 10.5281/zenodo.7763922

Description of the sample

File list:

This file is IA2030_FLC_2022_Value_Creation_Stories.README.md

IA2030-EN_FLC_2022_Value_Creation_Stories-questions_mapping.csv : List of the survey’s questions and their code in English as well as their unit. (21 questions) - Version 1: Geneva Learning Foundation, 31 March 2023.

IA2030-EN_FLC_2022_Value_Creation_Stories.csv : Dataset Response of participants that replied in English. (n: 2101, obs:5601) - Version 1: Geneva Learning Foundation, 31 March 2023.

IA2030-FR_FLC_2022_Value_Creation_Stories-questions_mapping.csv: List of the survey’s questions and their code in English as well as their unit. (21 questions) - Version 1: Geneva Learning Foundation, 31 March 2023.

IA2030-FR_FLC_2022_Value_Creation_Stories-Google_translation.csv: Dataset Response of participants that replied in French translated to English using “Google Translate” (n: 1585, obs:4493) - Version 1: Geneva Learning Foundation, 31 March 2023.

IA2030-FR_FLC_2022_Value_Creation_Stories.csv: Dataset Response of participants that replied in French (n: 1585, obs:4493) - Version 1: Geneva Learning Foundation, 31 March 2023.

Relationship between files

The question codes in the questions-mapping files are the same as the column names in the response data sets, so the files can be connected.

Related data sets

This is a subset of data collected by The Geneva Learning Foundation (TGLF) during the 1st IA2030 Full Learning Cycle (FLC). The complete data set is more comprehensive, and includes: demographic information (gender, country), health system information (respondent’s health system level), respondents’ analyses of challenges and priorities.

Additional data sets for the first Full Learning Cycle (FLC) of the Movement for Immunization Agenda 2030 (IA2030) are available from The Geneva Learning Foundation (TGLF) Insights Unit insights@learning.foundation

Other publicly accessible locations of the data

The Geneva Learning Foundation publishes data sets in relation to its Immunization Agenda 2030 (IA2030) Movement learning programme in the Zenodo open repository community: https://zenodo.org/communities/ia2030/

1. Purpose and Objectives

Primary goal of the survey:

This survey had two goals in the context of TGLF’s IA2030 Movement Full Learning Cycle programme (2022): 1. Provide an asynchronous mechanism for support between peers (participants) and from the TGLF team; and 2. collect and measure programme participants’ value creation stories (VCS) during the programme.

Martin de Laat’s “value creation stories” (VCS) has been used primarily in small-scale, qualitative studies of communities of practice, online forums, and education activities.

This data set includes both quantitative (Likert) and qualitative (open text) responses to the VCS questions, collected over a period of four months (7 March – 20 June 2022) from a cohort that began with 6,185 participants on the start date.

2. Population and Sample

The target population were participants of the Geneva Learning Foundation’s Movement for Immunization Agenda 2030 (IA2030) learning programme. The initial cohort admitted to the programme was 6,185 individuals from 99 countries. Only participants who were formally admitted to the programme received the invitation to complete the survey.

Programme participants were free to choose if and when to report (self-selection), and their responses were not checked against any other measures (self-reporting).

Languages: French and English

3. Survey Design and Methods

Data collection period: 7 March 2022 – 20 June 2022

Between 7 March and 20 June 2022, participants in the Geneva Learning Foundation’s “Immunization Agenda 2030” (IA2030) Movement Full Learning Cycle (FLC) were asked to respond to a questionnaire titled “How are you doing?”.

Participants received a personalized email with the request to share feedback about their experience during the week. The link to share feedback was also included in other reminder and information emails sent in response to participant needs.

The first survey was launched on 11 March 2022 and the last on 17 March 2022, for a total of 15 requests. Participants could answer the survey at any time and as many times as they wished.

The group of 6,185 participants grew over the course of the Cycle, as additional participants were able to join the initiative throughout the four-month period.

Software- or Instrument-specific information needed to interpret the data

Automated translation of French data was performed using Google Translate

Methods used for removing or anonymizing personal identifiers or sensitive information:

Unique identifier: Unique identifiers were anonymized using MD5 hashing via the web site Miracle Salad. Unique identifiers can be used to identify respondents who may have answered the survey more than once, at different points in time. This approach provides a method to anonymize sensitive data using MD5 hashing.

Limitation: MD5 hashing is a one-way function; it is not possible to dehash the data and recover the original information.

Macros were developed in Excel to replace country names in qualitative responses. (No country information was collected in this survey, but some respondents referred to their specific contexts in their responses.) The macros did not account for typos; if any country information is found, please contact: research@learning.foundation
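
The country-name replacement described above was done with Excel macros that are not included in this listing. As a rough sketch of the same idea in Python, assuming a hypothetical short country list and a hypothetical placeholder token, exact-name matching could look like the following. Like the macros, it does not catch misspelled names.

```python
import re

# Hypothetical list; a real run would need a complete list of country names.
COUNTRIES = ["Nigeria", "India", "Democratic Republic of the Congo"]


def redact_countries(text, countries=COUNTRIES, placeholder="[COUNTRY]"):
    """Replace whole-word, case-insensitive country names with a placeholder."""
    # Longest names first, so multi-word names win over shorter overlaps.
    names = sorted(countries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


print(redact_countries("In Nigeria we reached more districts this week."))
print(redact_countries("Our team in Nigerai met on Monday."))  # typo is missed
```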

Data collection start and end dates:

7 March 2022 until 20 June 2022

Events or circumstances during data collection that may have influenced results:

No requests for responses were sent during TGLF’s “Term break” between 16-30 April 2022.

4. Data Processing and Cleaning

Incomplete or inconsistent responses: Not cleaned, as respondents were able to opt out of specific sections of survey or skip questions.

Data transformations or imputations: None

Treatment of outliers or extreme values: None

5. Variables and Measures

The survey included Likert scale questions and qualitative open-text questions based on the conceptual framework for Value Creation Stories (VCS) developed by Wenger et al. (2011). There are no derived or calculated variables. Items are Likert scale, multiple choice, and open text.

6. Data Quality and Reliability

All responses submitted before or after the FLC period (7 March – 20 June 2022) were excluded from the sample.

7. Data Privacy and Anonymization

Methods used for removing or anonymizing personal identifiers or sensitive information:

Unique identifier: Unique identifiers were anonymized using MD5 Hashing via the web site https://www.miraclesalad.com/webtools/md5.php. Unique identifiers can be used to identify respondents who may have answered the survey more than once, at different points in time. This approach provides a method to anonymize sensitive data using MD5 hashing.

Limitation: MD5 hashing is a one-way function; it is not possible to dehash the data and recover the original information.
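
The data set authors hashed identifiers with a web tool. The sketch below shows the equivalent step with Python's standard `hashlib`, and how the hashed value still links repeat responses. The identifiers are hypothetical, and whether the original identifiers were normalized before hashing is not stated. Note that when identifiers come from a small or guessable set, an unsalted hash can be matched by hashing candidate values, so the one-way property alone is a weak guarantee.

```python
import hashlib
from collections import Counter


def pseudonymize(identifier):
    """Return the MD5 hex digest of an identifier string (UTF-8 encoded)."""
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


# Hypothetical respondent identifiers, one per submitted response.
responses = ["resp-001", "resp-002", "resp-001", "resp-003", "resp-001"]

hashed = [pseudonymize(r) for r in responses]

# The same input always yields the same digest, so repeat respondents
# remain linkable without exposing the original identifier.
repeat_counts = Counter(hashed)
print(repeat_counts[pseudonymize("resp-001")])
```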

Macros were developed in Excel to replace country names in qualitative responses. (No country information was collected in this survey, but some respondents referred to their specific contexts in their responses.)

8. Data Availability and Accessibility

This data set is made available on Zenodo.org in the Zenodo community “Movement for Immunization Agenda 2030 (IA2030)” https://zenodo.org/communities/ia2030/

Requests for additional information should be addressed to research@learning.foundation.

Spatially anonymized data for: Multistate Ornstein-Uhlenbeck approach for...
zenodo.org
datadryad.org
csv
Updated Jun 3, 2022
Cite
Joseph Eisaguirre (2022). Spatially anonymized data for: Multistate Ornstein-Uhlenbeck approach for practical estimation of movement and resource selection around central places [Dataset]. http://doi.org/10.5061/dryad.8cz8w9gnz
Available download formats: csv
Unique identifier
https://doi.org/10.5061/dryad.8cz8w9gnz
Dataset updated
Jun 3, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Joseph Eisaguirre
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
1. Home range dynamics and movement are central to a species' ecology and strongly mediate both intra- and interspecific interactions. Numerous methods have been introduced to describe animal home ranges, but most lack predictive ability and cannot capture effects of dynamic environmental patterns, such as the impacts of air and water flow on movement.

2. Here, we develop a practical, multi-stage approach for statistical inference into the behavioral mechanisms underlying how habitat and dynamic energy landscapes---in this case how airflow increases or decreases the energetic efficiency of flight---shape animal home ranges based around central places. We validated the new approach using simulations, then applied it to a sample of 12 adult golden eagles (Aquila chrysaetos) tracked with satellite telemetry.

3. The application to golden eagles revealed effects of habitat variables that align with predicted behavioral ecology. Further, we found that males and females partition their home ranges dynamically based on uplift. Specifically, changes in wind and sun angle drove differential space use between sexes, especially later in the breeding season when energetic demands of growing nestlings require both parents to forage more widely.

4. This method is easily implemented using widely available programming languages and is based on a hierarchical multistate Ornstein-Uhlenbeck space use process that incorporates habitat and energy landscapes. The underlying mathematical properties of the model allow straightforward computation of predicted utilization distributions, permitting estimation of home range size and visualization of space use patterns under varying conditions.
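The central-place attraction that underlies an Ornstein-Uhlenbeck (OU) space use process can be illustrated with a short simulation. The sketch below is not the authors' hierarchical multistate model: it assumes a single behavioral state, an isotropic process, and no habitat or energy-landscape covariates. The function name and the parameters `beta` (attraction strength), `sigma` (diffusion scale), and `dt` (time step) are hypothetical choices made for illustration only.

```python
import math
import random


def simulate_ou_track(center, start, beta, sigma, dt, n_steps, seed=0):
    """Simulate a 2-D Ornstein-Uhlenbeck track attracted to a central place.

    Uses the exact discrete-time OU transition on each axis:
        x[t+dt] = center + exp(-beta*dt) * (x[t] - center) + noise
    where noise is Gaussian with variance
        sigma**2 * (1 - exp(-2*beta*dt)) / (2*beta).
    """
    rng = random.Random(seed)
    decay = math.exp(-beta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * beta))
    x, y = start
    track = [(x, y)]
    for _ in range(n_steps):
        # Pull the position toward the central place, then add noise.
        x = center[0] + decay * (x - center[0]) + rng.gauss(0.0, step_sd)
        y = center[1] + decay * (y - center[1]) + rng.gauss(0.0, step_sd)
        track.append((x, y))
    return track


# Hypothetical example: a nest at the origin, an animal starting 10 units away.
track = simulate_ou_track(center=(0.0, 0.0), start=(10.0, 0.0),
                          beta=0.5, sigma=1.0, dt=1.0, n_steps=100)
```

Under these assumptions, the long-run positions are normally distributed around the central place with per-axis variance `sigma**2 / (2 * beta)`, which is what makes a utilization distribution straightforward to compute for a process of this kind.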
Data from: WEA-Acceptance Data: Wind Turbine Dataset Including Acoustical,...
data.uni-hannover.de
.csv, json, parquet +2
Updated Dec 12, 2024
+ more versions
Cite
Institut für Statik und Dynamik (2024). WEA-Acceptance Data: Wind Turbine Dataset Including Acoustical, Meteorological and Turbine Parameters (Version 2.0) [Dataset]. https://data.uni-hannover.de/dataset/wea-acceptance_data_v1
Explore at:
Available download formats: parquet(243492), .csv(1897), parquet(90157), json(10357), pdf(1107041), parquet(89778), parquet(1485165), pdf(261496), parquet(1919763), parquet(1894583), parquet(1892269), pdf(1329610), zip(244097), zip, zip(169921)
Dataset updated
Dec 12, 2024
Dataset authored and provided by
Institut für Statik und Dynamik
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
License information was derived automatically
Description
Within the project WEA-Acceptance¹, extensive measurement campaigns were carried out, which included the recording of acoustic, meteorological and turbine-specific data. Acoustic quantities were measured at several distances to the wind turbine and under various atmospheric and turbine conditions. In the project WEA-Acceptance-Data², the acquired measurements are stored in a structured and anonymized form and provided for research purposes. Besides the data and its documentation, first evaluations as well as reference data sets for chosen scenarios are published.

In this version of the data platform, a specification 2.0, an anonymized data set and three use cases are published. The specification contains the concept of the data platform, which is primarily based on the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The data set consists of turbine-specific, meteorological and acoustic data recorded over one month. The data were corrected, conditioned and anonymized so that relevant outliers are marked and erroneous data are removed from the data set. The acoustic data include anonymized sound pressure levels and one-third octave spectra averaged over ten minutes as well as audio data. In addition, the metadata and an overview of data availability are uploaded. As examples of how the data can be applied, three use cases are also published. Important information such as the approach for data anonymization is briefly described in the ReadMe file.

For further information about the measurements, see Martens, S., Bohne, T., and Rolfes, R.: An evaluation method for extensive wind turbine sound measurement data and its application, Proceedings of Meetings on Acoustics, Acoustical Society of America, 41, 040001, https://doi.org/10.1121/2.0001326, 2020.

¹The project WEA-Acceptance (FKZ 0324134A) was funded by the German Federal Ministry for Economic Affairs and Energy (BMWi).

²The project WEA-Acceptance-Data (FKZ 03EE3062) was funded by the German Federal Ministry for Economic Affairs and Energy (BMWi).
Farmer Survey data (anonymized) for the thesis "Planning sustainable and...
data.4tu.nl
zip
Updated Jun 7, 2024
Cite
Mohammad Faiz Alam (2024). Farmer Survey data (anonymized) for the thesis "Planning sustainable and equitable agricultural water interventions: an agent based sociohydrology approach" [Dataset]. http://doi.org/10.4121/e5dc84d4-e22e-41c3-aa41-fbbe7ec74d83.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/e5dc84d4-e22e-41c3-aa41-fbbe7ec74d83.v1
Dataset updated
Jun 7, 2024
Dataset provided by
4TU.ResearchData
Authors
Mohammad Faiz Alam
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Time period covered
2021
Area covered

Description
This dataset pertains to the farmer survey in the Kamadhiya catchment in Gujarat, India. In December 2021, 492 farmers distributed across 24 villages were interviewed in the Kamadhiya catchment. The study sample was selected through a multistage random sampling procedure. First, 24 of the 88 villages lying within the Kamadhiya catchment were selected using regularly distributed sampling. Thereafter, in each village, 20-22 farmers were selected for the survey using proportionate random sampling. The survey questionnaire consisted of four parts: 1) farmers' socio-economic characteristics; 2) farmers' perception of check dam (CD) impacts and sociopsychological questions regarding the maintenance of CDs; 3) farmers' cropping and irrigation practices; and 4) farmers' perception and adoption of AWM practices (drip and borewell). For more description and information on the study area, please refer to the publication below:
Alam, M.F., McClain, M.E., Sikka, A., Daniel, D., Pande, S., 2022. Benefits, equity, and sustainability of community rainwater harvesting structures: An assessment based on farm scale social survey. Front. Environ. Sci. 10. https://doi.org/10.3389/fenvs.2022.1043896
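The two-stage selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the thesis: "regularly distributed sampling" is read here as systematic selection at a regular interval from an ordered village list, "proportionate random sampling" is read as allocating the farmer total in proportion to village size, and the village and farmer identifiers are invented placeholders. All function names are hypothetical.

```python
import random


def systematic_sample(items, k, rng):
    """Pick k items at a regular interval from an ordered list (random start)."""
    step = len(items) / k
    start = rng.uniform(0, step)
    # Clamp the index to guard against floating-point rounding at the end.
    return [items[min(int(start + i * step), len(items) - 1)] for i in range(k)]


def proportional_allocation(sizes, total):
    """Split `total` across groups in proportion to size (largest remainder)."""
    grand = sum(sizes.values())
    quotas = {g: total * n / grand for g, n in sizes.items()}
    alloc = {g: int(q) for g, q in quotas.items()}
    leftover = total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


def two_stage_sample(village_farmers, n_villages, n_farmers, seed=0):
    """Stage 1: systematic village selection. Stage 2: proportional random draw."""
    rng = random.Random(seed)
    villages = systematic_sample(sorted(village_farmers), n_villages, rng)
    sizes = {v: len(village_farmers[v]) for v in villages}
    alloc = proportional_allocation(sizes, n_farmers)
    return {v: rng.sample(village_farmers[v], alloc[v]) for v in villages}


# Hypothetical frame: 88 villages with 50 farmers each (sizes are made up).
frame = {f"V{i:02d}": [f"V{i:02d}-F{j}" for j in range(50)] for i in range(88)}
sample = two_stage_sample(frame, n_villages=24, n_farmers=492)
```

With equal village sizes, as in this made-up frame, the allocation yields 20 or 21 farmers per village; unequal sizes would spread the 492 interviews differently.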
[Inequality, Competitiveness, and Motivaton - Studies 1-3] Material, raw...
figshare.com
zip
Updated Oct 2, 2018
Cite
Nicolas Sommet (2018). [Inequality, Competitiveness, and Motivaton - Studies 1-3] Material, raw data, working data,and working syntax (anonymized) [Dataset]. http://doi.org/10.6084/m9.figshare.4779640.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.4779640.v1
Dataset updated
Oct 2, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nicolas Sommet
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Questionnaires, raw data, and syntax files for the three studies of the following paper: Sommet, N., Elliot, A. J., Jamieson, J. P., & Butera, F. (2018). Income inequality, perceived competitiveness, and approach-avoidance motivation. To appear in Journal of Personality.
Anonymized raw dataset of the aymmetry analysis.
plos.figshare.com
xlsx
Updated Jun 5, 2023
Cite
Sang-Yoon Lee; Eun Kyoung Lee; Ki Ho Park; Dong Myung Kim; Jin Wook Jeoung (2023). Anonymized raw dataset of the aymmetry analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0164866.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0164866.s001
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Sang-Yoon Lee; Eun Kyoung Lee; Ki Ho Park; Dong Myung Kim; Jin Wook Jeoung
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This supplementary material provides the anonymized raw dataset of the asymmetry analysis. (XLSX)