14 datasets found
  1. Supplementary data.

    • plos.figshare.com
    zip
    Updated Feb 3, 2025
    Cite
    David Pau; Camille Bachot; Charles Monteil; Laetitia Vinet; Mathieu Boucher; Nadir Sella; Romain Jegou (2025). Supplementary data. [Dataset]. http://doi.org/10.1371/journal.pdig.0000735.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS Digital Health
    Authors
    David Pau; Camille Bachot; Charles Monteil; Laetitia Vinet; Mathieu Boucher; Nadir Sella; Romain Jegou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Anonymization opens up innovative ways of using secondary data without the requirements of the GDPR, as anonymized data no longer affects the privacy of data subjects. Anonymization requires data alteration, and this project aims to compare the ability of such privacy protection methods to maintain the reliability and utility of scientific data for secondary research purposes.

    Methods: The French data protection authority (CNIL) defines anonymization as a processing activity that consists of using methods to make any identification of people impossible, by any means and in an irreversible manner. To meet the project’s objective, a series of analyses was performed on a cohort and reproduced on four sets of anonymized data for comparison. Four assessment levels were used to evaluate the impact of anonymization: level 1 referred to the replication of statistical outputs, level 2 to the accuracy of statistical results, level 3 assessed data alteration (using Hellinger distances), and level 4 assessed privacy risks (using WP29 criteria).

    Results: 87 items were produced on the raw cohort data and then reproduced on each of the four anonymized datasets. The overall level 1 replication score ranged from 67% to 100% depending on the anonymization solution. The most difficult analyses to replicate were regression models (sub-score ranging from 78% to 100%) and survival analysis (sub-score ranging from 0% to 100%). The overall level 2 accuracy score ranged from 22% to 79% depending on the anonymization solution. For level 3, three methods had some variables with different probability distributions (Hellinger distance = 1). For level 4, all methods reduced the privacy risk of singling out, with relative risk reductions ranging from 41% to 65%.

    Conclusion: None of the anonymization methods reproduced all outputs and results. A trade-off has to be found between context risk and the usefulness of data to answer the research question.
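The level 3 and level 4 measures mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete distributions given as dictionaries of probabilities and uses the standard Hellinger distance definition; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p and q map category -> probability. The result lies in [0, 1]:
    0 for identical distributions, 1 when the supports do not overlap.
    """
    categories = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2
        for c in categories
    )
    return math.sqrt(squared / 2.0)


def relative_risk_reduction(risk_before, risk_after):
    """Share of the original risk removed by anonymization."""
    return (risk_before - risk_after) / risk_before


# Hypothetical example values, for illustration only.
raw = {"A": 0.5, "B": 0.5}
same = {"A": 0.5, "B": 0.5}
disjoint = {"C": 1.0}

print(hellinger(raw, same))      # identical distributions
print(hellinger(raw, disjoint))  # no shared category
print(relative_risk_reduction(0.20, 0.10))
```

A distance of 1 therefore means that the original and anonymized variable share no category with non-zero probability, which is the extreme case the abstract reports for some variables.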

  2. Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure...

    • heidata.uni-heidelberg.de
    pdf, tsv, txt
    Updated Nov 20, 2024
    Cite
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich (2024). Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data] [Dataset]. http://doi.org/10.11588/DATA/MXM0Q2
    Explore at:
    Available download formats: txt(3421), tsv(191831), tsv(106632), tsv(286102), tsv(107100), tsv(190296), tsv(197975), pdf(640128)
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    heiDATA
    Authors
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2

    Description

    In the publication [1] we implemented anonymization and synthetization techniques for a structured data set, which was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4] individually and in combination. We evaluated the utility and shortcomings of the different approaches by statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches and a combination thereof introduce only minimal deviations from the original data set over all features. While data synthesis techniques produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.

    [1] Johann TI, Otte K, Prasser F, Dieterich C: Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024. doi:10.1093/ehjdh/ztae083
    [2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.
    [3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.
    [4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.
    [5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.
    [6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.

  3. Global Video Anonymization Market Research Report: By Technology (Software,...

    • wiseguyreports.com
    Updated Aug 10, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Video Anonymization Market Research Report: By Technology (Software, Hardware, Cloud-based), By Deployment (On-premises, Cloud), By End User (Media and entertainment, Healthcare, Financial services, Government), By Anonymization Technique (Face blurring, Object redaction, Voice modulation, Background replacement) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/video-anonymization-market
    Explore at:
    Dataset updated
    Aug 10, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 8, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 617.59 (USD Billion)
    MARKET SIZE 2024: 706.71 (USD Billion)
    MARKET SIZE 2032: 2077.2 (USD Billion)
    SEGMENTS COVERED: Technology, Deployment, End User, Anonymization Technique, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: (1) Growing demand for data privacy; (2) Advancements in AI and facial recognition; (3) Increase in video surveillance; (4) Regulatory compliance; (5) Expansion of cloud-based video anonymization solutions
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Microsoft, Fourmilab, Proofpoint, LogRhythm, SAS Institute, F-Secure, Intermedia, One Identity, BeenVerified, Oracle, Image Scrubber, IBM, Splunk, Axzon, Digital Shadows
    MARKET FORECAST PERIOD: 2025 - 2032
    KEY MARKET OPPORTUNITIES: (1) Growing adoption of video surveillance systems; (2) Increasing demand from law enforcement and security agencies; (3) Rising concerns over data privacy and security; (4) Government regulations and compliance requirements; (5) Advancements in AI and machine learning technologies
    COMPOUND ANNUAL GROWTH RATE (CAGR): 14.43% (2025 - 2032)
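The listed growth rate can be checked against the listed market sizes with the usual compound annual growth rate formula. The sketch below assumes that the rate is measured from the 2024 market size to the 2032 market size, that is, over eight annual periods; the listing itself does not state which base year was used, and the function name is hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Values taken from the listing; the 2024 base year is an assumption.
rate = cagr(706.71, 2077.2, 2032 - 2024)
print(f"{rate:.2%}")
```

Under that assumption the computed rate rounds to the 14.43% shown in the listing.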

  4. S1 Data -

    • plos.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0285212.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, big data and its applications have grown sharply in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the scientific and industrial community has a compelling need for large-scale and robust computing systems. Since one of the characteristics of big data is value, data should be published for analysts to extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing resilient distributed datasets (RDDs). In terms of performance, due to in-memory computations, it can be up to 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. Therefore, the authors have investigated a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.
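The two distance functions named in the abstract can be sketched in plain Python. This is an illustration under assumptions: the city block distance is taken as the sum of absolute coordinate differences, and the Pearson distance as one minus the Pearson correlation coefficient. The paper's exact definitions are not given in this listing, and the function names are hypothetical.

```python
import math


def city_block(a, b):
    """City block (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """Pearson distance, assumed here to be 1 - r (r = Pearson correlation)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    return 1.0 - cov / math.sqrt(var_a * var_b)


print(city_block([1, 2, 3], [4, 0, 3]))
print(pearson_distance([1, 2, 3], [2, 4, 6]))
print(pearson_distance([1, 2, 3], [3, 2, 1]))
```

In a clustering step, either function could be used to assign a record to its nearest cluster centre; which one suits a given data set depends on whether magnitude or shape of the attribute vector matters more.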

  5. Medical dataset in 3-diversity model.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). Medical dataset in 3-diversity model. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, big data and its applications have grown sharply in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the scientific and industrial community has a compelling need for large-scale and robust computing systems. Since one of the characteristics of big data is value, data should be published for analysts to extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing resilient distributed datasets (RDDs). In terms of performance, due to in-memory computations, it can be up to 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. Therefore, the authors have investigated a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.
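Since this entry is a table in a 3-diversity model, a small check may help make the privacy model concrete. The sketch assumes the "distinct" interpretation of λ-diversity: every group of records sharing the same quasi-identifier values must contain at least λ distinct sensitive values. The records, column names, and function name below are hypothetical and are not taken from the published table.

```python
from collections import defaultdict


def is_lambda_diverse(records, quasi_identifiers, sensitive, lam):
    """Return True if every equivalence class has >= lam distinct sensitive values."""
    groups = defaultdict(set)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= lam for values in groups.values())


# Hypothetical generalized records, for illustration only.
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "diabetes"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "hepatitis"},
]

print(is_lambda_diverse(records, ["age", "zip"], "disease", 3))
print(is_lambda_diverse(records, ["age", "zip"], "disease", 4))
```

The same grouping logic maps naturally onto a key-based aggregation over RDDs, although the paper's actual Spark implementation is not shown in this listing.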

  6. Consensual videos of potentially re-identifiable individuals recorded at the...

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 11, 2024
    Cite
    Vivien Geenen; Till Riedel (2024). Consensual videos of potentially re-identifiable individuals recorded at the Autonomous Driving Test Area Baden-Württemberg (raw images recorded daytime). [Dataset]. http://doi.org/10.5281/zenodo.10020644
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vivien Geenen; Till Riedel
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Mar 30, 2023
    Area covered
    Baden-Württemberg
    Description

    For the purpose of research on data intermediaries and data anonymisation, it is necessary to test these processes with realistic video data containing personal data. For this purpose, the TreuMoDa project, funded by the German Federal Ministry of Education and Research (BMBF), has created a dataset of different traffic scenes containing identifiable persons.

    This video data was collected at the Autonomous Driving Test Area Baden-Württemberg. On the one hand, it should be possible to recognise people in traffic, including their line of sight. On the other hand, it should be usable for the demonstration and evaluation of anonymisation techniques.

    The legal basis for the publication of this data set is the consent given by the participants, as documented in the file Consent.pdf (all purposes), in accordance with Art. 6(1)(a) and Art. 9(2)(a) GDPR. Any further processing is subject to the GDPR.

    We make this dataset available for non-commercial purposes such as teaching, research and scientific communication. Please note that this licence is limited by the provisions of the GDPR. Anyone downloading this data will become an independent controller of the data. This data has been collected with the consent of the identifiable individuals depicted.

    Any consensual use must take into account the purposes mentioned in the uploaded consent forms and in the privacy terms and conditions provided to the participants (see Consent.pdf). All participants consented to all three purposes, and no consent was withdrawn at the time of publication. KIT is unable to provide you with contact details for any of the participants, as we have removed all links to personal data other than that contained in the published images.

  7. Supplementary Material for "Investigating Software Development Teams...

    • explore.openaire.eu
    Updated Jul 26, 2024
    Cite
    Edna Dias CANEDO; Fabiano Damasceno Sousa FALCAO (2024). Supplementary Material for "Investigating Software Development Teams Members' Perceptions of Data Privacy in the Use of Large Language Models (LLMs)" [Dataset]. http://doi.org/10.5281/zenodo.13138862
    Explore at:
    Dataset updated
    Jul 26, 2024
    Authors
    Edna Dias CANEDO; Fabiano Damasceno Sousa FALCAO
    Description

    ABSTRACT: Context: Large Language Models (LLMs) have revolutionized natural language generation and understanding. However, they raise significant data privacy concerns, especially when sensitive data is processed and stored by third parties. Goal: This paper investigates the perception of software development teams members regarding data privacy when using LLMs in their professional activities. Additionally, we examine the challenges faced and the practices adopted by these practitioners. Method: We conducted a survey with 78 ICT practitioners from five regions of the country. Results: Software development teams members have basic knowledge about data privacy and LGPD, but most have never received formal training on LLMs and possess only basic knowledge about them. Their main concerns include the leakage of sensitive data and the misuse of personal data. To mitigate risks, they avoid using sensitive data and implement anonymization techniques. The primary challenges practitioners face are ensuring transparency in the use of LLMs and minimizing data collection. Software development teams members consider current legislation inadequate for protecting data privacy in the context of LLM use. Conclusions: The results reveal a need to improve knowledge and practices related to data privacy in the context of LLM use. According to software development teams members, organizations need to invest in training, develop new tools, and adopt more robust policies to protect user data privacy. They advocate for a multifaceted approach that combines education, technology, and regulation to ensure the safe and responsible use of LLMs.

  8. A sample medical dataset.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). A sample medical dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, big data and its applications have grown sharply in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the scientific and industrial community has a compelling need for large-scale and robust computing systems. Since one of the characteristics of big data is value, data should be published for analysts to extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing resilient distributed datasets (RDDs). In terms of performance, due to in-memory computations, it can be up to 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computations to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. Therefore, the authors have investigated a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.

  9. Value creation stories anonymized open data set (Immunization Agenda 2030...

    • data.niaid.nih.gov
    Updated Mar 26, 2024
    Cite
    Sadki, Reda (2024). Value creation stories anonymized open data set (Immunization Agenda 2030 Full Learning Cycle, 7 March - 20 June 2022) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7763921
    Explore at:
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    Charlotte Mbuh
    Alan Brooks
    Ana Paula Szylovec
    Sadki, Reda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title

    Immunization Agenda 2030 (IA2030) 1st Movement Full Learning Cycle (FLC 2022) – “How are you doing?” Value Creation Stories Survey (Version 1.0)

    Research audience

    Education researchers interested in the application of the “value creation stories” (VCS) conceptual framework elaborated by Etienne Wenger et al. in the study of communities of practice and other types of digital communities.

    Credits

    Author

    The Geneva Learning Foundation 18 avenue Louis Casaï CH-1209 Geneva, Switzerland research@learning.foundation

    Principal Investigator and corresponding author

    Reda Sadki, The Geneva Learning Foundation (TGLF) reda@learning.foundation

    Project partners

    Bridges to Development; University of South Australia Centre for Change and Complexity in Learning (C3L)

    Roles and responsibilities

    • Design: The Geneva Learning Foundation
    • Implementation (sample collection): The Geneva Learning Foundation
    • Processing: The Geneva Learning Foundation, Bridges to Development, Centre for Change and Complexity in Learning (C3L)
    • Anonymization: The Geneva Learning Foundation and Bridges to Development
    • Data cleaning: Bridges to Development
    • Submission: The Geneva Learning Foundation

    Funding sources or sponsorship that supported the data collection

    Wellcome, Bill & Melinda Gates Foundation (BMGF)

    Recommended citation

    The Geneva Learning Foundation, 2023. Value Creation Stories (VCS) weekly feedback survey, 2022 Full Learning Cycle (FLC) of the Movement for Immunization Agenda 2030 (IA2030) (Version 1.0). [Data Set]. The Geneva Learning Foundation. DOI: 10.5281/zenodo.7763922

    Description of the sample

    File list:

    This file is IA2030_FLC_2022_Value_Creation_Stories.README.md

    IA2030-EN_FLC_2022_Value_Creation_Stories-questions_mapping.csv: List of the survey’s questions and their codes in English, as well as their units (21 questions). Version 1: Geneva Learning Foundation, 31 March 2023.

    IA2030-EN_FLC_2022_Value_Creation_Stories.csv: Responses of participants who replied in English (n: 2101, obs: 5601). Version 1: Geneva Learning Foundation, 31 March 2023.

    IA2030-FR_FLC_2022_Value_Creation_Stories-questions_mapping.csv: List of the survey’s questions and their codes in English, as well as their units (21 questions). Version 1: Geneva Learning Foundation, 31 March 2023.

    IA2030-FR_FLC_2022_Value_Creation_Stories-Google_translation.csv: Responses of participants who replied in French, translated to English using “Google Translate” (n: 1585, obs: 4493). Version 1: Geneva Learning Foundation, 31 March 2023.

    IA2030-FR_FLC_2022_Value_Creation_Stories.csv: Responses of participants who replied in French (n: 1585, obs: 4493). Version 1: Geneva Learning Foundation, 31 March 2023.

    Relationship between files

    The questions codes data set are the same code as the column variables and can be connected.

    Related data sets

    This is a subset of data collected by The Geneva Learning Foundation (TGLF) during the 1st IA2030 Full Learning Cycle (FLC). The complete data set is more comprehensive, and includes: demographic information (gender, country), health system information (respondent’s health system level), respondents’ analyses of challenges and priorities.

    Additional data sets for the first Full Learning Cycle (FLC) of the Movement for Immunization Agenda 2030 (IA2030) are available from The Geneva Learning Foundation (TGLF) Insights Unit insights@learning.foundation

    Other publicly accessible locations of the data

    The Geneva Learning Foundation publishes data sets in relation to its Immunization Agenda 2030 (IA2030) Movement learning programme in the Zenodo open repository community: https://zenodo.org/communities/ia2030/

    1. Purpose and Objectives

    Primary goal of the survey:

    This survey had two goals in the context of TGLF’s IA2030 Movement Full Learning Cycle programme (2022): 1. Provide an asynchronous mechanism for support between peers (participants) and from the TGLF team; and 2. collect and measure programme participants’ value creation stories (VCS) during the programme.

    Martin de Laat’s “value creation stories” (VCS) has been used primarily in small-scale, qualitative studies of communities of practice, online forums, and education activities.

    This data set includes both quantitative (Likert) and qualitative (open text) responses to the VCS questions, collected over a period of four months (7 March – 20 June 2022) from a cohort that began with 6,185 participants on the start date.

    2. Population and Sample

    The target population were participants of the Geneva Learning Foundation’s Movement for Immunization Agenda 2030 (IA2030) learning programme. The initial cohort admitted to the programme was 6,185 individuals from 99 countries. Only participants who were formally admitted to the programme received the invitation to complete the survey.

    Programme participants were free to choose if and when to report (self-selection), and their responses were not checked against any other measures (self-reporting).

    Languages: French and English

    3. Survey Design and Methods

    Data collection period: 7 March 2022 – 20 June 2022

    Between 7 March and 20 June 2022, participants in the Geneva Learning Foundation’s “Immunization Agenda 2030” (IA2030) Movement Full Learning Cycle (FLC) were asked to respond to a questionnaire titled “How are you doing?”.

    Participants received a personalized email with the request to share feedback about their experience during the week. The link to share feedback was also included in other reminder and information emails sent in response to participant needs.

    The first survey was launched on 11 March 2022 and the last on 17 March 2022, for a total of 15 requests. Participants could answer the survey at any time and as many times as they wished.

    The group of 6,185 participants grew over the course of the Cycle, as additional participants were able to join the initiative throughout the four-month period.

    Software- or Instrument-specific information needed to interpret the data

    • Automated translation of French data was performed using Google Translate
    • Methods used for removing or anonymizing personal identifiers or sensitive information:
    • Unique identifier: Unique identifiers were anonymized using MD5 hashing via the web site Miracle Salad. Unique identifiers can be used to identify respondents who may have answered the survey more than once, at different points in time. This approach provides a method to anonymize sensitive data using MD5 hashing.
    • Limitation: MD5 hashing is a one-way function; it is not possible to dehash the data and recover the original information.
    • Macros were developed in Excel to replace country names in qualitative responses. (No country information was collected in this survey, but some respondents referred to their specific contexts in their responses.) The macros did not account for typos; if any country information is found, please contact research@learning.foundation
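The country-name replacement described above was done with Excel macros that are not part of this listing. As a rough illustration of the same idea, the sketch below replaces exact matches of known country names with a placeholder. The country list, placeholder text, and function name are hypothetical; like the original macros, it does not catch misspelled names.

```python
import re

# Hypothetical, deliberately short list; a real run would need a full list.
COUNTRIES = ["Nigeria", "India", "Democratic Republic of the Congo"]


def redact_countries(text, countries=COUNTRIES, placeholder="[COUNTRY]"):
    """Replace whole-word, case-insensitive matches of country names."""
    if not countries:
        return text
    # Try longer names first so multi-word names are matched as a unit.
    names = sorted(countries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


print(redact_countries("In Nigeria we reached more children this week."))
print(redact_countries("Our team in Nigeira faced delays."))  # typo is not caught
```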

    Data collection start and end dates:

    7 March 2022 until 20 June 2022

    Events or circumstances during data collection that may have influenced results:

    No requests for responses were sent during TGLF’s “Term break” between 16-30 April 2022.

    4. Data Processing and Cleaning

    • Incomplete or inconsistent responses: Not cleaned, as respondents were able to opt out of specific sections of survey or skip questions.
    • Data transformations or imputations: None
    • Treatment of outliers or extreme values: None

    5. Variables and Measures

    The survey included Likert scale questions and qualitative open-text questions based on the conceptual framework for Value Creation Stories (VCS) developed by Wenger et al. (2011). There are no derived or calculated variables. Items are Likert scale, multiple choice, and open text.

    6. Data Quality and Reliability

    All responses submitted before or after the FLC period (7 March – 20 June 2022) were excluded from the sample.

    7. Data Privacy and Anonymization

    Methods used for removing or anonymizing personal identifiers or sensitive information:

    • Unique identifier: Unique identifiers were anonymized using MD5 Hashing via the web site https://www.miraclesalad.com/webtools/md5.php. Unique identifiers can be used to identify respondents who may have answered the survey more than once, at different points in time. This approach provides a method to anonymize sensitive data using MD5 hashing.
    • Limitation: MD5 hashing is a one-way function; it is not possible to dehash the data and recover the original information.
    • Macros were developed in Excel to replace country names in qualitative responses. (No country information was collected in this survey, but some respondents referred to their specific contexts in their responses.)
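The identifier hashing described above was done through a web tool. An equivalent step can be sketched with Python's standard `hashlib` module. The function names and sample identifiers are hypothetical. The sketch also shows why the same respondent can be recognised across submissions, and why an unsalted hash of a guessable identifier can still be matched by hashing candidate values, even though the hash function itself is one-way.

```python
import hashlib


def md5_pseudonym(identifier):
    """Return the hexadecimal MD5 digest of an identifier string."""
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


def match_by_enumeration(digest, candidates):
    """Return the candidate whose MD5 digest equals `digest`, if any."""
    for candidate in candidates:
        if md5_pseudonym(candidate) == digest:
            return candidate
    return None


# Hypothetical identifiers, for illustration only.
first = md5_pseudonym("participant-0042")
second = md5_pseudonym("participant-0042")
print(first == second)  # same respondent maps to the same pseudonym

candidates = [f"participant-{i:04d}" for i in range(100)]
print(match_by_enumeration(first, candidates))
```

Adding a secret salt or key before hashing is a common way to make such enumeration impractical; whether that is needed depends on how guessable the original identifiers are.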

    8. Data Availability and Accessibility

    This data set is made available on Zenodo.org in the Zenodo community “Movement for Immunization Agenda 2030 (IA2030)” https://zenodo.org/communities/ia2030/

    Requests for additional information should be addressed to research@learning.foundation.

    This is a subset of data collected by The Geneva Learning Foundation (TGLF) during the 1st IA2030 Full Learning Cycle (FLC).

    The complete data set is more comprehensive, and includes: demographic information (gender, country), health system information (respondent’s health system level), respondents’ analyses of challenges and priorities.

    Other publicly accessible locations of the data

    The Geneva Learning Foundation publishes data sets in relation to its Immunization

  10. Spatially anonymized data for: Multistate Ornstein-Uhlenbeck approach for...

    • zenodo.org
    • datadryad.org
    csv
    Updated Jun 3, 2022
    Joseph Eisaguirre; Joseph Eisaguirre (2022). Spatially anonymized data for: Multistate Ornstein-Uhlenbeck approach for practical estimation of movement and resource selection around central places [Dataset]. http://doi.org/10.5061/dryad.8cz8w9gnz
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 3, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joseph Eisaguirre; Joseph Eisaguirre
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    1. Home range dynamics and movement are central to a species' ecology and strongly mediate both intra- and interspecific interactions. Numerous methods have been introduced to describe animal home ranges, but most lack predictive ability and cannot capture effects of dynamic environmental patterns, such as the impacts of air and water flow on movement.

    2. Here, we develop a practical, multi-stage approach for statistical inference into the behavioral mechanisms underlying how habitat and dynamic energy landscapes (in this case, how airflow increases or decreases the energetic efficiency of flight) shape animal home ranges based around central places. We validated the new approach using simulations, then applied it to a sample of 12 adult golden eagles (Aquila chrysaetos) tracked with satellite telemetry.

    3. The application to golden eagles revealed effects of habitat variables that align with predicted behavioral ecology. Further, we found that males and females partition their home ranges dynamically based on uplift. Specifically, changes in wind and sun angle drove differential space use between sexes, especially later in the breeding season when energetic demands of growing nestlings require both parents to forage more widely.

    4. This method is easily implemented using widely available programming languages and is based on a hierarchical multistate Ornstein-Uhlenbeck space use process that incorporates habitat and energy landscapes. The underlying mathematical properties of the model allow straightforward computation of predicted utilization distributions, permitting estimation of home range size and visualization of space use patterns under varying conditions.
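
    As a rough illustration of the building block named above, the sketch below simulates a single-state, two-dimensional Ornstein-Uhlenbeck path attracted to a central place. It is not the authors' hierarchical multistate model and includes no habitat or energy-landscape covariates; the function name `simulate_ou_2d` and the parameters `beta` (strength of attraction) and `sigma` (diffusion scale) are assumptions made for this example.

```python
import math
import random


def simulate_ou_2d(center, start, beta, sigma, dt, n_steps, seed=0):
    """Simulate a 2D Ornstein-Uhlenbeck path attracted to `center`.

    Uses the exact transition of dX = -beta * (X - c) dt + sigma dW over a
    step of length dt, applied independently to each axis:
        X(t + dt) = c + (X(t) - c) * exp(-beta * dt) + noise,
        noise ~ Normal(0, sigma^2 * (1 - exp(-2 * beta * dt)) / (2 * beta)).
    """
    rng = random.Random(seed)
    decay = math.exp(-beta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * beta))
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        x = center[0] + (x - center[0]) * decay + rng.gauss(0.0, step_sd)
        y = center[1] + (y - center[1]) * decay + rng.gauss(0.0, step_sd)
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Hypothetical values: a nest at the origin and an animal starting 5 km east.
    track = simulate_ou_2d(center=(0.0, 0.0), start=(5.0, 0.0),
                           beta=0.5, sigma=1.0, dt=1.0, n_steps=10)
    for point in track:
        print(point)
```

    In this simplified setting the long-run positions are normally distributed around the center with variance sigma^2 / (2 * beta) on each axis, which is the kind of closed-form property that makes utilization distributions straightforward to compute.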

  11. Data from: WEA-Acceptance Data: Wind Turbine Dataset Including Acoustical,...

    • data.uni-hannover.de
    .csv, json, parquet +2
    Updated Dec 12, 2024
    Institut für Statik und Dynamik (2024). WEA-Acceptance Data: Wind Turbine Dataset Including Acoustical, Meteorological and Turbine Parameters (Version 2.0) [Dataset]. https://data.uni-hannover.de/dataset/wea-acceptance_data_v1
    Explore at:
    Available download formats: parquet(243492), .csv(1897), parquet(90157), json(10357), pdf(1107041), parquet(89778), parquet(1485165), pdf(261496), parquet(1919763), parquet(1894583), parquet(1892269), pdf(1329610), zip(244097), zip, zip(169921)
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    Institut für Statik und Dynamik
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
    License information was derived automatically

    Description

    Within the project WEA-Acceptance¹, extensive measurement campaigns were carried out, which included the recording of acoustic, meteorological and turbine-specific data. Acoustic quantities were measured at several distances to the wind turbine and under various atmospheric and turbine conditions. In the project WEA-Acceptance-Data², the acquired measurements are stored in a structured and anonymized form and provided for research purposes. Besides the data and its documentation, first evaluations as well as reference data sets for chosen scenarios are published.

    In this version of the data platform, specification 2.0, an anonymized data set and three use cases are published. The specification contains the concept of the data platform, which is primarily based on the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The data set consists of turbine-specific, meteorological and acoustic data recorded over one month. The data were corrected, conditioned and anonymized; relevant outliers are marked and erroneous data have been removed from the data set. The acoustic data include anonymized sound pressure levels and one-third octave spectra averaged over ten minutes, as well as audio data. In addition, the metadata and an overview of data availability are uploaded. As examples of how the data can be applied, three use cases are also published. Important information, such as the approach to data anonymization, is briefly described in the ReadMe file.

    For further information about the measurements, see: Martens, S., Bohne, T., and Rolfes, R.: An evaluation method for extensive wind turbine sound measurement data and its application, Proceedings of Meetings on Acoustics, Acoustical Society of America, 41, 040001, https://doi.org/10.1121/2.0001326, 2020.

    ¹The project WEA-Acceptance (FKZ 0324134A) was funded by the German Federal Ministry for Economic Affairs and Energy (BMWi).

    ²The project WEA-Acceptance-Data (FKZ 03EE3062) was funded by the German Federal Ministry for Economic Affairs and Energy (BMWi).

  12. Farmer Survey data (anonymized) for the thesis "Planning sustainable and...

    • data.4tu.nl
    zip
    Updated Jun 7, 2024
    Mohammad Faiz Alam (2024). Farmer Survey data (anonymized) for the thesis "Planning sustainable and equitable agricultural water interventions: an agent based sociohydrology approach" [Dataset]. http://doi.org/10.4121/e5dc84d4-e22e-41c3-aa41-fbbe7ec74d83.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Mohammad Faiz Alam
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Time period covered
    2021
    Description

    This data pertains to the farmers' survey in the Kamadhiya catchment in Gujarat, India. In December 2021, 492 farmers distributed across 24 villages were interviewed in the Kamadhiya catchment. The study sample was selected through a multistage random sampling procedure. First, 24 villages from a total of 88 villages lying within the Kamadhiya catchment were selected using regularly distributed sampling. Thereafter, in each village, 20-22 farmers were selected for the survey using proportionate random sampling. The survey questionnaire consisted of four parts: 1) farmers' socio-economic characteristics; 2) farmers' perception of check dam (CD) impacts and sociopsychological questions regarding the maintenance of CDs; 3) farmers' cropping and irrigation practices; and 4) farmers' perception and adoption of AWM practices (drip and borewell). For more description and information on the study area, see the publication below:

    Alam, M.F., McClain, M.E., Sikka, A., Daniel, D., Pande, S., 2022. Benefits, equity, and sustainability of community rainwater harvesting structures: An assessment based on farm scale social survey. Front. Environ. Sci. 10. https://doi.org/10.3389/fenvs.2022.1043896


  13. [Inequality, Competitiveness, and Motivaton - Studies 1-3] Material, raw...

    • figshare.com
    zip
    Updated Oct 2, 2018
    Nicolas Sommet (2018). [Inequality, Competitiveness, and Motivaton - Studies 1-3] Material, raw data, working data,and working syntax (anonymized) [Dataset]. http://doi.org/10.6084/m9.figshare.4779640.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 2, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nicolas Sommet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Questionnaires, raw data, and syntax files for the three studies of the following paper: Sommet, N., Elliot, A. J., Jamieson, J. P., & Butera, F. (2018). Income inequality, perceived competitiveness, and approach-avoidance motivation. To appear in Journal of Personality.

  14. Anonymized raw dataset of the aymmetry analysis.

    • plos.figshare.com
    xlsx
    Updated Jun 5, 2023
    Sang-Yoon Lee; Eun Kyoung Lee; Ki Ho Park; Dong Myung Kim; Jin Wook Jeoung (2023). Anonymized raw dataset of the aymmetry analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0164866.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sang-Yoon Lee; Eun Kyoung Lee; Ki Ho Park; Dong Myung Kim; Jin Wook Jeoung
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This supplementary material provides the anonymized raw dataset of the asymmetry analysis. (XLSX)

