26 datasets found
  1. Data De-identification and Pseudonymity Software Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 9, 2025
    Cite
    Market Research Forecast (2025). Data De-identification and Pseudonymity Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-de-identification-and-pseudonymity-software-30730
    Explore at:
    ppt, doc, pdf (available download formats)
    Dataset updated
    Mar 9, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data De-identification and Pseudonymization Software market is experiencing robust growth, projected to reach $1,941.6 million in 2025 and to exhibit a Compound Annual Growth Rate (CAGR) of 7.3%. This expansion is driven by increasing regulatory compliance needs (like GDPR and CCPA), heightened concerns regarding data privacy and security breaches, and the burgeoning adoption of cloud-based solutions. The market is segmented by deployment (cloud-based and on-premises) and application (large enterprises and SMEs). Cloud-based solutions are gaining significant traction due to their scalability, cost-effectiveness, and ease of implementation, while large enterprises dominate the application segment due to their greater need for robust data protection strategies and larger budgets. Key market players include established tech giants like IBM and Informatica, alongside specialized providers such as Very Good Security and Anonomatic, indicating a dynamic competitive landscape with both established and emerging players vying for market share. Geographic expansion is also a key driver, with North America currently holding a significant market share, followed by Europe and Asia Pacific. The forecast period (2025-2033) anticipates continued growth fueled by advancements in artificial intelligence and machine learning for enhanced de-identification techniques, and the increasing demand for data anonymization across various sectors like healthcare, finance, and government.

    The restraining factors, while present, are not expected to significantly hinder the market’s overall growth trajectory. These limitations might include the complexity of implementing robust de-identification solutions, the potential for re-identification risks despite advanced techniques, and the ongoing evolution of privacy regulations necessitating continuous adaptation of software capabilities. However, ongoing innovation and technological advancements are anticipated to mitigate these challenges. The continuous development of more sophisticated algorithms and solutions addresses re-identification vulnerabilities, while proactive industry collaboration and regulatory guidance aim to streamline implementation processes, ultimately fostering continued market expansion. The increasing adoption of data anonymization across diverse sectors, coupled with the expanding global digital landscape and related data protection needs, suggests a positive outlook for sustained market growth throughout the forecast period.

  2. Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure...

    • heidata.uni-heidelberg.de
    pdf, tsv, txt
    Updated Nov 20, 2024
    + more versions
    Cite
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich (2024). Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data] [Dataset]. http://doi.org/10.11588/DATA/MXM0Q2
    Explore at:
    tsv(197975), tsv(190296), tsv(191831), pdf(640128), tsv(107100), txt(3421), tsv(286102), tsv(106632) (available download formats)
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    heiDATA
    Authors
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2

    Description

    In the publication [1] we implemented anonymization and synthetization techniques for a structured data set, which was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4] individually and in combination. We evaluated the utility and shortcomings of the different approaches by statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores obtained on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches, and a combination thereof, introduce only minimal deviations from the original data set over all features. While data synthesis techniques can produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.

    [1] Johann TI, Otte K, Prasser F, Dieterich C: Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024. doi:10.1093/ehjdh/ztae083
    [2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.
    [3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.
    [4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.
    [5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.
    [6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.
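
    The utility check described above can be sketched as follows: load a protected data set alongside the original and compare the distribution of a pre-computed risk-score column. This is a minimal illustration only; the file and column names are hypothetical placeholders for the shared TSV files, not the authors' actual pipeline.

```python
import pandas as pd

def score_deviation(original_tsv: str, protected_tsv: str, score_col: str) -> pd.DataFrame:
    """Compare summary statistics of a pre-computed risk score (e.g. MAGGIC or
    Barcelona Bio-HF) between the original and a protected data set."""
    original = pd.read_csv(original_tsv, sep="\t")
    protected = pd.read_csv(protected_tsv, sep="\t")
    summary = pd.DataFrame({
        "original": original[score_col].describe(),
        "protected": protected[score_col].describe(),
    })
    summary["abs_diff"] = (summary["original"] - summary["protected"]).abs()
    return summary

# Hypothetical usage (file and column names are placeholders):
# print(score_deviation("cohort_original.tsv", "cohort_arx_anonymized.tsv", "maggic_score"))
```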

  3. A sample medical dataset.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). A sample medical dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, big data and its applications have seen sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the scientific and industrial communities have a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing the resilient distributed dataset (RDD). In terms of performance, due to in-memory computation, it can be up to 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. Accordingly, the authors investigated a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.
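
    The two distance functions named above (City block and Pearson) and the general flavour of an RDD-based partition-and-check step can be sketched as follows. This is an illustration of the approach with toy data, not the authors' implementation; the records, seed centroids and λ value are hypothetical.

```python
import math
from pyspark.sql import SparkSession

def city_block(a, b):
    """City block (Manhattan) distance between two numeric records."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - Pearson correlation between two numeric records."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return (1.0 - cov / (norm_a * norm_b)) if norm_a and norm_b else 1.0

spark = SparkSession.builder.appName("lambda-diversity-sketch").getOrCreate()

# Hypothetical records: (quasi-identifier vector, sensitive attribute value)
records = spark.sparkContext.parallelize([
    ((34.0, 52000.0), "flu"),
    ((36.0, 51000.0), "flu"),
    ((35.0, 53000.0), "cancer"),
    ((62.0, 91000.0), "diabetes"),
    ((60.0, 88000.0), "flu"),
    ((61.0, 90000.0), "cancer"),
])

# Toy partitioning step: assign each record to the nearest of two seed centroids
# using the City block distance (a stand-in for the paper's clustering phase).
seeds = [(35.0, 52000.0), (61.0, 90000.0)]
clustered = records.map(
    lambda rec: (min(range(len(seeds)), key=lambda i: city_block(rec[0], seeds[i])), rec)
)

# Simple diversity check: each partition must contain at least LAMBDA distinct
# sensitive values before its records may be released.
LAMBDA = 2
diverse_enough = (clustered
                  .map(lambda kv: (kv[0], {kv[1][1]}))
                  .reduceByKey(lambda s1, s2: s1 | s2)
                  .mapValues(lambda values: len(values) >= LAMBDA))
print(diverse_enough.collect())
spark.stop()
```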

  4. Data Masking Software Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    + more versions
    Cite
    AMA Research & Media LLP (2025). Data Masking Software Report [Dataset]. https://www.archivemarketresearch.com/reports/data-masking-software-57502
    Explore at:
    ppt, pdf, doc (available download formats)
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Masking Software market is experiencing robust growth, driven by increasing regulations around data privacy (like GDPR and CCPA), the expanding adoption of cloud computing, and the surging need for secure data sharing across organizations. The market size in 2025 is estimated at $2.5 billion, exhibiting a Compound Annual Growth Rate (CAGR) of 15% during the forecast period (2025-2033). This significant growth is fueled by several key factors, including the rising demand for data anonymization and pseudonymization techniques across various sectors like banking, healthcare, and retail. Companies are increasingly investing in data masking solutions to protect sensitive customer information during testing, development, and collaboration, thus mitigating the risk of data breaches and regulatory penalties. The diverse application segments, including Banking, Financial Services, and Insurance (BFSI), Healthcare and Life Sciences, and Retail and Ecommerce, contribute significantly to market expansion. Furthermore, the shift towards cloud-based solutions offers scalability and cost-effectiveness, further accelerating market adoption. The market segmentation reveals a strong preference for cloud-based solutions, driven by their inherent flexibility and ease of deployment. Within the application segments, the BFSI sector is currently leading due to stringent regulatory compliance needs and the large volume of sensitive customer data handled. However, growth in the healthcare and life sciences sector is expected to accelerate significantly as more institutions embrace digital transformation and the handling of patient data becomes increasingly regulated. Geographic growth is robust across North America and Europe, with Asia-Pacific showing significant potential for future expansion due to growing digitalization and increasing awareness of data security issues. While the market faces certain restraints such as the complexity of implementing data masking solutions and the high initial investment costs, the long-term benefits of robust data protection and compliance outweigh these challenges, driving consistent market expansion.

  5. Geospatial and Information Substitution and Anonymization Tool (GISA)

    • osti.gov
    Updated Jul 31, 2023
    Cite
    Geospatial and Information Substitution and Anonymization Tool (GISA) [Dataset]. https://www.osti.gov/biblio/1992880
    Explore at:
    Dataset updated
    Jul 31, 2023
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    National Energy Technology Laboratory (https://netl.doe.gov/)
    Description

    The Geospatial and Information Substitution and Anonymization Tool (GISA) incorporates techniques for obfuscating identifiable information in point data or documents while simultaneously maintaining chosen variables to enable future use and meaningful analysis. This approach promotes collaboration and data sharing while also reducing the risk of exposure of sensitive information. GISA can be used in a number of different ways, including the anonymization of point spatial data, batch replacement/removal of user-specified terms from file names and from within file content, and assistance with the selection and redaction of images and terms based on recommendations generated using natural language processing. Version 1 of the tool, published here, has updated functionality and enhanced capabilities relative to the beta version published in 2023. Please see the User Documentation for further information on capabilities, as well as a guide for how to download and use the tool. If you have any feedback on the tool, please reach out to edxsupport@netl.doe.gov.

    Disclaimer: This project was funded by the United States Department of Energy, National Energy Technology Laboratory, in part, through a site support contract. Neither the United States Government nor any agency thereof, nor any of their employees, nor the support contractor, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. The Geospatial and Information Substitution and Anonymization Tool (GISA) was developed jointly through the U.S. DOE Office of Fossil Energy and Carbon Management’s EDX4CCS Project, in part, from the Bipartisan Infrastructure Law.
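
    The batch term-substitution capability described above can be illustrated with a short sketch: replace user-specified sensitive terms in file names and file contents with neutral placeholders. This is not GISA itself; the term map, file types, and paths are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical substitution map: sensitive term -> neutral placeholder
TERM_MAP = {"Acme Well Pad 7": "SITE_A", "Jane Doe": "PERSON_1"}

def scrub_text(text: str) -> str:
    """Replace every user-specified term with its placeholder (case-insensitive)."""
    for term, placeholder in TERM_MAP.items():
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return text

def scrub_directory(src: Path, dst: Path) -> None:
    """Write scrubbed copies of all .txt files, scrubbing file names as well."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.glob("*.txt"):
        cleaned_name = scrub_text(path.name)
        (dst / cleaned_name).write_text(scrub_text(path.read_text()))

# Hypothetical usage:
# scrub_directory(Path("raw_reports"), Path("shareable_reports"))
```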

  6. Cloud Data Desensitization Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    + more versions
    Cite
    Market Research Forecast (2025). Cloud Data Desensitization Report [Dataset]. https://www.marketresearchforecast.com/reports/cloud-data-desensitization-30079
    Explore at:
    pdf, doc, ppt (available download formats)
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The cloud data desensitization market is experiencing robust growth, driven by increasing regulatory compliance needs (like GDPR and CCPA), the rising volume of sensitive data stored in the cloud, and the expanding adoption of cloud computing across diverse sectors. The market, estimated at $5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $15 billion by 2033. Key growth drivers include the escalating need to protect sensitive data from breaches and unauthorized access, particularly within healthcare (medical research data), finance (financial risk assessment), and government (government statistics). The cloud-based delivery model offers scalability and cost-effectiveness, further fueling market expansion. While strong security measures are integral to the success of this technology, challenges remain regarding the balance between data usability and robust security protocols. Integration complexities with existing infrastructure and the potential for unforeseen vulnerabilities represent key restraints. Market segmentation reveals a strong preference for cloud-based solutions, given their inherent flexibility and scalability. The application segments, medical research data, financial risk assessment, and government statistics, are currently leading the market, primarily due to the highly sensitive nature of the data involved. Leading vendors like Micro Focus, IBM, Thales, Google Cloud, and others are actively shaping the market landscape through continuous innovation and the introduction of advanced data masking and tokenization techniques. Regional analysis indicates strong growth in North America and Europe, driven by stringent data privacy regulations and a high concentration of organizations handling sensitive data. However, increasing adoption in the Asia-Pacific region, fueled by rapid digital transformation, is expected to significantly boost market growth in the coming years. The forecast period of 2025-2033 presents a significant opportunity for market expansion, driven by increased data security awareness and evolving technological advancements.

  7. Data from: Summary of baseline characteristics.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Feb 3, 2025
    Cite
    David Pau; Camille Bachot; Charles Monteil; Laetitia Vinet; Mathieu Boucher; Nadir Sella; Romain Jegou (2025). Summary of baseline characteristics. [Dataset]. http://doi.org/10.1371/journal.pdig.0000735.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    PLOS Digital Health
    Authors
    David Pau; Camille Bachot; Charles Monteil; Laetitia Vinet; Mathieu Boucher; Nadir Sella; Romain Jegou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Anonymization opens up innovative ways of using secondary data without the requirements of the GDPR, as anonymized data no longer affect the privacy of data subjects. Anonymization requires data alteration, and this project aims to compare the ability of such privacy protection methods to maintain the reliability and utility of scientific data for secondary research purposes.

    Methods: The French data protection authority (CNIL) defines anonymization as a processing activity that consists of using methods to make any identification of people by any means impossible, in an irreversible manner. To address the project’s objective, a series of analyses was performed on a cohort and reproduced on four sets of anonymized data for comparison. Four assessment levels were used to evaluate the impact of anonymization: level 1 referred to the replication of statistical outputs, level 2 referred to the accuracy of statistical results, level 3 assessed data alteration (using Hellinger distances), and level 4 assessed privacy risks (using WP29 criteria).

    Results: 87 items were produced on the raw cohort data and then reproduced on each of the four anonymized data sets. The overall level 1 replication score ranged from 67% to 100% depending on the anonymization solution. The most difficult analyses to replicate were regression models (sub-score ranging from 78% to 100%) and survival analysis (sub-score ranging from 0% to 100%). The overall level 2 accuracy score ranged from 22% to 79% depending on the anonymization solution. For level 3, three methods had some variables with different probability distributions (Hellinger distance = 1). For level 4, all methods had reduced the privacy risk of singling out, with relative risk reductions ranging from 41% to 65%.

    Conclusion: None of the anonymization methods reproduced all outputs and results. A trade-off has to be found between contextual risk and the usefulness of the data to answer the research question.
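
    Level 3 above relies on Hellinger distances between the variable distributions of the raw and anonymized data. A minimal sketch of that check for one categorical variable follows; the file and column names are hypothetical and this is not the project's actual code.

```python
import numpy as np
import pandas as pd

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    """Hellinger distance between two discrete probability distributions
    (0 = identical, 1 = disjoint supports)."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def hellinger_for_column(original: pd.Series, anonymized: pd.Series) -> float:
    """Align the category frequencies of a variable before and after
    anonymization and return their Hellinger distance."""
    categories = sorted(set(original.dropna()) | set(anonymized.dropna()))
    p = original.value_counts(normalize=True).reindex(categories, fill_value=0).to_numpy()
    q = anonymized.value_counts(normalize=True).reindex(categories, fill_value=0).to_numpy()
    return hellinger(p, q)

# Hypothetical usage:
# raw = pd.read_csv("cohort_raw.csv")
# anon = pd.read_csv("cohort_anonymized.csv")
# print(hellinger_for_column(raw["smoking_status"], anon["smoking_status"]))
```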

  8. pone.0285212.t004 - A distributed computing model for big data anonymization...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). pone.0285212.t004 - A distributed computing model for big data anonymization in the networks [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t004
    Explore at:
    xls (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    pone.0285212.t004 - A distributed computing model for big data anonymization in the networks

  9. Consensual videos of potentially re-identifiable individuals recorded at the...

    • zenodo.org
    • data.niaid.nih.gov
    csv, pdf, txt, zip
    Updated Jul 12, 2024
    + more versions
    Cite
    Vivien Geenen; Till Riedel (2024). Consensual videos of potentially re-identifiable individuals recorded at the Autonomous Driving Test Area Baden-Württemberg (raw images with location and IMU data). [Dataset]. http://doi.org/10.5281/zenodo.7805961
    Explore at:
    csv, zip, txt, pdf (available download formats)
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vivien Geenen; Till Riedel
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Baden-Württemberg
    Description

    For the purpose of research on data intermediaries and data anonymisation, it is necessary to test these processes with realistic video data containing personal data. For this purpose, the Treumoda project, funded by the German Federal Ministry of Education and Research (BMBF), has created a dataset of different traffic scenes containing identifiable persons.

    This video data was collected at the Autonomous Driving Test Area Baden-Württemberg. On the one hand, it should be possible to recognise people in traffic, including their line of sight. On the other hand, it should be usable for the demonstration and evaluation of anonymisation techniques.

    The legal basis for the publication of this data set is the consent given by the participants, as documented in the file Consent.pdf (all purposes), in accordance with Art. 6(1)(a) and Art. 9(2)(a) GDPR. Any further processing is subject to the GDPR.

    We make this dataset available for non-commercial purposes such as teaching, research and scientific communication. Please note that this licence is limited by the provisions of the GDPR. Anyone downloading this data will become an independent controller of the data. This data has been collected with the consent of the identifiable individuals depicted.

    Any consensual use must take into account the purposes mentioned in the uploaded consent forms and in the privacy terms and conditions provided to the participants (see Consent.pdf). All participants consented to all three purposes, and no consent was withdrawn at the time of publication. KIT is unable to provide you with contact details for any of the participants, as we have removed all links to personal data other than that contained in the published images.

  10. The global Data Masking Market size is USD 18.43 billion in 2024 and will...

    • cognitivemarketresearch.com
    pdf, excel, csv, ppt
    Updated Jan 15, 2025
    Cite
    Cognitive Market Research (2025). The global Data Masking Market size is USD 18.43 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 18.51% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/data-masking-market-report
    Explore at:
    pdf, excel, csv, ppt (available download formats)
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Data Masking Market size will be USD 18.43 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 18.51% from 2024 to 2031.

    Market Dynamics of the Data Masking Market

    Key Drivers for Data Masking Market

    Increasing Data Breaches and Cybersecurity Threats- One of the main reasons for the Data Masking Market growth is the escalating frequency and sophistication of data breaches and cybersecurity threats that drive the demand for data masking solutions. By obfuscating sensitive information in non-production environments, data masking helps mitigate the risk of unauthorized access and data exposure, safeguarding organizations against potential security breaches and reputational damage.
    The compliance requirements for data privacy and protection are anticipated to drive the Data Masking market’s expansion in the years ahead.
    

    Key Restraints for Data Masking Market

    The compliance complexities hinder data masking implementation in regulated industries.
    The challenges in maintaining data usability while ensuring effective masking impact the market growth.
    

    Introduction of the Data Masking Market

    A major driver of the data masking market is the increasing emphasis on data privacy and regulatory compliance. With stringent data protection regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), organizations are under pressure to safeguard sensitive information from unauthorized access and disclosure. Data masking techniques enable organizations to anonymize or pseudonymize sensitive data while preserving its utility for testing, development, or analytics purposes. As the consequences of data breaches and non-compliance become more severe, businesses across industries are investing in data masking solutions to mitigate risks, maintain regulatory compliance, and protect their reputation, thus driving the growth of the data masking market.

  11. f

    Data from: S1 Data -

    • plos.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0285212.s001
    Explore at:
    xlsx (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, big data and its applications have seen sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. Therefore, the scientific and industrial communities have a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing the resilient distributed dataset (RDD). In terms of performance, due to in-memory computation, it can be up to 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. Accordingly, the authors investigated a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.

  12. pii-masking-43k

    • huggingface.co
    Updated Jul 1, 2023
    + more versions
    Cite
    Ai4Privacy (2023). pii-masking-43k [Dataset]. http://doi.org/10.57967/hf/0824
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 1, 2023
    Dataset authored and provided by
    Ai4Privacy
    Description

    Purpose and Features

    The purpose of the model and dataset is to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs. The model is a fine-tuned version of "Distilled BERT", a smaller and faster version of BERT. It was adapted for the task of token classification based on the largest open-source PII masking dataset known to us, which we are releasing simultaneously. The model size is 62 million parameters. The… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-43k.
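
    A minimal sketch of how such a dataset and a fine-tuned token-classification model are typically used with the Hugging Face libraries follows. The split name is an assumption, and a generic NER pipeline stands in for the project's fine-tuned DistilBERT, which would be loaded the same way via its released checkpoint name.

```python
from datasets import load_dataset
from transformers import pipeline

# Load the PII-masking dataset referenced on the dataset page
# (the split name "train" is an assumption).
dataset = load_dataset("ai4privacy/pii-masking-43k", split="train")
print(dataset[0])

# A generic NER pipeline stands in for the project's fine-tuned DistilBERT;
# a released PII checkpoint would be loaded the same way by its model name.
tagger = pipeline("token-classification", aggregation_strategy="simple")
text = "Hi, I'm John Smith and my phone number is 555-0134."
for entity in tagger(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```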

  13. Global Video Anonymization Market Research Report: By Technology (Software,...

    • wiseguyreports.com
    Updated Aug 10, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Video Anonymization Market Research Report: By Technology (Software, Hardware, Cloud-based), By Deployment (On-premises, Cloud), By End User (Media and entertainment, Healthcare, Financial services, Government), By Anonymization Technique (Face blurring, Object redaction, Voice modulation, Background replacement) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/video-anonymization-market
    Explore at:
    Dataset updated
    Aug 10, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 8, 2024
    Area covered
    Global
    Description
    Base year: 2024
    Historical data: 2019 - 2024
    Report coverage: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    Market size 2023: 617.59 (USD Billion)
    Market size 2024: 706.71 (USD Billion)
    Market size 2032: 2077.2 (USD Billion)
    Segments covered: Technology, Deployment, End User, Anonymization Technique, Regional
    Countries covered: North America, Europe, APAC, South America, MEA
    Key market dynamics: 1. Growing demand for data privacy; 2. Advancements in AI and facial recognition; 3. Increase in video surveillance; 4. Regulatory compliance; 5. Expansion of cloud-based video anonymization solutions
    Market forecast units: USD Billion
    Key companies profiled: Microsoft, Fourmilab, Proofpoint, LogRhythm, SAS Institute, F-Secure, Intermedia, One Identity, BeenVerified, Oracle, Image Scrubber, IBM, Splunk, Axzon, Digital Shadows
    Market forecast period: 2025 - 2032
    Key market opportunities: 1. Growing adoption of video surveillance systems; 2. Increasing demand from law enforcement and security agencies; 3. Rising concerns over data privacy and security; 4. Government regulations and compliance requirements; 5. Advancements in AI and machine learning technologies
    Compound Annual Growth Rate (CAGR): 14.43% (2025 - 2032)

  14. Trust in the government on the use of data for the StopCovid app in France...

    • statista.com
    Updated Mar 10, 2022
    Cite
    Statista (2022). Trust in the government on the use of data for the StopCovid app in France May 2020 [Dataset]. https://www.statista.com/statistics/1118467/stopcovid-app-trust-france/
    Explore at:
    Dataset updated
    Mar 10, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 7, 2020
    Area covered
    France
    Description

    French respondents expressed high expectations in terms of information in the event that other similar applications are developed, primarily regarding the anonymization of data (84 percent) and the methods of control, in particular by the users themselves (81 percent).

    StopCovid is a project that is part of the state of health emergency linked to the coronavirus epidemic. This project would consist of a smartphone application intended to limit the spread of the virus by identifying transmission chains through the collection of some personal information from French app users. In general, French people were rather in favor of the app.

  15. Data from "Auditory tests for characterizing hearing deficits in listeners...

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf, zip
    Updated Jul 19, 2024
    + more versions
    Cite
    Raul Sanchez-Lopez; Michal Fereczkowski; Mouhamad El-Haj-Ali; Federica Bianchi; Oscar Cañete; Mengfan Wu; Tobias Neher; Torsten Dau; Sébastien Santurette (2024). Data from "Auditory tests for characterizing hearing deficits in listeners with various hearing abilities: The BEAR test battery" [Dataset]. http://doi.org/10.5281/zenodo.4923009
    Explore at:
    bin, pdf, zip (available download formats)
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Raul Sanchez-Lopez; Michal Fereczkowski; Mouhamad El-Haj-Ali; Federica Bianchi; Oscar Cañete; Mengfan Wu; Tobias Neher; Torsten Dau; Sébastien Santurette
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains raw and processed data used and described in:

    R. Sanchez-Lopez, S. G. Nielsen, M. El-Haj-Ali, F. Bianchi, M. Fereczkowski, O. Cañete, M. Wu, T. Neher, T. Dau and S. Santurette (under review). "Auditory tests for characterizing hearing deficits in listeners with various hearing abilities: The BEAR test battery," submitted to Frontiers in Neuroscience.

    [Preprint available in medRxiv:
    https://doi.org/10.1101/2020.02.17.20021949]

    One aim of the Better hEAring Rehabilitation (BEAR) project is to define a new clinical profiling tool, a test-battery, for individualized hearing loss characterization. Whereas the loss of sensitivity can be efficiently assessed by pure-tone audiometry, it still remains a challenge to address supra-threshold hearing deficits using appropriate clinical diagnostic tools. In contrast to the classical attenuation-distortion model (Plomp, 1986), the proposed BEAR approach is based on the hypothesis that any listener’s hearing can be characterized along two dimensions reflecting largely independent types of perceptual distortions. Recently, a data-driven approach (Sanchez-Lopez et al., 2018) provided evidence consistent with the existence of two independent sources of distortion, and thus different auditory profiles. Eleven tests were selected for the clinical test battery, based on their feasibility, time efficiency and related evidence from the literature. The proposed tests were divided into five categories: audibility, speech perception, binaural-processing abilities, loudness perception, and spectro-temporal resolution. Seventy-five listeners with symmetric, mild-to-severe sensorineural hearing loss were selected from a clinical population of hearing-aid users. The participants completed all tests in a clinical environment and did not receive systematic training for any of the tasks. The analysis of the results focused on the ability of each test to pinpoint individual differences among the participants, relationships among the different tests, and determining their potential use in clinical settings. The results might be valuable for hearing-aid fitting and clinical auditory profiling.

    Please cite this article when using the data

    The Dataset BEAR3 has also been used in:

    Sanchez-Lopez R, Fereczkowski M, Neher T, Santurette S, Dau T. Robust Data-Driven Auditory Profiling Towards Precision Audiology. Trends in Hearing. January 2020. doi:10.1177/2331216520973539

    Sanchez-Lopez, R., Fereczkowski, M., Neher, T., Santurette, S., & Dau, T. (2020). Robust auditory profiling: Improved data-driven method and profile definitions for better hearing rehabilitation. Proceedings of the International Symposium on Auditory and Audiological Research, 7, 281-288. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2019-32

    and

    Sanchez Lopez, R., Nielsen, S. G., Cañete, O., Fereczkowski, M., Wu, M., Neher, T., Dau, T., & Santurette, S. (2019). A clinical test battery for Better hEAring Rehabilitation (BEAR): Towards the prediction of individual auditory deficits and hearing-aid benefit. In Proceedings of the 23rd International Congress on Acoustics (pp. 3841-3848). Deutsche Gesellschaft für Akustik e.V.. https://doi.org/10.18154/RWTH-CONV-239177

    Description of the files:

    • BEAR2.xlsx: Anonymized raw data obtained using the BEAR test battery.
    • BEAR2_YNH.xlsx: Additional anonymized raw data obtained using the BEAR test battery with young normal-hearing listeners.
    • BEAR3.xlsx: Anonymized processed data for statistical data analysis.
    • BEAR3_Results_AProfiling.xlsx: BEAR3 dataset including the profiles, probabilities to belong to each of the four profiles and estimated degree of Distortion type-I and Distortion type-II.
    • BEAR_Reliability.xlsx: Anonymized raw data similar to BEAR2 for the reliability study.
    • DataParticipants.xlsx: Anonymized basic data associated with the participants: Gender, Age, PTA, etc.
    • TestBatteryMethods_v1.1.pdf: Documentation of the test methods. Protocol included and corrections.
    • Reliability_v1.0.pdf: Detailed explanation about the test-retest reliability study carried out with a subset of the participants.

    * The participant IDs in each of the files have been assigned randomly to ensure the anonymization of the data. The pseudo-anonymized data might be shared upon request by direct correspondence with the authors.

  16. Schooling data from the University of Paris 13

    • gimi9.com
    • data.europa.eu
    + more versions
    Cite
    Schooling data from the University of Paris 13 [Dataset]. https://gimi9.com/dataset/eu_58e34f7dc751df5d2777388c
    Explore at:
    Description

    This is a dataset updated annually; the description below relates to the first year of online release, since updates have taken place in 2018 (data 2008-2017) and 2019 (data 2009-2018). Paris 13 University recorded data on student registration in its information system (Apogee software) for each academic year between 2006(-2007) and 2015(-2016). These data relate to the diplomas prepared, the stages leading to them, the scheme (whether it concerns initial training or apprenticeship), the components concerned (UFR, IUT, etc.), and the origin of students (type of baccalaureate, academy of origin, nationality). Each entry concerns the main enrolment of a student at the university for one year. The attributes of these data are as follows.
    — CODE_INDIVIDU: hidden data
    — ANNEE_INSCRIPTION: year of registration (2006 for 2006-2007, etc.)
    — LIB_DIPLOME: diploma name
    — LEVEAU_DANS_LE_DIPLOME: 1, 2, ... for master 1, licence 2, etc.
    — LEVEAU_APRES_BAC: 1, 2, ... for Bac+1, Bac+2, ...
    — LIBELLE_DISCIPLINE_DIPLOME: attachment of the diploma to a discipline
    — CODE_SISE_DIPLOME: student tracking information system code
    — CODE_ETAPE: internal code of a stage (year, course) of a diploma
    — LIBELLE_COURT_ETAPE: short name of the stage
    — LIBELLE_LONG_ETAPE: more intelligible name of the stage
    — LIBELLE_COURT_COMPOSANT: name of the component (UFR, IUT, etc.)
    — CODE_COMPOSANT: numeric code of the component (unused)
    — REGROUPEMENT_BAC: type of Bac (L, ES, S, techno STMG, techno ST2S, ...)
    — LIBELLE_ACADEMIE_BAC: academy of the Bac (Créteil, Versailles, foreign, ...)
    — Continent: deduced from nationality, which is masked data
    — LIBELLE_REGIME: initial training, continuing, pro, apprenticeship

    Paris 13 University publishes part of this dataset through several resources, while respecting the anonymity of its students. Starting from the 213,289 entries that correspond to all enrolments of the 106,088 individuals who studied at Paris 13 University during the ten academic years between 2006(-2007) and 2015(-2016), we selected several resources, each corresponding to a part of the data. To produce each resource we chose a small number of attributes, then removed a small proportion of the entries, in order to satisfy a k-anonymisation constraint with k = 5, i.e. to ensure that, in each resource, each entry appears identically at least 5 times (otherwise the entry is deleted). The four resources produced are materialised by the following files.
    — The file ‘up13_etapes.csv’ concerns the diploma stages; it contains the attributes “CODE_ETAPE”, “LIBELLE_COURT_ETAPE”, “LIBELLE_LONG_ETAPE”, “NIVEAU_APRES_BAC”, “LIBELLE_COURT_COMPOSANTE”, “LIBELLE_DISCIPLINE_DIPLOME”, “CODE_SISE_DIPLOME” and “NIVEAU_DANS_LE_DIPLOME”, and its anonymisation causes a loss of 918 entries.
    — The file ‘up13_Academie.csv’ concerns the academy of the Bac; it contains the attributes “LIBELLE_ACADEMIE_BAC”, “NIVEAU_APRES_BAC”, “NIVEAU_DANS_DIPLOME”, “CONTINENT”, “LIBELLE_REGIME”, “LIB_DIPLOME” and “LIBELLE_COURT_COMPOSANTE”, and its anonymisation causes the loss of 7,525 entries.
    — The file ‘up13_Bac.csv’ concerns the type of Bac and the level reached after the Bac; it contains the columns “REGROUPEMENT_BAC”, “NIVEAU_APRES_BAC”, “LIBELLE_REGIME”, “CONTINENT”, “LIBELLE_COURT_COMPOSANTE”, “LIB_DIPLOME” and “NIVEAU_DANS_LE_DIPLOME”, and its anonymisation causes the loss of 3,933 entries.
    — The file ‘up13_annees_etapes.csv’ concerns enrolment in the diploma stages year after year; it contains the columns “ANNEE_INSCRIPTION”, “LIBELLE_COURT_COMPOSANTE”, “NIVEAU_APRES_BAC”, “LIB_DIPLOME” and “CODE_ETAPE”, and its anonymisation causes the loss of 3,532 entries.

    Other tables extracted from the same initial data and constructed using the same method of anonymisation can be provided on request (specify the desired columns).

    A second set of resources offers the follow-up of students year after year, from diploma stage to diploma stage. In this dataset, we call a trace such a follow-up when the registration year has been dropped and only the sequence of stages remains, and we call a cursus the data describing this succession of stages over the years. For anonymisation we grouped identical traces or cursus and, whenever there were fewer than 10 of them, we do not indicate their number or, equivalently, we set this number to 1 (the information being that there is at least one student who left this trace or followed this cursus). This leads to forgetting a number of overly specific study paths and keeping only one as a witness. Starting from 106,088 traces or cursus, we produce the following resources.
    — The file ‘up13_traces.csv’ contains the sequences of diploma stage codes (traces); anonymisation makes us forget 10,089 traces.
    — The file ‘up13_traces_wt_etape.csv’ contains similar traces, but without the stage code; that is, only the diploma, the level after the baccalaureate and the component concerned remain. Anonymisation makes us forget 4,447 traces.
    — The file ‘up13_traces_bac_wt_etape.csv’ contains the same data as the file ‘up13_traces_wt_etape.csv’, but also with the Bac type. Anonymisation makes us forget 8,067 traces.
    — The file ‘up13_cursus_wt_etape.csv’ contains the same data as the file ‘up13_traces_wt_etape.csv’, with the registration years added. Anonymisation makes us forget 8,324 cursus.
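
    The k = 5 suppression step described above can be sketched with pandas: keep only rows whose combination of quasi-identifier values occurs at least k times. This is a minimal illustration, not the university's actual pipeline; the file name and separator in the usage example are assumptions.

```python
import pandas as pd

def k_anonymize_by_suppression(df: pd.DataFrame, quasi_identifiers: list, k: int = 5) -> pd.DataFrame:
    """Drop every row whose combination of quasi-identifier values appears
    fewer than k times, so each remaining entry is identical to at least
    k - 1 other entries on those columns."""
    group_sizes = df.groupby(quasi_identifiers, dropna=False)[quasi_identifiers[0]].transform("size")
    return df[group_sizes >= k].reset_index(drop=True)

# Hypothetical usage on one of the published resources:
# resource = pd.read_csv("up13_Bac.csv", sep=";")
# safe = k_anonymize_by_suppression(
#     resource,
#     ["REGROUPEMENT_BAC", "NIVEAU_APRES_BAC", "LIBELLE_REGIME", "CONTINENT",
#      "LIBELLE_COURT_COMPOSANTE", "LIB_DIPLOME", "NIVEAU_DANS_LE_DIPLOME"],
#     k=5,
# )
# print(len(resource) - len(safe), "entries suppressed")
```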

  17. CRAWDAD umd/sigcomm2008

    • ieee-dataport.org
    Updated Mar 25, 2009
    + more versions
    Cite
    Aaron Schulman (2009). CRAWDAD umd/sigcomm2008 [Dataset]. https://ieee-dataport.org/open-access/crawdad-umdsigcomm2008
    Explore at:
    Dataset updated
    Mar 25, 2009
    Dataset provided by
    IEEE Dataport
    Authors
    Aaron Schulman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We collected a trace of wireless network activity at SIGCOMM 2008. The subjects of the traced network chose to participate by joining the traced SSID. The release contains 3 types of anonymized traces: 802.11a, Ethernet and Syslog from the Access Point. We anonymized the trace data using a modified version (http://www.cs.umd.edu/projects/wifidelity/sigcomm08_traces/sigcomm08-tcpmkpub.tar.gz) of the tcpmkpub tool (http://www.icir.org/enterprise-tracing/tcpmkpub.html). The packet traces include anonymized DHCP and DNS headers.

    last modified: 2009-03-25
    release date: 2009-03-02
    date/time of measurement start: 2008-08-17
    date/time of measurement end: 2008-08-21
    collection environment: We collected a trace of wireless network activity at SIGCOMM 2008. The subjects of the traced network chose to participate by joining the traced SSID. Our goal is to gather a detailed trace of network activity at SIGCOMM 2008 to improve 802.11 tracing techniques as part of the Wifidelity project and enable analysis of the behavior of a wireless LAN that is (presumably) heavily used.
    network configuration: We used four BSSIDs on four channels with one NAT (Network Address Translation) router. To collect the traces, we deployed eight 802.11a monitors so that 2 monitors are assigned to each channel. A Xirrus Wi-Fi Array (http://www.xirrus.com/products/arrays-80211abg.php) provided the traced 802.11a network (SSID: SIGCOMM-ONLY-Traced). The WiFi Array consisted of four BSSIDs that were broadcast on four 802.11a channels. After anonymization, the DHCP-assigned IP addresses for clients are in the following subnets: 26.12.0.0/16 and 26.2.0.0/16.
    data collection methodology: We recorded network protocol information from all wired and wireless packets sent on the wireless network of SSID SIGCOMM-ONLY-Traced. Each packet includes physical layer information (in the Prism header) such as the wireless signal strength as well as the 802.11, IP, TCP, UDP, and ICMP headers, depending on the packet type. We did not record packet payloads above the transport layer except for DHCP and DNS payloads. However, we anonymized or deleted potentially sensitive information such as MAC and IP addresses, and DHCP and DNS headers.
    sanitization: The user chose to participate in the trace by associating with the SIGCOMM-ONLY-Traced SSID. Otherwise, the users joined the "Untraced" SSID: SIGCOMM-ONLY-Untraced. The traces do not contain any data from the "Untraced" SSID. We anonymized the traces to protect the identity and activity of users who opted to be traced during SIGCOMM 2008.
    - Filtering 802.11a traces: Each packet in the wireless traces meets one or both of the following criteria: 1. the BSSID address matches the "traced" BSSID; 2. the packet is a probe request for the "SIGCOMM-ONLY-Traced" SSID.
    - Filtering Ethernet traces: The AP was set up with a monitor VLAN for the "SIGCOMM-ONLY-Traced" network.
    - Filtering Syslog traces: The syslog trace only contains information about users associated with the "traced" network. The method to filter out syslog messages about "Untraced" users is as follows: include all syslog messages while a client is associated to the "traced" network. The syslog messages indicate when a client associates to, and disassociates from, the "traced" network.

    Traceset umd/sigcomm2008/pcap: PCAP traceset of wireless network measurement in the SIGCOMM 2008 conference.
    file: sigcomm08_traces.tar.gz
    description: We collected pcap traces of wireless network activity at SIGCOMM 2008. The subjects of the traced network chose to participate by joining the traced SSID.
    measurement purpose: Network Diagnosis
    methodology: 1. 802.11a: During most of the conference approximately two 802.11a monitors were placed at the four corners of the main conference hall. We did not record the exact location of each monitor. However, we tried to capture each channel with two monitors placed at opposite corners of the room. 2. Ethernet: Packets sent from the NAT to the AP and from the AP to the NAT were captured using an Ethernet trace collector attached to the packet dump port on the WiFi Array.
    sanitization: The packets are anonymized using a modified version of the tcpmkpub tool. The tool is available from the download link of [sigcomm08-tcpmkpub.tar.gz]. Metadata about the trace anonymization is provided in the file tcpmkpub.log.export. In the description below, [new] indicates new functionality added to tcpmkpub, and [tcpmkpub] indicates the functionality of the original tcpmkpub tool, described in the following reference: R. Pang, M. Allman, V. Paxson, and J. Lee. The Devil and Packet Trace Anonymization. SIGCOMM Computer Communication Review, 2006. [Crypto-PAn] indicates the functionality of the Crypto-PAn scheme, described in the following reference: Xu, J. Fan, M. H. Ammar, and S. B. Moon. Prefix-preserving IP address anonymization: measurement-based security evaluation and a new cryptography-based scheme. In Proceedings of the IEEE International Conference on Network Protocols (ICNP), pages 280–289, Nov. 2002.
    1. Checksums (IP/UDP/TCP) [tcpmkpub]: The anonymization code recomputes checksums. The anonymization meta-data (tcpmkpub.log.export) holds information about packets in the traces with bad checksums. Bad checksums are indicated in the anonymized traces by a 1 in the checksum field, or 2 if the checksum was 1. A UDP checksum of 0 is not changed.
    2. Link Layer. A. Ethernet [tcpmkpub]: MAC addresses: the 3 high- and low-order bytes are hashed separately; the high-order 3 bytes are hashed to retain vendor information; addresses containing all 1's or all 0's are not changed; the multicast bit is retained. B. VLAN [new]: The VLAN header did not need to be anonymized. C. 802.11 [new]: MAC addresses are anonymized using the same method as the Ethernet MAC addresses; if the packet is fragmented (fragment bit == 1 or fragment # > 0), skip the rest of the packet.
    3. Network Layer. A. IP [tcpmkpub]: External addresses are hashed using a prefix-preserving scheme [Crypto-PAn]. Internal addresses are hashed to a prefix unused by the external addresses, and the subnet and host portions of the address are transformed. Multicast addresses are not anonymized. The [tcpmkpub] paper recommends removing packets from network scanners; we did not determine this was a threat to our network as the identity tied to a local address was dynamic. B. ARP [tcpmkpub]: If the ARP packet contains a partial IP packet, use the IP anonymization above. IP addresses are anonymized using the IP anonymization procedure above.
    4. Transport Layer. A. TCP [tcpmkpub]: The TCP timestamp options are transformed into separate monotonically increasing counters with no relationship to time for each IP address in the anonymized trace. If the timestamp is 0, do not modify it; otherwise replace the timestamp with a unique number incremented in the order of the trace. B. UDP [tcpmkpub]: Recompute the checksum according to the checksum policy above.
    5. Application Layer. A. DNS [new]: Anonymize DNS labels individually by taking the keyed HMAC of the label; keep the low-order 8 bytes of the hash digest as the label; convert the digest to ASCII by converting to hex; store the new length of the DNS packet in the following fields: [IP/UDP/DNS, PCAP Captured, PCAP On Wire]; anonymize any type 'A' resource record data using the IP anonymization scheme above. DNS packets may be cut off because of the snaplen at capture. B. DHCP [new]: The client IP address is anonymized; the client hardware address is anonymized; "your IP address" (yiaddr) is anonymized. The rest of the DHCP packets were cut off by the snaplen at capture.

    umd/sigcomm2008/pcap Traces
    802.11a: PCAP traces of wireless network measurement collected from the wireless side in the SIGCOMM 2008 conference.
    configuration: During most of the conference approximately two 802.11a monitors were placed at the four corners of the main conference hall. We did not record the exact location of each monitor. However, we tried to capture each channel with two monitors placed at opposite corners of the room. The network topology is configured as follows: Users: 26.12.*.*, 26.2.*.*; Network Management: 26.6.*.*
    format: sigcomm08_wl_(monitor #)_(first packet time)_(last packet time)_(bssid)_(channel).pcap
    Ethernet: PCAP traces of wireless network measurement collected from the Ethernet side in the SIGCOMM 2008 conference.
    configuration: Packets sent from the NAT to the AP and from the AP to the NAT were captured using an Ethernet trace collector attached to the packet dump port on the WiFi Array. The network topology is configured as follows: Users: 26.12.*.*, 26.2.*.*; Network Management: 26.6.*.*
    format: sigcomm08_eth_(first packet time)_(last packet time).pcap
    anonymization_log: The anonymization log of the wireless network traces in the SIGCOMM 2008 conference.
    configuration: tcpmkpub anonymization log for the traces 'umd/sigcomm2008/pcap/802.11a' and 'umd/sigcomm2008/pcap/Ethernet', and md5 checksums for the trace files.
    format: The anonymization log file name is 'tcpmkpub.log.export'.

    Traceset umd/sigcomm2008/syslog: Syslog traceset of wireless network measurement in the SIGCOMM 2008 conference.
    file: sigcomm08_syslog.tar.gz
    description: We collected syslog traces of wireless network activity at SIGCOMM 2008. The subjects of the traced network chose to participate by joining the traced SSID.
    measurement purpose: Network Diagnosis
    methodology: A tracing box connected to the Array's management port collected syslog traces. Unfortunately, after the conference we noticed that these traces were corrupted. However, we were able to salvage one of the syslog traces because we collected it with the Ethernet tracing box.
    sanitization: macmkpub, a MAC address anonymizer based on the tcpmkpub anonymization code, anonymized the MAC addresses in the syslog traces. Metadata about the trace anonymization is provided in the file 'tcpmkpub.log.export'.
    umd/sigcomm2008/syslog Traces
    Ethernet: Syslog traces of wireless network measurement in the SIGCOMM 2008
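
    The DNS-label rule in the sanitization policy above (keyed HMAC per label, keep the low-order 8 bytes, hex-encode) can be sketched as follows. This is an illustrative re-implementation of the described rule, not the tcpmkpub code; the hash function and key are assumptions.

```python
import hmac
import hashlib

def anonymize_dns_name(name: str, key: bytes) -> str:
    """Anonymize a DNS name label by label: take the keyed HMAC of each
    label, keep the low-order 8 bytes of the digest, and hex-encode them."""
    anonymized_labels = []
    for label in name.rstrip(".").split("."):
        digest = hmac.new(key, label.lower().encode(), hashlib.sha256).digest()
        anonymized_labels.append(digest[-8:].hex())  # low-order 8 bytes, as hex
    return ".".join(anonymized_labels)

# Example with a placeholder key (tcpmkpub's actual key handling and hash may differ):
print(anonymize_dns_name("www.sigcomm.org", key=b"example-secret-key"))
```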

  18. d

    Updated PTSS dataset for the FORAS project - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Feb 5, 2025
    + more versions
    Cite
    (2025). Updated PTSS dataset for the FORAS project - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/76e82de6-ce29-5f10-8df7-a0fe1a17c489
    Explore at:
    Dataset updated
    Feb 5, 2025
    Description

    This updated labeled dataset builds upon the initial systematic review by van de Schoot et al. (2018; DOI: 10.1080/00273171.2017.1412293), which included studies on post-traumatic stress symptom (PTSS) trajectories up to 2016, sourced from the Open Science Framework (OSF). As part of the FORAS project - Framework for PTSS trajectORies: Analysis and Synthesis (funded by the Dutch Research Council, grant no. 406.22.GO.048, and pre-registered at PROSPERO under ID CRD42023494027), we extended this dataset to include publications between 2016 and 2023. In total, the search identified 10,594 de-duplicated records obtained via different search methods, each published with its own search query and result:
    - Exact replication of the initial search: OSF.IO/QABW3
    - Comprehensive database search: OSF.IO/D3UV5
    - Snowballing: OSF.IO/M32TS
    - Full-text search via Dimensions data: OSF.IO/7EXC5
    - Semantic search via OpenAlex: OSF.IO/M32TS
    Humans (BC, RN) and AI (Bron et al., 2024) screened the records, and disagreements were resolved (MvZ, BG, RvdS). Each record was screened separately for Title, Abstract, and Full-text inclusion, and against each inclusion criterion. A detailed screening logbook is available at OSF.IO/B9GD3, and the entire process is described in https://doi.org/10.31234/osf.io/p4xm5. A description of all columns/variables and full methodological details is available in the accompanying codebook.
    Important Notes:
    - Duplicates: To maintain consistency and transparency, duplicates are left in the dataset and are labeled with the same classification as the original records. A filter is provided to allow users to exclude these duplicates as needed.
    - Anonymized Data: The dataset "...._anonymous" excludes DOIs, OpenAlex IDs, titles, and abstracts to ensure data anonymization during the review process. The complete dataset, including all identifiers, is uploaded under embargo and will be publicly available on 01-10-2025.
    This dataset serves not only as a valuable resource for researchers interested in systematic reviews of PTSS trajectories, facilitating reproducibility and transparency in the research process, but also for data scientists who would like to mimic the screening process using different machine learning and AI models.
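    As a rough illustration of how the provided duplicate filter might be applied before a screening simulation, the following pandas sketch uses hypothetical file and column names (`duplicate_flag`, `label_included`); the actual names are defined in the accompanying codebook.

```python
# Hedged sketch: file and column names are assumptions; consult the
# codebook for the real variable names in the FORAS/PTSS dataset.
import pandas as pd

records = pd.read_csv("ptss_foras_anonymous.csv")  # hypothetical file name

# Exclude the duplicates that were intentionally left in the dataset,
# using the filter column the description mentions.
unique_records = records[records["duplicate_flag"] == 0]

# Split into included records for a screening simulation.
included = unique_records[unique_records["label_included"] == 1]
print(len(records), len(unique_records), len(included))
```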

  19. Z

    Data from: Dichoptic metacontrast masking functions to infer transmission...

    • data.niaid.nih.gov
    Updated May 28, 2022
    Cite
    Krämer, Julia (2022). Data from: Dichoptic metacontrast masking functions to infer transmission delay in optic neuritis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4933040
    Explore at:
    Dataset updated
    May 28, 2022
    Dataset provided by
    Wiendl, Heinz
    Bruchmann, Maximilian
    Korsukewitz, Catharina
    Krämer, Julia
    Meuth, Sven G.
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Optic neuritis (ON) has detrimental effects on the transmission of neuronal signals generated at the earliest stages of visual information processing. Both the amount and the speed of transmitted visual signals are impaired. Measurements of visual evoked potentials (VEP) are often implemented in clinical routine. However, the specificity of VEPs is limited because multiple cortical areas are involved in the generation of P1 potentials, including feedback signals from higher cortical areas. Here, we show that dichoptic metacontrast masking can be used to estimate the temporal delay caused by ON. A group of 15 patients with unilateral ON, nine of whom had sufficient visual acuity and volunteered to participate, and a group of healthy control subjects (N = 8) were presented with flashes of gray disks to one eye and flashes of gray annuli to the corresponding retinal location of the other eye. By asking subjects to report the subjective visibility of the target (i.e. the disk) while varying the stimulus onset asynchrony (SOA) between disk and annulus, we obtained typical U-shaped masking functions. From these functions we inferred the critical SOAmax at which the mask (i.e. the annulus) optimally suppressed the visibility of the target. ON-associated transmission delay was estimated by comparing the SOAmax between conditions in which the disk had been presented to the affected and the mask to the other eye, and vice versa. SOAmax differed on average by 28 ms, suggesting a reduction in transmission speed in the affected eye. Compared to previously reported methods assessing perceptual consequences of altered neuronal transmission speed, the presented method is more accurate, as it is not limited by the observers' ability to judge subtle variations in perceived synchrony.
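    The delay estimate rests on locating the SOA at which target visibility is minimal in each eye condition and taking the difference. A minimal sketch of that step, using made-up visibility ratings and a simple quadratic fit to the U-shaped masking function (the study's actual fitting procedure may differ), is shown below.

```python
# Illustrative only: synthetic visibility data, quadratic fit to the
# U-shaped masking function, SOAmax taken at the fitted minimum.
import numpy as np


def soa_max(soas, visibility):
    """Fit visibility ~ a*SOA^2 + b*SOA + c and return the SOA at the minimum."""
    a, b, _ = np.polyfit(soas, visibility, 2)
    return -b / (2 * a)


soas = np.array([10, 30, 50, 70, 90, 110, 130])  # ms, example values
vis_target_in_affected_eye = np.array([0.9, 0.7, 0.5, 0.4, 0.5, 0.7, 0.9])
vis_target_in_fellow_eye   = np.array([0.8, 0.5, 0.4, 0.5, 0.7, 0.85, 0.95])

delay_ms = soa_max(soas, vis_target_in_affected_eye) - soa_max(soas, vis_target_in_fellow_eye)
print(f"Estimated transmission delay: {delay_ms:.1f} ms")
```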

  20. n

    Annual Agricultural Sample Survey 2022/23 - Tanzania

    • microdata.nbs.go.tz
    Updated Nov 16, 2024
    Cite
    National Bureau of Statistics (2024). Annual Agricultural Sample Survey 2022/23 - Tanzania [Dataset]. https://microdata.nbs.go.tz/index.php/catalog/52
    Explore at:
    Dataset updated
    Nov 16, 2024
    Dataset provided by
    Office of the Chief Government Statistician
    National Bureau of Statistics
    Time period covered
    2023 - 2024
    Area covered
    Tanzania
    Description

    Abstract

    The Annual Agricultural Sample Survey (AASS) for the year 2022/23 aimed to enhance the understanding of agricultural activities across Tanzania by collecting comprehensive data on various aspects of the agricultural sector. This survey is crucial for policy formulation, development planning, and service delivery, providing reliable data to monitor and evaluate national and international development frameworks.

    The 2022/23 survey is particularly significant as it informs the monitoring and evaluation of key agricultural development strategies and frameworks. The collected data will contribute to the Tanzania Development Vision 2025, the Zanzibar Development Vision 2020, the Five-Year Development Plan 2021/22–2025/26, the National Strategy for Growth and Reduction of Poverty (NSGRP), known as MKUKUTA, and the Zanzibar Strategy for Growth and Reduction of Poverty (ZSGRP), known as MKUZA. The survey data also support the evaluation of the Sustainable Development Goals (SDGs) and the Comprehensive Africa Agriculture Development Programme (CAADP). Key indicators for agricultural performance and poverty monitoring are directly measured from the survey data.

    The 2022/23 AASS provides a detailed descriptive analysis and related tables on the main thematic areas. These areas include household members and holder identification, field roster, seasonal plot and crop rosters (Vuli, Masika, and Dry Season), permanent crop production, crop harvest use, seed and seedling acquisition, input use and acquisition (fertilizers and pesticides), livestock inventory and changes, livestock production costs, milk and eggs production, other livestock products, aquaculture production, and labor dynamics. The 2022/23 AASS offers an extensive dataset essential for understanding the current state of agriculture in Tanzania. The insights gained will support the development of policies and interventions aimed at enhancing agricultural productivity, sustainability, and the livelihoods of farming communities. This data is indispensable for stakeholders addressing challenges in the agricultural sector and promoting sustainable agricultural development.

    STATISTICAL DISCLOSURE CONTROL (SDC) METHODS HAVE BEEN APPLIED TO THE MICRODATA TO PROTECT THE CONFIDENTIALITY OF THE INDIVIDUAL DATA COLLECTED. USERS MUST BE AWARE THAT THESE ANONYMIZATION OR SDC METHODS MODIFY THE DATA, INCLUDING SUPPRESSION OF SOME DATA POINTS. THIS AFFECTS THE AGGREGATED VALUES DERIVED FROM THE ANONYMIZED MICRODATA AND MAY HAVE OTHER UNWANTED CONSEQUENCES, SUCH AS SAMPLING ERROR AND BIAS. ADDITIONAL DETAILS ABOUT THE SDC METHODS AND DATA ACCESS CONDITIONS ARE PROVIDED IN THE DATA PROCESSING AND DATA ACCESS CONDITIONS BELOW.

    Geographic coverage

    National, Mainland Tanzania and Zanzibar, Regions

    Analysis unit

    Households for Smallholder Farmers and Farms for Large-Scale Farms

    Universe

    The survey covered agricultural households and large-scale farms.

    Agricultural households are those that meet one or more of the following two conditions: a) Have or operate at least 25 square meters of arable land, b) Own or keep at least one head of cattle or five goats/sheep/pigs or fifty chicken/ducks/turkeys during the agriculture year.

    Large-scale farms are those farms with at least 20 hectares of cultivated land, or 50 head of cattle, or 100 goats/sheep/pigs, or 1,000 chickens. In addition, they should fulfill all of the following four conditions: i) The greater part of the produce should go to the market, ii) Operation of the farm should be continuous, iii) There should be application of machinery/implements on the farm, and iv) There should be at least one permanent employee.
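    The two sets of criteria above translate directly into eligibility checks. A minimal sketch is given below; the function and field names are illustrative assumptions, not the survey's actual variable names.

```python
# Hedged sketch of the AASS 2022/23 coverage rules described above.
# All names are illustrative, not the survey's actual variables.

def is_agricultural_household(arable_land_m2, cattle, goats_sheep_pigs, poultry):
    """At least 25 m2 of arable land, or at least 1 head of cattle, or 5
    goats/sheep/pigs, or 50 chickens/ducks/turkeys during the agricultural year."""
    return (arable_land_m2 >= 25 or cattle >= 1
            or goats_sheep_pigs >= 5 or poultry >= 50)


def is_large_scale_farm(cultivated_ha, cattle, goats_sheep_pigs, poultry,
                        mostly_marketed, continuous_operation,
                        uses_machinery, permanent_employees):
    """At least 20 ha cultivated, or 50 head of cattle, or 100 goats/sheep/pigs,
    or 1,000 chickens -- and all four operational conditions must hold."""
    size_ok = (cultivated_ha >= 20 or cattle >= 50
               or goats_sheep_pigs >= 100 or poultry >= 1000)
    conditions_ok = (mostly_marketed and continuous_operation
                     and uses_machinery and permanent_employees >= 1)
    return size_ok and conditions_ok


print(is_agricultural_household(30, 0, 0, 0))                  # True: enough arable land
print(is_large_scale_farm(25, 0, 0, 0, True, True, True, 1))   # True: size + all conditions
```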

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The frame used to extract the sample for the Annual Agricultural Sample Survey (AASS 2022/23) in Tanzania was derived from the 2022 Population and Housing Census (PHC 2022) frame, which lists all the Enumeration Areas (EAs/Hamlets) of the country. The AASS 2022/23 used a stratified two-stage sampling design, which allows reliable estimates to be produced at the regional level for both Mainland Tanzania and Zanzibar.

    In the first stage, the EAs (primary sampling units) were stratified into 2-3 strata within each region and then selected by using a systematic sampling procedure with probability proportional to size (PPS), where the measure of size is the number of agricultural households in the EA. Before the selection, within each stratum and domain (region), the Enumeration Areas (EAs) were ordered according to the codes of District and Council which reflect the geographical proximity, and then ordered according to the codes of Constituency, Division, Wards, and Village. An implicit stratification was also performed, ordering by Urban/Rural type at Ward level.

    In the second stage, a simple random sampling selection was conducted. In hamlets with more than 200 households, twelve (12) agricultural households were drawn from the PHC 2022 list using simple random sampling without replacement in each sampled hamlet. In hamlets with 200 households or fewer, a listing exercise was carried out in each sampled hamlet, and twelve (12) agricultural households were selected using simple random sampling without replacement. A total of 1,352 PSUs were selected from the 2022 Population and Housing Census frame, of which 1,234 PSUs were from Mainland Tanzania and 118 from Zanzibar. A total of 16,224 agricultural households were sampled (14,808 households from Mainland Tanzania and 1,416 from Zanzibar).
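    For readers who want to see the shape of the first-stage selection logic, here is a simplified sketch of systematic probability-proportional-to-size (PPS) selection over an ordered frame. The frame below is synthetic; the real frame, stratification, and ordering variables come from the PHC 2022 and are not reproduced here.

```python
# Simplified, illustrative systematic PPS selection of EAs, roughly as
# described for the first sampling stage; the frame here is synthetic.
import random


def systematic_pps(frame, n_sample, size_key="agri_households", seed=2023):
    """Select n_sample units with probability proportional to size by a
    systematic pass over the cumulated sizes of an ordered frame."""
    total = sum(unit[size_key] for unit in frame)
    interval = total / n_sample
    start = random.Random(seed).uniform(0, interval)
    targets = iter(start + i * interval for i in range(n_sample))

    selected, cum = [], 0.0
    target = next(targets)
    for unit in frame:
        cum += unit[size_key]
        while target is not None and target <= cum:
            selected.append(unit)           # large units can be hit more than once
            target = next(targets, None)
    return selected


frame = [{"ea_id": i, "agri_households": random.randint(50, 400)} for i in range(500)]
print(len(systematic_pps(frame, n_sample=40)))
```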

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The 2022/23 Annual Agricultural Survey used two main questionnaires, the Smallholder Farmers questionnaire and the Large-Scale Farms questionnaire, consolidated into a single questionnaire within the CAPI system. The Smallholder Farmers questionnaire captured information at the household level, while the Large-Scale Farms questionnaire captured information at the establishment/holding level. These questionnaires were used for data collection covering core agricultural activities (crops, livestock, and fish farming) in both the short and long rainy seasons. The 2022/23 AASS questionnaire covered 23 sections, which are:

    1. COVER; The cover page included the title of the survey, survey year (2022/23), general instructions for both the interviewers and respondents. It sets the context for the survey and also it shows the survey covers the United Republic of Tanzania.

    2. SCREENING: Included preliminary questions designed to determine if the respondent or household is eligible to participate in the survey. It checks for core criteria such as involvement in agricultural activities.

    3. START INTERVIEW: The introductory section where basic details about the interview are recorded, such as the date, location, and interviewer’s information. This helped in the identification and tracking of the interview process.

    4. HOUSEHOLD MEMBERS AND HOLDER IDENTIFICATION: Collected information about all household members, including age, gender, relationship to the household head, and the identification of the main agricultural holder. This section helped in understanding the demographic composition of the agriculture household.

    5. FIELD ROSTER: Provided the details of the various agricultural fields operated by the agriculture household. Information includes the size, location, and identification of each field. This section provided a comprehensive overview of the land resources available to the household.

    6. VULI PLOT ROSTER: Focused on plots used during the Vuli season (short rainy season). It includes details on the crops planted, plot sizes, and any specific characteristics of these plots. This helps in assessing seasonal agricultural activities.

    7. VULI CROP ROSTER: Provided detailed information on the types of crops grown during the Vuli season, including quantities produced and intended use (e.g., consumption, sale, storage). This section captures the output of short rainy season farming.

    8. MASIKA PLOT ROSTER: Similar to the Vuli Plot Roster but focuses on the Masika season (long rainy season). It collects data on plot usage, crop types, and sizes. This helps in understanding the agricultural practices during the primary growing season.

    9. MASIKA CROP ROSTER: Provided detailed information on crops grown during the Masika season, including production quantities and uses. This section captures the output from the main agricultural season.

    10. PERMANENT CROP PRODUCTION: Focuses on perennial or permanent crops (e.g., fruit trees, tea, coffee). It includes data on the types of permanent crops, area under cultivation, production volumes, and uses. This section tracks long-term agricultural investments.

    11. CROP HARVEST USE: Details how harvested crops are utilized within the household. Categories included consumption, sale, storage, and other uses. This section helps in understanding food security and market engagement.

    12. SEED AND SEEDLINGS ACQUISITION: Collected information on how the agricultural household acquires seeds and seedlings, including sources (e.g., purchased, saved, gifted) and types (e.g., local, improved). This section provided insights into input supply chains and the planting decisions made by the household or its head.

    13. INPUT USE AND ACQUISITION (FERTILIZERS AND PESTICIDES): It provided the details of the use and acquisition of agricultural inputs such as fertilizers and pesticides. It included information on quantities used, sources, and types of inputs. This section assessed the input dependency and agricultural practices.

    14. LIVESTOCK IN STOCK AND CHANGE IN STOCK: The questionnaire recorded the
