https://dataintelo.com/privacy-and-policy
The global data de-identification software market size was valued at approximately USD 500 million in 2023 and is projected to reach around USD 1.5 billion by 2032, growing at a CAGR of 13.5% during the forecast period. The growth in this market is driven by the increasing need for data privacy and compliance with stringent regulatory requirements across various industries.
The primary growth factor for the data de-identification software market is the rising awareness and concern regarding data privacy and security. With the advent of big data and the proliferation of digital services, organizations are increasingly recognizing the importance of protecting personal and sensitive information. Data breaches and cyber-attacks have led to significant financial and reputational damages, prompting businesses to invest in advanced data de-identification solutions to mitigate risks. Moreover, regulatory frameworks such as GDPR in Europe, CCPA in California, and HIPAA in the United States mandate strict compliance measures for data privacy, further propelling the demand for these software solutions.
Another significant driver is the growing adoption of cloud-based services and data analytics. As organizations migrate their data to cloud platforms, the need for robust data protection mechanisms becomes paramount. De-identification software enables companies to anonymize sensitive information before storing it in the cloud, ensuring compliance with data protection regulations and reducing the risk of exposure. Additionally, the rise of data analytics for business intelligence and decision-making necessitates the use of de-identified data to maintain privacy while extracting valuable insights.
The healthcare sector is particularly noteworthy for its substantial contribution to the market growth. The industry deals with large volumes of sensitive patient information that must be protected from unauthorized access. Data de-identification software plays a crucial role in enabling healthcare providers to share and analyze patient data for research and treatment purposes without compromising privacy. The COVID-19 pandemic has further accelerated the adoption of digital health solutions, increasing the demand for data de-identification tools to ensure compliance with privacy regulations and maintain patient trust.
Data Masking Technology is becoming increasingly vital as organizations strive to protect sensitive information while maintaining data utility. This technology allows businesses to create a realistic but fictional version of their data, ensuring that sensitive information is not exposed during processes such as software testing, development, and analytics. By substituting sensitive data with anonymized values, data masking technology helps organizations comply with data protection regulations without hindering their operational efficiency. As data privacy concerns continue to rise, the adoption of data masking technology is expected to grow, offering a robust solution for safeguarding sensitive information across various sectors.
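The substitution idea described above can be illustrated with a minimal sketch. It assumes a simple character-class masking rule (digits become random digits, letters become random letters, punctuation and length are kept); the function name `mask_value` and the `seed` parameter are hypothetical and do not come from any product mentioned here. Seeding from the input keeps the output repeatable for test fixtures, but it is not a security guarantee.

```python
import random
import string


def mask_value(value: str, seed: int = 0) -> str:
    """Return a fictional value with the same length and layout as `value`.

    Illustrative only: the generator is seeded from the input so the same
    value always masks to the same output, which is convenient for test
    data but is not a cryptographic protection.
    """
    rng = random.Random(f"{seed}:{value}")
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            # Keep separators such as "-", " " or "@" so the format survives.
            masked.append(ch)
    return "".join(masked)


if __name__ == "__main__":
    print(mask_value("4111-1111-1111-1111"))
    print(mask_value("Alice Smith"))
```

Because separators and character classes are preserved, a masked card number or name still passes simple format checks in a test environment.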
Regionally, North America holds a significant share of the data de-identification software market, driven by the presence of key market players, stringent regulatory requirements, and a high level of digitalization across industries. The Asia Pacific region is expected to witness the fastest growth during the forecast period, attributed to the rapid adoption of digital technologies, increasing awareness of data privacy, and evolving regulatory landscape in countries like China, Japan, and India. Europe also plays a vital role due to the stringent data protection regulations enforced by the GDPR, which mandates rigorous data de-identification practices.
By component, the data de-identification software market is segmented into software and services. The software segment is anticipated to dominate the market, driven by the increasing demand for advanced de-identification tools that can handle large volumes of data efficiently. Organizations are investing in sophisticated software solutions that offer automated and customizable de-identification processes to meet specific compliance requirements. These software solutions often come with features like encryption, tokenization, and data masking, enhancing their appeal to businesses across different sectors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
De-identification, anonymization, pseudonymization, re-identification
National Institute of Standards and Technology (NIST) documentation notes that the usage of these terms is still unclear. The words de-identification, anonymization and pseudonymization are sometimes used interchangeably and sometimes carry subtly different meanings. To reduce ambiguity, NIST uses the definitions from ISO/TS 25237:2008:
de-identification: “general term for any process of removing the association between a set of identifying data and the data subject.” [p. 3]
anonymization: “process that removes the association between the identifying dataset and the data subject.” [p. 2]
pseudonymization: “particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms.” [p. 5]
Brazilian Portuguese literature largely lacks this terminology, and the terms are more often used in law or information technology. Their use in health care and research has a specific conceptualization. HIPAA (Health Insurance Portability and Accountability Act), the US regulation on health data privacy protection, establishes standards for the handling of patient personal information (protected health information, PHI) by health care providers (covered entities).
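The difference between the quoted definitions of anonymization and pseudonymization can be made concrete with a small sketch. It assumes records are plain dictionaries and uses a random lookup-table scheme; the names `Pseudonymizer`, `anonymize` and `pseudonymize`, and the field names in the example, are hypothetical and are not taken from NIST, ISO/TS 25237 or HIPAA.

```python
import secrets


class Pseudonymizer:
    """Keeps the association between identifiers and pseudonyms.

    Whoever holds this table can re-identify a pseudonymized record,
    so in practice it would be stored separately from the data.
    """

    def __init__(self):
        self._forward = {}  # identifier -> pseudonym
        self._reverse = {}  # pseudonym -> identifier

    def pseudonym_for(self, identifier: str) -> str:
        if identifier not in self._forward:
            token = "P-" + secrets.token_hex(4)
            while token in self._reverse:  # avoid the (unlikely) collision
                token = "P-" + secrets.token_hex(4)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, pseudonym: str) -> str:
        return self._reverse[pseudonym]


def anonymize(record: dict, identifying_fields) -> dict:
    """Remove the identifying fields; no link to the subject is kept."""
    return {k: v for k, v in record.items() if k not in identifying_fields}


def pseudonymize(record: dict, id_field: str, identifying_fields, pseudonymizer) -> dict:
    """Remove identifying fields and add a stable pseudonym in their place."""
    kept = anonymize(record, set(identifying_fields) | {id_field})
    return {"pseudonym": pseudonymizer.pseudonym_for(record[id_field]), **kept}


if __name__ == "__main__":
    table = Pseudonymizer()
    record = {"name": "Ana", "mrn": "patient-001", "diagnosis": "I50"}
    print(anonymize(record, {"name", "mrn"}))
    print(pseudonymize(record, "mrn", {"name"}, table))
```

In this sketch, two records from the same subject receive the same pseudonym and can still be linked to each other, while the anonymized output carries no such link.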
https://www.marketresearchforecast.com/privacy-policy
The Data De-identification and Pseudonymization Software market is experiencing robust growth, projected to reach $1941.6 million in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 7.3%. This expansion is driven by increasing regulatory compliance needs (like GDPR and CCPA), heightened concerns regarding data privacy and security breaches, and the burgeoning adoption of cloud-based solutions. The market is segmented by deployment (cloud-based and on-premises) and application (large enterprises and SMEs). Cloud-based solutions are gaining significant traction due to their scalability, cost-effectiveness, and ease of implementation, while large enterprises dominate the application segment due to their greater need for robust data protection strategies and larger budgets. Key market players include established tech giants like IBM and Informatica, alongside specialized providers such as Very Good Security and Anonomatic, indicating a dynamic competitive landscape with both established and emerging players vying for market share. Geographic expansion is also a key driver, with North America currently holding a significant market share, followed by Europe and Asia Pacific. The forecast period (2025-2033) anticipates continued growth fueled by advancements in artificial intelligence and machine learning for enhanced de-identification techniques, and the increasing demand for data anonymization across various sectors like healthcare, finance, and government. The restraining factors, while present, are not expected to significantly hinder the market’s overall growth trajectory. These limitations might include the complexity of implementing robust de-identification solutions, the potential for re-identification risks despite advanced techniques, and the ongoing evolution of privacy regulations necessitating continuous adaptation of software capabilities. However, ongoing innovation and technological advancements are anticipated to mitigate these challenges. 
The continuous development of more sophisticated algorithms and solutions addresses re-identification vulnerabilities, while proactive industry collaboration and regulatory guidance aim to streamline implementation processes, ultimately fostering continued market expansion. The increasing adoption of data anonymization across diverse sectors, coupled with the expanding global digital landscape and related data protection needs, suggests a positive outlook for sustained market growth throughout the forecast period.
https://dataintelo.com/privacy-and-policy
As of 2023, the global Data De-Identification or Pseudonymity Software market is valued at approximately USD 1.5 billion and is projected to grow at a robust CAGR of 18% from 2024 to 2032, driven by increasing data privacy concerns and stringent regulatory requirements.
The growth of the Data De-Identification or Pseudonymity Software market is primarily fueled by the exponential increase in data generation across industries. With the advent of IoT, AI, and digital transformation strategies, the volume of data generated has seen an unprecedented spike. Organizations are now more aware of the need to protect sensitive information to comply with global data privacy regulations such as GDPR in Europe and CCPA in California. The need to ensure that personal data is anonymized or de-identified before analysis or sharing has escalated, pushing the demand for these software solutions.
Another significant growth factor is the rising number of cyber-attacks and data breaches. As data becomes more valuable, it also becomes a prime target for cybercriminals. In response, companies are investing heavily in data privacy and security measures, including de-identification and pseudonymity solutions, to mitigate risks associated with data breaches. This trend is more prevalent in sectors dealing with highly sensitive information like healthcare, finance, and government. Ensuring that data remains secure and private while being useful for analytics is a key driver for the adoption of these technologies.
Moreover, the evolution of Big Data analytics and cloud computing is also spurring growth in this market. As organizations move their operations to the cloud and leverage big data for decision-making, the importance of maintaining data privacy while utilizing large datasets for analytics cannot be overstated. Cloud-based de-identification solutions offer scalability, flexibility, and cost-effectiveness, making them increasingly popular among enterprises of all sizes. This shift towards cloud deployments is expected to further boost market growth.
Regionally, North America holds the largest market share due to its advanced technological infrastructure and stringent data protection laws. The presence of major technology companies and a high rate of adoption of advanced solutions in the U.S. and Canada contribute significantly to regional market growth. Europe follows closely, driven by rigorous GDPR compliance requirements. The Asia Pacific region is anticipated to witness the fastest growth, attributed to the increasing digitization and growing awareness about data privacy in countries like India and China.
As organizations increasingly seek to protect their sensitive data, the concept of Data Protection on Demand is gaining traction. This model allows businesses to access data protection services as and when needed, providing flexibility and scalability. By leveraging cloud-based platforms, companies can implement robust data protection measures without the need for significant upfront investments in infrastructure. This approach not only ensures compliance with data privacy regulations but also offers a cost-effective solution for managing data security. As the demand for on-demand services continues to rise, Data Protection on Demand is poised to become a critical component of data management strategies across various industries.
The Data De-Identification or Pseudonymity Software market by component is segmented into software and services. The software segment dominates the market, driven by the increasing need for automated solutions that ensure data privacy. These software solutions come with a variety of tools and features designed to anonymize or pseudonymize data efficiently, making them essential for organizations managing large volumes of sensitive information. The software market is expanding rapidly, with new innovations and improvements constantly being introduced to enhance functionality and user experience.
The services segment, though smaller compared to software, plays a crucial role in the market. Services include consulting, implementation, and maintenance, which are essential for the successful deployment and operation of de-identification software. These services help organizations tailor the software to their specific needs, ensuring compliance with regional and industry-specific data protection regulations.
https://www.archivemarketresearch.com/privacy-policy
The global market for data de-identification and pseudonymity software is experiencing robust growth, driven by increasing regulatory compliance needs (like GDPR and CCPA), rising concerns about data privacy breaches, and the expanding adoption of cloud-based solutions. The market size in 2025 is estimated at $549.9 million. While the specific CAGR is not provided, considering the strong market drivers and the projected growth in related technologies like data anonymization and privacy-enhancing technologies, a conservative estimate of the CAGR for the forecast period (2025-2033) would be around 15%. This would place the market value at approximately $1.8 billion by 2033. The cloud-based segment is anticipated to dominate the market due to its scalability, cost-effectiveness, and ease of deployment. Enterprise applications currently hold a larger market share compared to individual applications, but the individual segment is projected to experience faster growth as individuals become more aware of data privacy and seek personalized solutions. North America and Europe are currently the leading regions, however, significant growth opportunities exist in Asia-Pacific and other emerging markets as data privacy regulations expand globally and digital transformation accelerates. The market faces some restraints, such as the high cost of implementation for some solutions and the complexity of integrating these technologies into existing IT infrastructure. However, these challenges are expected to lessen with technological advancements and increasing vendor competition. The competitive landscape is characterized by a mix of established players and emerging startups. Key vendors include TokenEx, Privacy Analytics, and others, offering a diverse range of solutions catering to various customer needs and industry verticals. 
Continued innovation in areas like AI-powered data masking and federated learning is expected to further shape the market, enhancing the effectiveness and efficiency of data de-identification and pseudonymity processes. The ongoing focus on robust security measures alongside anonymization capabilities will be crucial for the future growth and adoption of this vital technology.
https://www.datainsightsmarket.com/privacy-policy
The Data De-identification and Pseudonymization Software market is experiencing robust growth, driven by increasing concerns over data privacy regulations like GDPR and CCPA, and a rising need to protect sensitive customer information. The market, estimated at $2 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated market value of $6 billion by 2033. This growth is fueled by the expanding adoption of cloud-based solutions offering scalability and cost-effectiveness, coupled with the growing prevalence of data breaches and the associated financial and reputational risks. Large enterprises are currently the dominant segment, but the increasing digitalization of SMEs is expected to drive significant growth in this segment over the forecast period. Technological advancements in anonymization techniques, particularly those using AI and machine learning, are further enhancing the market’s potential. However, the market faces challenges. High implementation costs and the complexity associated with integrating these solutions into existing IT infrastructure can act as restraints for smaller organizations. Ensuring the complete and irreversible anonymization of data remains a crucial technical hurdle, along with the ongoing evolution of privacy regulations and the need for constant adaptation of software solutions to comply. Despite these challenges, the market’s trajectory remains positive, driven by strong regulatory pressure and the imperative for businesses to protect their data assets and maintain customer trust. The diverse range of solutions offered by players like IBM, Thales Group, and smaller specialized firms indicates a maturing and competitive market landscape. The increasing demand for data-driven insights while maintaining privacy is expected to continuously drive innovation and growth within this crucial sector.
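The relationship between a base-year value, a CAGR and an end-year value quoted in summaries like this one can be checked with a short compound-growth calculation. The sketch below assumes annual compounding over the number of years between the base and end year (eight for 2025 to 2033); the function names are hypothetical.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound `value` forward by `cagr` (e.g. 0.15 for 15%) for `years` years."""
    return value * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Growth rate that takes `start` to `end` in `years` annual steps."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # USD 2 billion in 2025 growing at 15% per year until 2033 (8 steps).
    print(round(project(2.0, 0.15, 8), 2))       # about 6.12 (USD billion)
    # Rate implied by going from 2 to 6 billion over the same period.
    print(round(implied_cagr(2.0, 6.0, 8), 4))   # about 0.1472
```

Under these assumptions the stated figures are mutually consistent: 15% annual growth from $2 billion gives roughly $6.1 billion after eight years.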
https://www.archivemarketresearch.com/privacy-policy
The global market for data de-identification and pseudonymity software is experiencing robust growth, projected to reach $414.7 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 4.1% from 2025 to 2033. This expansion is fueled by increasing regulatory pressures like GDPR and CCPA, demanding stringent data privacy measures across various sectors. The rising adoption of cloud-based solutions and the growing need for secure data sharing among enterprises are significant drivers. Furthermore, advancements in machine learning and artificial intelligence are enhancing the accuracy and efficiency of data de-identification techniques, further fueling market growth. The market is segmented by deployment type (cloud-based and on-premises) and application (individual, enterprise, and others). The cloud-based segment is expected to dominate due to its scalability, cost-effectiveness, and ease of implementation. Enterprise applications currently hold the largest market share, driven by the need for robust data protection in large organizations handling sensitive customer information. Key players like TokenEx, Privacy Analytics, and Thales Group are actively shaping the market through continuous innovation and strategic partnerships. Geographic expansion is also a key trend, with North America and Europe currently leading the market, followed by the Asia-Pacific region witnessing significant growth potential. The continued growth trajectory is anticipated to be influenced by several factors. The increasing volume of data generated across industries will necessitate more sophisticated de-identification solutions. Moreover, the evolving threat landscape and the growing awareness of data breaches will propel demand for robust and reliable data privacy technologies. 
While factors such as initial investment costs and the complexity of implementing these solutions may pose some challenges, the long-term benefits of improved data security and regulatory compliance far outweigh these limitations. The market is expected to witness further consolidation with mergers and acquisitions, and the emergence of innovative solutions leveraging advanced technologies. This will ultimately lead to a more mature and comprehensive market for data de-identification and pseudonymization software.
https://www.marketresearchintellect.com/privacy-policy
The size and share of this market is categorized based on Deployment Type (On-Premises, Cloud-Based) and Application (Healthcare, Finance, Retail, Telecommunications, Government) and End-User (Small and Medium Enterprises (SMEs), Large Enterprises) and Technology (Tokenization, Data Masking, Encryption, Anonymization, Pseudonymization) and geographical regions (North America, Europe, Asia-Pacific, South America, Middle-East and Africa).
https://www.marketresearchintellect.com/privacy-policy
The size and share of this market is categorized based on Deployment Type (On-Premises, Cloud-Based) and Application (Healthcare, BFSI, Retail, Government, Telecommunications) and Organization Size (Small and Medium Enterprises (SMEs), Large Enterprises) and Functionality (Data Masking, Data Tokenization, Data Encryption, Data Anonymization, Data Pseudonymization) and geographical regions (North America, Europe, Asia-Pacific, South America, Middle-East and Africa).
https://www.datainsightsmarket.com/privacy-policy
The Data Obfuscation Software market is experiencing robust growth, driven by increasing concerns around data privacy regulations (like GDPR and CCPA) and the rising need to protect sensitive data during development, testing, and collaboration. The market, currently estimated at $2 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated market value of approximately $6 billion by 2033. This expansion is fueled by the adoption of cloud-based solutions offering scalability and ease of deployment, along with a growing preference for large enterprises and SMEs to leverage data masking techniques for compliance and security purposes. Key trends include the increasing integration of AI and machine learning for more sophisticated data obfuscation techniques, and the expansion into new sectors such as healthcare and finance, where sensitive data is paramount. However, factors like the complexity of implementing these solutions and the potential for reduced data usability due to excessive obfuscation act as restraints to market growth. The market is segmented by application (Large Enterprises, SMEs) and type (On-premises, Cloud-based), with the cloud-based segment expected to dominate due to its flexibility and cost-effectiveness. North America currently holds the largest market share, followed by Europe, driven by stringent data protection laws and a high concentration of technology companies. Asia Pacific is anticipated to exhibit significant growth in the forecast period due to increasing digitalization and rising data security concerns in emerging economies. The competitive landscape is characterized by a mix of established players like Oracle, IBM, and Informatica, and smaller, specialized vendors. These companies are constantly innovating to offer advanced features and enhance their solutions' ease of use. 
The market's future hinges on the continued evolution of data privacy regulations, advancements in data anonymization techniques, and the growing adoption of data sharing practices across different organizations. The ability of vendors to offer flexible, scalable, and user-friendly solutions will be key to their success in this rapidly expanding market.
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2
In the publication [1] we implemented anonymization and synthetization techniques for a structured data set, which was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4] individually and in combination. We evaluated the utility and shortcomings of the different approaches by statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations to scores from the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches and a combination thereof introduce only minimal deviations from the original data set over all features. While data synthesis techniques produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impacting data utility. We hereby share all generated data sets with the scientific community through a use and access agreement. [1] Johann TI, Otte K, Prasser F, Dieterich C: Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024;. doi://10.1093/ehjdh/ztae083 [2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749. [3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304. [4] Johann TI, Wilhelmi H. 
ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH. [5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466. [6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.
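One common way to quantify re-identification risk of the kind mentioned in this description is to group records by their quasi-identifiers and look at the group (equivalence class) sizes. The sketch below is a generic illustration under that assumption; it is not the analysis performed in the cited publication or the ARX implementation, and the field names and toy records are hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risk_summary(records, quasi_identifiers):
    """Summarize risk for an attacker who knows a target is in the data set.

    A record in a class of size s is assumed to be re-identified with
    probability 1/s, so the smallest class gives the highest risk and
    (number of classes) / (number of records) gives the average risk.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    k = min(sizes.values())
    return {
        "k": k,  # the data set is k-anonymous for this k
        "highest_risk": 1 / k,
        "average_risk": len(sizes) / len(records),
    }


if __name__ == "__main__":
    toy = [
        {"age": "60-69", "sex": "F"},
        {"age": "60-69", "sex": "F"},
        {"age": "70-79", "sex": "M"},
        {"age": "70-79", "sex": "M"},
        {"age": "70-79", "sex": "M"},
        {"age": "50-59", "sex": "F"},
    ]
    print(risk_summary(toy, ["age", "sex"]))
```

In the toy data the single record in the 50-59 group is unique, so the highest risk is 1.0 even though the average risk is 0.5; generalizing or suppressing that record would raise k.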
https://www.marketreportanalytics.com/privacy-policy
The data masking market, valued at $0.94 billion in 2025, is experiencing robust growth, projected to expand at a compound annual growth rate (CAGR) of 14.71% from 2025 to 2033. This expansion is fueled by increasing concerns around data privacy regulations like GDPR and CCPA, coupled with the rising adoption of cloud computing and the expanding digital footprint of businesses across various sectors. The demand for robust data security solutions is driving significant investments in data masking technologies, enabling organizations to protect sensitive information during testing, development, and other non-production environments. Key drivers include the need to comply with stringent data privacy regulations, the increasing volume of sensitive data being generated and stored, and the growing adoption of data analytics and machine learning initiatives requiring access to masked data for training and testing purposes. The market is segmented by type (static and dynamic), deployment (cloud and on-premise), and end-user industry (BFSI, healthcare, IT and telecom, retail, government and defense, manufacturing, media and entertainment, and others). The cloud deployment segment is expected to witness significant growth due to its scalability, cost-effectiveness, and ease of access. Among end-user industries, BFSI and healthcare are projected to be major contributors to market growth due to the sensitive nature of the data they handle. The competitive landscape is dynamic, with key players including IBM, Oracle, Informatica, and others constantly innovating and expanding their offerings. Future growth will likely be influenced by advancements in artificial intelligence (AI) and machine learning (ML) for automated masking, as well as the increasing adoption of data masking solutions in emerging economies. The continued evolution of data privacy regulations worldwide will further propel market expansion in the coming years. 
Recent developments include: August 2022 - IBM released a new update, IBM Cloud Pak for Data V4.5.x, in which advanced data masking extends the capability of data protection and location rules by protecting data with advanced de-identification techniques. The techniques preserve the data's format and integrity. Because of the high data utility, data users such as data scientists, business analysts, and application developers may generate high-quality insights from protected data. April 2022 - Mage signed a technology partnership agreement with Imperva to provide a data masking alternative to the built-in capabilities of Imperva's Data Security Fabric (DSF) for de-identifying sensitive data. Key drivers for this market are: Increase of Organizational Data Volumes. Potential restraints include: Increase of Organizational Data Volumes. Notable trends are: The BFSI Industry to Witness a Significant Growth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains anonymized DICOM images acquired as part of a cardiac T1 mapping study using a 5T MRI system. All personal identifiers have been removed in compliance with DICOM de-identification standards and institutional ethics approval. The dataset includes pre- and post-contrast MOLLI sequences from healthy volunteers and patients. It is made publicly available for academic and non-commercial research purposes.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for NER PII Extraction Dataset Dataset Summary This dataset is designed for training and evaluating Named Entity Recognition (NER) models focused on extracting Personally Identifiable Information (PII) from text. It includes a variety of entities such as names, addresses, phone numbers, email addresses, and identification numbers. The dataset is suitable for tasks that involve PII detection, compliance checks, and data anonymization. Supported Tasks and Leaderboards Named Entity… See the full description on the dataset page: https://huggingface.co/datasets/Josephgflowers/PII-NER.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Phantom of Bern: repeated scans of two volunteers with eight different combinations of MR sequence parameters
The Phantom of Bern consists of eight same-session re-scans of T1-weighted MRI with different combinations of sequence parameters, acquired on two healthy subjects. The subjects have agreed in writing to the publication of these data, including the original anonymized DICOM files, and to waiving the requirement of defacing. Usage is permitted under the terms of the data usage agreement stated below.
The BIDS directory is organized as follows:
└── PhantomOfBern/
    ├─ code/
    │
    ├─ derivatives/
    │  ├─ dldirect_v1-0-0/
    │  │  ├─ results/ # Folder with flattened subject/session inputs and outputs of DL+DiReCT
    │  │  └─ stats2table/ # Folder with tables summarizing all DL+DiReCT outputs
    │  ├─ freesurfer_v6-0-0/
    │  │  ├─ results/ # Folder with flattened subject/session inputs and outputs of freesurfer
    │  │  └─ stats2table/ # Folder with tables summarizing all freesurfer outputs
    │  └─ siena_v2-6/
    │     ├─ SIENA_results.csv # Siena's main output
    │     └─ ... # Flattened subject/session inputs and outputs of SIENA
    │
    ├─ sourcedata/
    │  ├─ POBHC0001/
    │  │  └─ 17473A/
    │  │     └─ ... # Anonymized DICOM folders
    │  └─ POBHC0002/
    │     └─ 14610A/
    │        └─ ... # Anonymized DICOM folders
    │
    ├─ sub-<label>/
    │  └─ ses-<label>/
    │     └─ anat/ # Folder with scan's json and nifti files
    ├─ ...
The dataset can be cited as:
M. Rebsamen, D. Romascano, M. Capiglioni, R. Wiest, P. Radojewski, C. Rummel. The Phantom of Bern:
repeated scans of two volunteers with eight different combinations of MR sequence parameters.
OpenNeuro, 2023.
If you use these data, please also cite the original paper:
M. Rebsamen, M. Capiglioni, R. Hoepner, A. Salmen, R. Wiest, P. Radojewski, C. Rummel. Growing importance
of brain morphometry analysis in the clinical routine: The hidden impact of MR sequence parameters.
Journal of Neuroradiology, 2023.
The Phantom of Bern is distributed under the following terms, to which you agree by downloading and/or using the dataset:
To use these datasets solely for research and development or statistical purposes and not for investigation of specific subjects
To make no use of the identity of any subject discovered inadvertently, and to advise the providers of any such discovery (crummel@web.de)
When publicly presenting any results or algorithms that benefited from the use of the Phantom of Bern, you should acknowledge it, see above. Papers, book chapters, books, posters, oral presentations, and all other printed and digital presentations of results derived from the Phantom of Bern data should cite the publications listed above.
Redistribution of data (complete or in parts) in any manner without explicit inclusion of this data use agreement is prohibited.
Usage of the data for testing commercial tools is explicitly allowed. Usage for military purposes is prohibited.
The original collector and provider of the data (see acknowledgement) and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
This work was supported by the Swiss National Science Foundation under grant numbers 204593 (ScanOMetrics) and CRSII5_180365 (The Swiss-First Study).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
About
The following datasets were captured at a busy Belgian train station between 9pm and 10pm and contain all 802.11 management frames that were captured. The two datasets were captured approximately 20 minutes apart.
Both datasets are represented by a pcap and CSV file. The CSV file contains the frame type, timestamps, signal strength, SSID and MAC addresses for every frame. In the pcap file, all generic 802.11 elements were removed for anonymization purposes.
Anonymization
All frames were anonymized by removing identifying information or renaming identifiers. Concretely, the following transformations were applied to both datasets:
In the pcap file, anonymization actions could lead to "corrupted" frames because length tags do not correspond with the actual data. However, the file and its frames are still readable in packet analyzing tools such as Wireshark or Scapy.
The script which was used to anonymize is available in the dataset.
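For readers who want a feel for what "renaming identifiers" can look like, here is a minimal Python sketch that replaces every distinct MAC address with a stable placeholder. It is not the `anonymization.py` script shipped with the dataset, and the column names `src` and `dst` are hypothetical; the placeholder format (a locally administered `02:00:00:…` address built from a counter) is likewise our own choice.

```python
def pseudonymize_macs(rows, mac_fields):
    """Replace each distinct MAC address with a stable placeholder.

    rows       -- list of dicts, e.g. as produced by csv.DictReader
    mac_fields -- names of the columns that hold MAC addresses (hypothetical)

    Returns (new_rows, mapping). The same input address always maps to the
    same placeholder, so relationships between frames are preserved.
    """
    mapping = {}
    new_rows = []
    for row in rows:
        new_row = dict(row)
        for field in mac_fields:
            mac = row.get(field)
            if not mac:
                continue
            key = mac.lower()  # treat AA:BB... and aa:bb... as the same device
            if key not in mapping:
                n = len(mapping)
                # 0x02 in the first octet marks a locally administered address.
                mapping[key] = "02:00:00:%02x:%02x:%02x" % (
                    (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF)
            new_row[field] = mapping[key]
        new_rows.append(new_row)
    return new_rows, mapping


if __name__ == "__main__":
    frames = [
        {"src": "AA:BB:CC:00:11:22", "dst": "ff:ff:ff:ff:ff:ff"},
        {"src": "aa:bb:cc:00:11:22", "dst": "AA:BB:CC:00:11:23"},
    ]
    anonymized, _ = pseudonymize_macs(frames, ["src", "dst"])
    for frame in anonymized:
        print(frame)
```

In this sketch the mapping is kept in memory and returned to the caller; discarding it after the run is what makes the renaming irreversible.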
Data
| Metric | Dataset 1 | Dataset 2 |
|---|---|---|
| Frames | 36306 | 60984 |
| Beacon frames | 19693 | 27983 |
| Request frames | 798 | 1580 |
| Response frames | 15815 | 31421 |
| Identified Wi-Fi networks | 54 | 70 |
| Identified MAC addresses | 2092 | 2705 |
| Identified wireless devices | 128 | 186 |
| Capture time | 480 s | 422 s |
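The figures in the table can be cross-checked with a few lines of Python. The sketch below copies the counts from the table and derives two things the table does not state directly: whether the three frame subtypes add up to the frame total, and the average number of frames captured per second. The dictionary keys are our own labels.

```python
# Counts copied from the table above; the key names are illustrative.
captures = {
    "Dataset 1": {"frames": 36306, "beacon": 19693, "request": 798,
                  "response": 15815, "seconds": 480},
    "Dataset 2": {"frames": 60984, "beacon": 27983, "request": 1580,
                  "response": 31421, "seconds": 422},
}


def subtype_total(capture):
    """Sum of beacon, request and response frames."""
    return capture["beacon"] + capture["request"] + capture["response"]


def frame_rate(capture):
    """Average number of management frames captured per second."""
    return capture["frames"] / capture["seconds"]


if __name__ == "__main__":
    for name, capture in captures.items():
        consistent = subtype_total(capture) == capture["frames"]
        print(f"{name}: subtypes sum to total: {consistent}, "
              f"{frame_rate(capture):.1f} frames/s")
```

For both datasets the subtype counts sum exactly to the reported frame total, and the second capture is roughly twice as dense as the first (about 144.5 versus 75.6 frames per second).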
Dataset contents
The two datasets are stored in the directories `1/` and `2/`. Each directory contains:
`anonymization.py` is the script which was used to remove identifiers.
`README.md` contains the documentation about the datasets
License
Copyright 2022-2023 Benjamin Vermunicht, Beat Signer, Maxim Van de Wynckel, Vrije Universiteit Brussel
Permission is hereby granted, free of charge, to any person obtaining a copy of this dataset and associated documentation files (the “Dataset”), to deal in the Dataset without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Dataset, and to permit persons to whom the Dataset is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions that make use of the Dataset.
THE DATASET IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.
The primary data consist of allele or haplotype frequencies for N=1036 anonymized U.S. population samples. Additional files are supplements to the associated publications. Any changes to spreadsheets are listed in the "Change Log" tab within each spreadsheet. DOI numbers for associated publications are listed below, under "References".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This updated labeled dataset builds upon the initial systematic review by van de Schoot et al. (2018; DOI: 10.1080/00273171.2017.1412293), which included studies on post-traumatic stress symptom (PTSS) trajectories up to 2016, sourced from the Open Science Framework (OSF). As part of the FORAS project - Framework for PTSS trajectORies: Analysis and Synthesis (funded by the Dutch Research Council, grant no. 406.22.GO.048 and pre-registered at PROSPERO under ID CRD42023494027), we extended this dataset to include publications between 2016 and 2023. In total, the search identified 10,594 de-duplicated records obtained via different search methods, each published with its own search query and result:
• Exact replication of the initial search: OSF.IO/QABW3
• Comprehensive database search: OSF.IO/D3UV5
• Snowballing: OSF.IO/M32TS
• Full-text search via Dimensions data: OSF.IO/7EXC5
• Semantic search via OpenAlex: OSF.IO/M32TS
Humans (BC, RN) and AI (Bron et al., 2024) have screened the records, and disagreements have been resolved (MvZ, BG, RvdS). Each record was screened separately for Title, Abstract, and Full-text inclusion and per inclusion criteria. A detailed screening logbook is available at OSF.IO/B9GD3, and the entire process is described in https://doi.org/10.31234/osf.io/p4xm5. A description of all columns/variables and full methodological details is available in the accompanying codebook.
Important Notes:
• Duplicates: To maintain consistency and transparency, duplicates are left in the dataset and are labeled with the same classification as the original records. A filter is provided to allow users to exclude these duplicates as needed.
• Anonymized Data: The dataset "...._anonymous" excludes DOIs, OpenAlex IDs, titles, and abstracts to ensure data anonymization during the review process. The complete dataset, including all identifiers, is uploaded under embargo and will be publicly available on 01-10-2025.
This dataset is a valuable resource not only for researchers interested in systematic reviews of PTSS trajectories, for whom it facilitates reproducibility and transparency in the research process, but also for data scientists who would like to mimic the screening process using different machine learning and AI models.
Abstract: This dataset was created as part of the TAPIR project to identify internal ORCID coverage at Osnabrück University and to investigate external ORCID coverage/intersection in selected external open data sources (FREYA, ORCID, OpenAlex). Only researchers employed at Osnabrück University in June 2021 and with the status of full professor or project manager in third-party funded projects are considered (264 persons in total). The dataset contains additional information about researchers’ field and research topic, retrieved from the local Current Research Information System (CRIS). Personal data was anonymized for further processing via an internally resolvable identifier (primary key). The present dataset is the result of mapping and anonymized aggregation of the internally compiled list and externally queried lists from FREYA, ORCID, and OpenAlex generated via the query tool "pidgraph-notebooks" (https://doi.org/10.5281/zenodo.6373245). The context of the data collection and analysis is described in more detail in the related publication.
Digital clinical decision support algorithms (CDSAs) that guide healthcare workers during consultations can enhance adherence to guidelines and the resulting quality of care. However, this improvement depends on the accuracy of inputs (symptoms and signs) entered by healthcare workers into the digital tool, which relies mainly on their clinical skills, which are often limited, especially in resource-constrained primary care settings. This study aimed to identify and characterize potential clinical skill gaps based on CDSA data patterns and clinical observations. We retrospectively analyzed data from 20,085 pediatric consultations conducted using an IMCI-based CDSA in 16 primary health centers in Rwanda. We focused on clinical signs with numerical values: temperature, mid-upper arm circumference (MUAC), weight, height, z-scores (MUAC for age, weight for age, and weight for height), heart rate, respiratory rate and blood oxygen saturation. Statistical summary measures (frequency of skipped measurements, frequent plausible and implausible values) and their variation in individual health centers compared to the overall average were used to identify 10 health centers with irregular data patterns signaling potential clinical skill gaps. We subsequently observed 188 consultations in these health centers and interviewed healthcare workers to understand potential error causes. Observations indicated that basic measurements were not assessed correctly in most children: weight (70%), MUAC (69%), temperature (67%), height (54%). These measures were predominantly conducted by minimally trained non-clinical staff in the registration area. More complex measures, done mostly by healthcare workers in the consultation room, were often skipped: respiratory rate (43%), heart rate (37%), blood oxygen saturation (33%). This was linked to underestimating the importance of these signs in child management, especially in the context of high patient loads typical at primary care level.
Addressing clinical skill gaps through in-person training, eLearning and regular personalized mentoring tailored to specific health center needs is imperative to improve quality of care and enhance the benefits of CDSAs.
16 primary healthcare centers (HCs) of Rusizi and Nyamasheke districts in Rwanda.
The first dataset was collected directly by the ePOCT+ CDSA during 20,085 pediatric consultations across 16 primary health centers in Rwanda. It includes anonymized patient, health facility and consultation data with key clinical measurements (temperature, mid-upper arm circumference (MUAC), weight, height, MUAC for age z-score, weight for age z-score, weight for height z-score, heart rate, respiratory rate and blood oxygen saturation (SpO2)). The second dataset results from structured observations of 188 routine pediatric consultations at a subset of 10 health facilities. Clinicians used a standardized evaluation form to record clinical measurements, mirroring variables in the first dataset. This dataset is used to deepen the analysis of the first dataset by explaining the reasons behind the patterns that emerged from its quantitative analysis.
Children aged 1 day to 14 years with an acute condition, in the 16 HCs where the intervention was deployed.
Clinical data [cli]
First dataset: ePOCT+ stores all the information (date of consultation, anthropometric measures, vitals, presence/absence of specific symptoms and signs prompted by the algorithm, diagnoses, medicines, managements, etc.) entered by the HW in the tablet during consultations. We retrospectively analyzed data from 20,085 outpatient consultations conducted between November 2021 and October 2022 with children aged 1 day to 14 years with an acute condition, in the 16 HCs where the intervention was deployed. Data cleaning, management, and analyses were conducted using R software (version 4.2.1). Second dataset: Based on the results of the retrospective analysis, we observed 188 routine consultations in a subset of 10 of 16 HCs (approximately 19 observations per HC), from 20 December 2022 to 09 March 2023. The selection of HCs was guided by the retrospective analysis, ensuring that the 10 HCs chosen were those showing the most critical results. The observing study clinician obtained oral consent from the HWs and was instructed not to interfere with the consultation, to avoid introducing any bias in addition to the observer effect. To ensure a standardized and consistent evaluation, a digital evaluation form (Google Sheets) was used. These observations were conducted over 3 days per HC, with efforts made to separate them by a few days in order to increase the chance of observing several different HWs and to minimize potential bias. At the end of each day of observation in a HC (and not after each consultation, to avoid any influence on subsequent consultations), the observing study clinician conducted an interview with the HW to understand why the assessment of some signs was skipped. Data were exported to Microsoft Excel (Version 16.77.1) for further simple descriptive analysis.
Second dataset: Most of the time, there was only one HW attending to children in the HC on a given day. On the rare occasions when two HWs were present, each was observed by one of the two study clinicians.
Other [oth]
The second dataset for this study was derived from structured observations of 188 routine pediatric consultations conducted across a subset of 10 health facilities. Clinicians utilized a standardized evaluation form that included variables aligning with those in the first dataset. This secondary dataset was designed to provide deeper insights into patterns observed in the primary dataset through the quantitative analysis.
The data collection focused on various clinical measurements and observations, categorized as follows:
General Information:
• Date of the consultation.
• Health facility (coded for anonymity).
• Clinical measurements taken at the reception and during the consultation.
• Presence of a conducting line.
• Additional remarks related to the consultation.
Clinical Measurements: For each of the following, the dataset records whether the measurement was assessed or skipped, the quality of assessment (sufficient/insufficient), reasons for skipping or insufficient assessments, and any extra remarks:
• Temperature (T°).
• MUAC (Mid-Upper Arm Circumference).
• Weight.
• Height.
• Respiratory Rate (RR).
• Blood Oxygen Saturation (Sat).
• Heart Rate (HR).
Additional Observations: Remarks on other signs and symptoms assessed during the consultation. The structured nature of this dataset ensures consistency in evaluating the reasons behind clinical decisions and the quality of care provided in routine pediatric consultations.
Data editing was conducted as follows:
**First dataset:**
• Data Extraction: The dataset was extracted from the larger ePOCT+ storage system, which records all consultation-related information entered by healthcare workers (HWs) in tablets during consultations. This includes details such as the date of consultation, anthropometric measures, vital signs, the presence or absence of specific symptoms and signs prompted by the algorithm, diagnoses, medicines, and managements.
• Data Cleaning:
The extracted data were systematically cleaned to focus solely on the variables of interest for this analysis. Irrelevant variables and incomplete records were excluded to ensure a streamlined and accurate dataset.
• Anonymization:
To protect patient and health facility confidentiality, the data were anonymized prior to analysis. All personal identifiers were removed, and only aggregated or coded information was retained.
• Analysis Preparation:
After cleaning and anonymization, the dataset was reviewed for consistency and coherence. Specific patterns of data were analyzed for the selected variables of interest, ensuring alignment with the study objectives.
• Software Used: Data cleaning, management, and analyses were conducted using R software (version 4.2.1). All processes, including extraction, cleaning, and anonymization, were documented to maintain transparency and reproducibility.
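The study compared per-center summary measures, such as the frequency of skipped measurements, with the overall average. The authors worked in R and their code is not reproduced here; the Python sketch below only illustrates the idea under our own assumptions. The field names `center` and `respiratory_rate`, the use of `None` for a skipped measurement, and the 10-percentage-point margin are all hypothetical.

```python
from collections import defaultdict


def skip_rates(records, measure):
    """Return (overall_rate, {center: rate}) of skipped values for `measure`.

    A measurement counts as skipped when its value is missing or None.
    Assumes `records` is non-empty.
    """
    skipped = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["center"]] += 1
        if record.get(measure) is None:
            skipped[record["center"]] += 1
    overall = sum(skipped.values()) / sum(total.values())
    per_center = {center: skipped[center] / total[center] for center in total}
    return overall, per_center


def flag_centers(records, measure, margin=0.10):
    """List centers whose skip rate exceeds the overall rate by `margin`."""
    overall, per_center = skip_rates(records, measure)
    return sorted(c for c, rate in per_center.items() if rate - overall > margin)


if __name__ == "__main__":
    # Toy records with coded center names; not taken from the dataset.
    records = (
        [{"center": "HC_A", "respiratory_rate": 32}] * 3
        + [{"center": "HC_A", "respiratory_rate": None}]
        + [{"center": "HC_B", "respiratory_rate": None}] * 3
        + [{"center": "HC_B", "respiratory_rate": 40}]
    )
    print(flag_centers(records, "respiratory_rate"))
```

With the toy records, HC_A skips one measurement in four and HC_B three in four, against an overall rate of one in two, so only HC_B is flagged.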
**Second dataset:**
• Data Collection: Data were collected directly from respondents through a Google Forms questionnaire. The structured format ensured standardized responses across all participants, facilitating subsequent data processing and analysis.
• Data Export:
Upon completion of data collection, the dataset was exported from Google Forms to Microsoft Excel (Version 16.77.1). This provided a structured and organized format for further data handling.
• Anonymization:
All personally identifiable information was removed during the data processing phase to protect participant confidentiality. Anonymization measures included replacing personal identifiers with unique codes and omitting any information that could reveal the identity of respondents.
• Data Cleaning and Descriptive Analysis:
The dataset was reviewed in Microsoft Excel to ensure consistency and completeness. Responses were screened for missing or inconsistent data, and necessary corrections were made where appropriate. Simple descriptive analyses were conducted within Excel to summarize key variables and identify initial patterns in the data.
https://dataintelo.com/privacy-and-policy
The global data de-identification software market size was valued at approximately USD 500 million in 2023 and is projected to reach around USD 1.5 billion by 2032, growing at a CAGR of 13.5% during the forecast period. The growth in this market is driven by the increasing need for data privacy and compliance with stringent regulatory requirements across various industries.
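The relationship between the three figures quoted above can be checked with the standard compound annual growth rate formula. The Python sketch below assumes nine compounding years (2023 to 2032) and uses the rounded endpoints from the text, so it is an approximation and not the report's own calculation.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Figures in USD million; 9 compounding years is our assumption.
    print(f"Implied CAGR: {implied_cagr(500, 1500, 9):.1%}")
    print(f"2032 value at 13.5%: {project(500, 0.135, 9):.0f}")
```

With these rounded inputs the implied rate is about 13.0%, and compounding USD 500 million at 13.5% for nine years gives roughly USD 1.56 billion, so the stated figures agree to within the rounding signalled by "approximately" and "around".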
The primary growth factor for the data de-identification software market is the rising awareness and concern regarding data privacy and security. With the advent of big data and the proliferation of digital services, organizations are increasingly recognizing the importance of protecting personal and sensitive information. Data breaches and cyber-attacks have led to significant financial and reputational damages, prompting businesses to invest in advanced data de-identification solutions to mitigate risks. Moreover, regulatory frameworks such as GDPR in Europe, CCPA in California, and HIPAA in the United States mandate strict compliance measures for data privacy, further propelling the demand for these software solutions.
Another significant driver is the growing adoption of cloud-based services and data analytics. As organizations migrate their data to cloud platforms, the need for robust data protection mechanisms becomes paramount. De-identification software enables companies to anonymize sensitive information before storing it in the cloud, ensuring compliance with data protection regulations and reducing the risk of exposure. Additionally, the rise of data analytics for business intelligence and decision-making necessitates the use of de-identified data to maintain privacy while extracting valuable insights.
The healthcare sector is particularly noteworthy for its substantial contribution to the market growth. The industry deals with large volumes of sensitive patient information that must be protected from unauthorized access. Data de-identification software plays a crucial role in enabling healthcare providers to share and analyze patient data for research and treatment purposes without compromising privacy. The COVID-19 pandemic has further accelerated the adoption of digital health solutions, increasing the demand for data de-identification tools to ensure compliance with privacy regulations and maintain patient trust.
Data Masking Technology is becoming increasingly vital as organizations strive to protect sensitive information while maintaining data utility. This technology allows businesses to create a realistic but fictional version of their data, ensuring that sensitive information is not exposed during processes such as software testing, development, and analytics. By substituting sensitive data with anonymized values, data masking technology helps organizations comply with data protection regulations without hindering their operational efficiency. As data privacy concerns continue to rise, the adoption of data masking technology is expected to grow, offering a robust solution for safeguarding sensitive information across various sectors.
Regionally, North America holds a significant share of the data de-identification software market, driven by the presence of key market players, stringent regulatory requirements, and a high level of digitalization across industries. The Asia Pacific region is expected to witness the fastest growth during the forecast period, attributed to the rapid adoption of digital technologies, increasing awareness of data privacy, and evolving regulatory landscape in countries like China, Japan, and India. Europe also plays a vital role due to the stringent data protection regulations enforced by the GDPR, which mandates rigorous data de-identification practices.
By component, the data de-identification software market is segmented into software and services. The software segment is anticipated to dominate the market, driven by the increasing demand for advanced de-identification tools that can handle large volumes of data efficiently. Organizations are investing in sophisticated software solutions that offer automated and customizable de-identification processes to meet specific compliance requirements. These software solutions often come with features like encryption, tokenization, and data masking, enhancing their appeal to businesses across different sectors.