26 datasets found
  1. The 18 elements in the HIPAA Privacy Rule Safe Harbor standard that must be...

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Khaled El Emam; Elizabeth Jonker; Luk Arbuckle; Bradley Malin (2023). The 18 elements in the HIPAA Privacy Rule Safe Harbor standard that must be removed or generalized for a data set to be considered de-identified (see 45 CFR 164.514(b)(2)(i)). [Dataset]. http://doi.org/10.1371/journal.pone.0028071.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Khaled El Emam; Elizabeth Jonker; Luk Arbuckle; Bradley Malin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 18 elements in the HIPAA Privacy Rule Safe Harbor standard that must be removed or generalized for a data set to be considered de-identified (see 45 CFR 164.514(b)(2)(i)).
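
    As a rough illustration of what "removed or generalized" can mean for a single record, the sketch below drops a few direct identifiers and generalizes dates, ZIP codes and very old ages. The field names are hypothetical, only a handful of the 18 categories are covered, and it is not a compliance tool.

```python
from datetime import date

# Hypothetical field names; a real data set needs its own mapping to the 18 categories.
DROP_FIELDS = {"name", "phone", "email", "ssn", "medical_record_number"}


def safe_harbor_sketch(record: dict) -> dict:
    """Drop direct identifiers and generalize dates, ZIP codes and very old ages."""
    out = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # removed outright
        if isinstance(value, date):
            out[key] = value.year  # keep only the year
        elif key == "zip":
            # Keep the initial three digits. Safe Harbor additionally requires
            # "000" for sparsely populated three-digit areas; not handled here.
            out[key] = str(value)[:3]
        elif key == "age" and value > 89:
            out[key] = "90+"  # ages over 89 are grouped into one category
        else:
            out[key] = value
    return out


record = {
    "name": "Jane Doe",
    "zip": "10601",
    "admitted": date(2013, 12, 12),
    "age": 93,
    "diagnosis": "J18.9",
}
print(safe_harbor_sketch(record))
```

    Whether a given column counts as an identifier still has to be decided per data set; the sketch only shows the mechanics.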

  2. CHHS- Data De-Identification Guidelines

    • data.countyofnapa.org
    application/rdfxml +5
    Updated May 25, 2023
    Cite
    California Health and Human Services Agency (2023). CHHS- Data De-Identification Guidelines [Dataset]. https://data.countyofnapa.org/Internal-Assets/CHHS-Data-De-Identification-Guidelines/8zfi-957w
    Available download formats: application/rdfxml, application/rssxml, csv, json, xml, tsv
    Dataset updated
    May 25, 2023
    Dataset authored and provided by
    California Health and Human Services Agency (https://www.chhs.ca.gov/)
    Description

    The California Health and Human Services Agency (CHHS) Data De-identification Guidelines (DDG) describe a procedure to be used by departments and offices in the CHHS to assess data for public release. The document also describes specific actions that may be taken at each step of the procedure. These steps are intended to help departments ensure that data released to the public is de-identified in a way that meets the requirements of the California Information Practices Act (IPA) and the Health Insurance Portability and Accountability Act (HIPAA) to prevent the disclosure of personal information.

  3. Data De-identification Software Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data De-identification Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-software-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data De-identification Software Market Outlook



    The global data de-identification software market size was valued at approximately USD 500 million in 2023 and is projected to reach around USD 1.5 billion by 2032, growing at a CAGR of 13.5% during the forecast period. The growth in this market is driven by the increasing need for data privacy and compliance with stringent regulatory requirements across various industries.
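
    The figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The sketch below assumes nine compounding years (2023 to 2032), which the report does not state explicitly; the function names are illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Growth rate that turns start_value into end_value over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached by compounding start_value at a fixed annual rate."""
    return start_value * (1 + rate) ** years


years = 2032 - 2023                        # assumed compounding period
rate = implied_cagr(500, 1500, years)      # values in USD million
projected = project(500, 0.135, years)

print(f"implied CAGR: {rate:.1%}")
print(f"projected 2032 value at 13.5%: USD {projected:,.0f} million")
```

    Under that assumption the implied rate comes out slightly below the quoted 13.5%, and compounding at 13.5% lands slightly above USD 1.5 billion, which is consistent with the report's "approximately" and "around" wording.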



    The primary growth factor for the data de-identification software market is the rising awareness and concern regarding data privacy and security. With the advent of big data and the proliferation of digital services, organizations are increasingly recognizing the importance of protecting personal and sensitive information. Data breaches and cyber-attacks have led to significant financial and reputational damages, prompting businesses to invest in advanced data de-identification solutions to mitigate risks. Moreover, regulatory frameworks such as GDPR in Europe, CCPA in California, and HIPAA in the United States mandate strict compliance measures for data privacy, further propelling the demand for these software solutions.



    Another significant driver is the growing adoption of cloud-based services and data analytics. As organizations migrate their data to cloud platforms, the need for robust data protection mechanisms becomes paramount. De-identification software enables companies to anonymize sensitive information before storing it in the cloud, ensuring compliance with data protection regulations and reducing the risk of exposure. Additionally, the rise of data analytics for business intelligence and decision-making necessitates the use of de-identified data to maintain privacy while extracting valuable insights.



    The healthcare sector is particularly noteworthy for its substantial contribution to the market growth. The industry deals with large volumes of sensitive patient information that must be protected from unauthorized access. Data de-identification software plays a crucial role in enabling healthcare providers to share and analyze patient data for research and treatment purposes without compromising privacy. The COVID-19 pandemic has further accelerated the adoption of digital health solutions, increasing the demand for data de-identification tools to ensure compliance with privacy regulations and maintain patient trust.



    Data Masking Technology is becoming increasingly vital as organizations strive to protect sensitive information while maintaining data utility. This technology allows businesses to create a realistic but fictional version of their data, ensuring that sensitive information is not exposed during processes such as software testing, development, and analytics. By substituting sensitive data with anonymized values, data masking technology helps organizations comply with data protection regulations without hindering their operational efficiency. As data privacy concerns continue to rise, the adoption of data masking technology is expected to grow, offering a robust solution for safeguarding sensitive information across various sectors.
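
    One simple form of the substitution described above is format-preserving masking, where each character is swapped for a random one of the same kind so that test systems still see realistic-looking values. The sketch below is a minimal illustration with a hypothetical `mask_value` helper, not the approach of any particular product.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace each letter and digit with a random one of the same kind, keeping layout."""
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            masked.append(ch)  # keep separators such as '-' or '@'
    return "".join(masked)


rng = random.Random(0)  # fixed seed only so the example is repeatable
print(mask_value("555-12-3456", rng))
print(mask_value("Jane.Doe@example.org", rng))
```

    The `random` module is not cryptographically secure, so a real masking tool would likely draw from a stronger source and also handle referential consistency across tables.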



    Regionally, North America holds a significant share of the data de-identification software market, driven by the presence of key market players, stringent regulatory requirements, and a high level of digitalization across industries. The Asia Pacific region is expected to witness the fastest growth during the forecast period, attributed to the rapid adoption of digital technologies, increasing awareness of data privacy, and evolving regulatory landscape in countries like China, Japan, and India. Europe also plays a vital role due to the stringent data protection regulations enforced by the GDPR, which mandates rigorous data de-identification practices.



    Component Analysis



    By component, the data de-identification software market is segmented into software and services. The software segment is anticipated to dominate the market, driven by the increasing demand for advanced de-identification tools that can handle large volumes of data efficiently. Organizations are investing in sophisticated software solutions that offer automated and customizable de-identification processes to meet specific compliance requirements. These software solutions often come with features like encryption, tokenization, and data masking, enhancing their appeal to businesses across different sectors.




  4. Data De-identification and Pseudonymity Software Market Report | Global...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data De-identification and Pseudonymity Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-and-pseudonymity-software-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data De-identification and Pseudonymity Software Market Outlook



    The global data de-identification and pseudonymity software market is projected to grow significantly, reaching approximately USD 4.2 billion by 2032, driven primarily by increasing data privacy concerns and stringent regulatory requirements worldwide.



    The primary growth factor in the data de-identification and pseudonymity software market is the surge in data breaches and cyber-attacks. With the exponential increase in data generation, organizations are more vulnerable to data breaches and unauthorized access. These security concerns have prompted businesses and governments to invest heavily in robust data protection solutions. Data de-identification and pseudonymity software provide a secure way to anonymize sensitive information, making it less susceptible to malicious activities. As data protection laws become more rigorous, the demand for such technologies will continue to rise, further propelling market growth.



    Another significant factor contributing to market growth is the growing awareness and emphasis on data privacy among consumers. In recent years, consumers have become increasingly aware of how their data is being used and the potential risks associated with data misuse. This heightened awareness has put pressure on organizations to adopt comprehensive data protection measures. Data de-identification and pseudonymity software offer a means to protect personal information while still allowing organizations to utilize data for analytics and decision-making. This dual benefit is a key driver for the adoption of these technologies across various sectors.
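
    The "dual benefit" mentioned above, protecting identity while keeping data usable for analytics, is commonly achieved with keyed pseudonyms: the same person always maps to the same token, so records can still be linked. The sketch below uses an HMAC for this; the key, identifiers and 16-character truncation are placeholder choices for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; the length is an arbitrary choice


key = b"replace-with-a-secret-key"  # placeholder; a real key must be stored securely
visits = [
    ("patient-001", "2023-01-04"),
    ("patient-002", "2023-01-05"),
    ("patient-001", "2023-02-11"),
]
pseudonymous = [(pseudonymize(pid, key), day) for pid, day in visits]

for token, day in pseudonymous:
    print(token, day)
```

    Because anyone holding the key can recompute the mapping, data treated this way is pseudonymous rather than anonymous.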



    Moreover, regulatory compliance is a crucial driver for the market. Regulations such as the General Data Protection Regulation (GDPR) in Europe, the Health Insurance Portability and Accountability Act (HIPAA) in the United States, and various other data protection laws worldwide mandate stringent measures for data protection. Non-compliance can result in hefty fines and legal repercussions. Therefore, organizations are increasingly adopting data de-identification and pseudonymity software to ensure compliance with these regulations. The need for regulatory compliance is expected to sustain market growth in the foreseeable future.



    Regionally, North America currently dominates the global data de-identification and pseudonymity software market, accounting for the largest market share. This is attributed to the presence of major technology players, stringent data protection regulations, and high adoption rates of advanced technologies in the region. Europe follows closely, with significant market contributions from countries such as Germany, France, and the UK, driven by robust regulatory frameworks like GDPR. The Asia Pacific region is also expected to witness substantial growth, fueled by rapid digitalization, increasing cybersecurity threats, and growing awareness about data privacy in countries like China, India, and Japan.



    Data Masking Tools play a pivotal role in enhancing the security framework of organizations by providing an additional layer of protection for sensitive information. These tools are designed to obscure specific data within a dataset, ensuring that unauthorized users cannot access or decipher the original information. As businesses increasingly rely on data-driven insights, the need for robust data masking solutions becomes more critical. By employing data masking tools, organizations can safely share data across departments or with third-party vendors without compromising privacy. This capability is especially beneficial in industries such as healthcare and finance, where data privacy is paramount. The integration of data masking tools with existing data protection strategies can significantly reduce the risk of data breaches and ensure compliance with regulatory standards.



    Component Analysis



    The data de-identification and pseudonymity software market can be segmented by component into software and services. The software segment is anticipated to hold the lion's share due to the increasing adoption of data protection solutions across various industries. Software solutions provide automated tools for anonymizing and pseudonymizing data, ensuring compliance with regulatory standards. These solutions are essential for organizations aiming to mitigate the risks associated with data breaches and unauthorized access. As cyber threats continue to evolve, the demand for advanced software solutions is exp

  5. white plains test

    • data.ny.gov
    application/rdfxml +5
    Updated Dec 12, 2013
    Cite
    New York State Department of Health (2013). white plains test [Dataset]. https://data.ny.gov/Health/white-plains-test/yjfh-t3x7/about
    Available download formats: tsv, csv, xml, json, application/rdfxml, application/rssxml
    Dataset updated
    Dec 12, 2013
    Authors
    New York State Department of Health
    Area covered
    White Plains
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified dataset contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data contains basic record level detail regarding the discharge; however, the data does not contain protected health information (PHI) under the Health Insurance Portability and Accountability Act (HIPAA). The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed. A downloadable file with this data is available at: https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/3m9u-ws8e. For more information, see http://www.health.ny.gov/statistics/sparcs/ or go to the “About” tab.
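
    The date redaction described above (day and month removed, year kept) can be sketched as follows. The column names and the `%m/%d/%Y` input format are assumptions for illustration and are not taken from the SPARCS file layout.

```python
from datetime import datetime


def year_only(date_text: str, fmt: str = "%m/%d/%Y") -> str:
    """Parse a full date and return only its year as text."""
    return str(datetime.strptime(date_text, fmt).year)


# Hypothetical record-level rows before redaction.
rows = [
    {"discharge_date": "03/14/2013", "total_charges": "1200.50"},
    {"discharge_date": "11/02/2013", "total_charges": "845.00"},
]

# Replace the full date with the year and leave every other column untouched.
redacted = [{**row, "discharge_date": year_only(row["discharge_date"])} for row in rows]
print(redacted)
```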

  6. Data from: A DICOM dataset for evaluation of medical image de-identification...

    • cancerimagingarchive.net
    • dev.cancerimagingarchive.net
    csv, dicom, n/a
    Updated Jan 31, 2021
    Cite
    The Cancer Imaging Archive (2021). A DICOM dataset for evaluation of medical image de-identification [Dataset]. http://doi.org/10.7937/s17z-r072
    Available download formats: dicom, csv, n/a
    Dataset updated
    Jan 31, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Apr 7, 2021
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Open access or shared research data must comply with HIPAA patient privacy regulations. These regulations require the de-identification of datasets before they can be placed in the public domain. The process of image de-identification is time-consuming, requires significant human resources, and is prone to human error. Automated image de-identification algorithms have been developed, but the research community requires some method of evaluation before such tools can be widely accepted. This evaluation requires a robust dataset that can be used as part of an evaluation process for de-identification algorithms.

    We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM image information objects were selected from datasets published in TCIA. Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM data elements to mimic typical clinical imaging exams. The evaluation dataset was de-identified by a TCIA curation team using standard TCIA tools and procedures. We are publishing the evaluation dataset (containing synthetic PHI) and de-identified evaluation dataset (result of TCIA curation) in advance of a potential competition, sponsored by the National Cancer Institute (NCI), for de-identification algorithm evaluation, and de-identification of medical image datasets. The evaluation dataset published here is a subset of a larger evaluation dataset that was created under contract for the National Cancer Institute. This subset is being published to allow researchers to test their de-identification algorithms and promote standardized procedures for validating automated de-identification.

  7. Hospital Inpatient Discharges (SPARCS De-Identified): 2018

    • healthdata.gov
    • health.data.ny.gov
    application/rdfxml +5
    Updated Apr 8, 2025
    + more versions
    Cite
    health.data.ny.gov (2025). Hospital Inpatient Discharges (SPARCS De-Identified): 2018 [Dataset]. https://healthdata.gov/State/Hospital-Inpatient-Discharges-SPARCS-De-Identified/pw9x-uv3q
    Available download formats: csv, json, tsv, xml, application/rssxml, application/rdfxml
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    health.data.ny.gov
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed. Note: This dataset may be downloaded from the attachments section of this page in a smaller, compressed format.

  8. Global Data De-identification Software Market Size By Component, By...

    • verifiedmarketresearch.com
    Updated Aug 29, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Data De-identification Software Market Size By Component, By Application, By Deployment Mode, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/data-de-identification-software-market/
    Dataset updated
    Aug 29, 2024
    Dataset authored and provided by
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Data De-identification Software Market size was valued at USD 407 Million in 2023 and is projected to reach USD 533.4 Million by 2031, growing at a CAGR of 4.8% during the forecasted period 2024 to 2031.

    Global Data De-identification Software Market Drivers

    The market drivers for the Data De-identification Software Market can be influenced by various factors. These may include:

    Regulatory Compliance: Organizations are required to protect personally identifiable information (PII) by stringent rules and data protection legislation, such as the GDPR in Europe, the CCPA in California, and HIPAA in the United States. The need for de-identification solutions to guarantee compliance and avert fines is driven by this regulatory climate.

    Growing Concerns About Data Privacy: The increasing consciousness among consumers and enterprises regarding data privacy and security is compelling them to implement de-identification technology. People are calling for stronger protections because they are more concerned about how their data is utilized.

    Global Data De-identification Software Market Restraints

    Several factors can act as restraints or challenges for the Data De-identification Software Market. These may include:

    Regulatory Complexity and Compliance Costs: It can be difficult to navigate the complicated web of data protection laws, which includes the CCPA in California and the GDPR in Europe. Adoption of data de-identification technologies may be hampered by companies' high compliance expenses and challenges in staying current with legislation.

    Integration Difficulties: It might be difficult to integrate de-identification software with current procedures and systems. Organizations may experience compatibility problems and technical challenges, which would extend the time and expense of implementation.

  9. Medical Imaging De-Identification Software Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jul 5, 2025
    Cite
    Growth Market Reports (2025). Medical Imaging De-Identification Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/medical-imaging-de-identification-software-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Medical Imaging De-Identification Software Market Outlook




    According to our latest research, the global medical imaging de-identification software market size reached USD 315 million in 2024, driven by the increasing adoption of digital healthcare solutions and stringent regulatory requirements for patient data privacy. The market is expected to grow at a robust CAGR of 13.2% during the forecast period, reaching approximately USD 858 million by 2033. The primary growth factor fueling this expansion is the rising volume of medical imaging data and the escalating need to ensure compliance with data protection laws such as HIPAA, GDPR, and other regional regulations.




    The growth trajectory of the medical imaging de-identification software market is underpinned by the exponential increase in digital imaging procedures across healthcare facilities worldwide. As advanced imaging modalities like MRI, CT, and PET scans become standard in diagnostic workflows, the volume of data generated has surged. This data often contains sensitive patient information, making it imperative for healthcare organizations to adopt robust de-identification solutions. The proliferation of health information exchanges and the increasing emphasis on interoperability have further heightened the need for secure and compliant data sharing. These factors collectively foster a conducive environment for the adoption of de-identification software, as organizations seek to balance data utility with stringent privacy requirements.




    Another major driver is the evolving regulatory landscape that mandates strict adherence to patient confidentiality and data protection standards. Regulatory frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, the General Data Protection Regulation (GDPR) in Europe, and similar regulations in Asia Pacific and other regions are compelling healthcare providers and research institutions to implement advanced de-identification solutions. These regulations impose hefty penalties for non-compliance, further incentivizing investments in software that can automate and streamline the de-identification process. Moreover, the growing trend of collaborative research and data sharing among healthcare entities necessitates reliable de-identification tools to facilitate secure and lawful data exchange.




    Technological advancements in artificial intelligence and machine learning are also playing a pivotal role in shaping the medical imaging de-identification software market. Modern solutions leverage AI-driven algorithms to enhance the accuracy and efficiency of de-identification processes, reducing the risk of inadvertent data leaks. These innovations are particularly valuable in large-scale research projects, where massive datasets must be anonymized rapidly and without compromising data integrity. Furthermore, the integration of de-identification software with existing healthcare IT infrastructure, such as PACS and EHR systems, is becoming increasingly seamless, making adoption easier for end-users. This technological evolution is expected to drive further market growth over the next decade.




    From a regional perspective, North America currently dominates the medical imaging de-identification software market, accounting for the largest share in 2024. The region’s leadership is attributed to the presence of advanced healthcare infrastructure, high adoption rates of digital health technologies, and stringent regulatory frameworks. Europe follows closely, propelled by GDPR compliance and increasing investments in healthcare IT. The Asia Pacific region is experiencing the fastest growth, fueled by expanding healthcare access, rapid digitalization, and rising awareness of data privacy. Latin America and the Middle East & Africa are also witnessing gradual adoption, supported by ongoing healthcare modernization initiatives and regulatory developments.





    Component Analysis




    The component segment of the medical imaging de-i

  10. Hospital Inpatient Discharges (SPARCS De-Identified): 2023

    • healthdata.gov
    application/rdfxml +5
    Updated Jun 5, 2025
    Cite
    health.data.ny.gov (2025). Hospital Inpatient Discharges (SPARCS De-Identified): 2023 [Dataset]. https://healthdata.gov/d/rwh3-2k63
    Available download formats: application/rdfxml, csv, application/rssxml, tsv, xml, json
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    health.data.ny.gov
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges.

    This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed.

    For more information visit: https://www.health.ny.gov/statistics/sparcs/

  11. Hospital Inpatient Discharges (SPARCS De-Identified): 2013

    • healthdata.gov
    • health.data.ny.gov
    application/rdfxml +5
    Updated Apr 8, 2025
    + more versions
    Cite
    health.data.ny.gov (2025). Hospital Inpatient Discharges (SPARCS De-Identified): 2013 [Dataset]. https://healthdata.gov/State/Hospital-Inpatient-Discharges-SPARCS-De-Identified/gbzd-5nff
    Available download formats: application/rdfxml, csv, json, application/rssxml, xml, tsv
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    health.data.ny.gov
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed.

  12. Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation,...

    • cancerimagingarchive.net
    csv, dicom, n/a +1
    Updated May 2, 2025
    Cite
    The Cancer Imaging Archive (2025). Data in Support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation, MIDI-B-Curated-Validation, MIDI-B-Synthetic-Test, MIDI-B-Curated-Test) [Dataset]. http://doi.org/10.7937/cf2p-aw56
    Available download formats: sqlite and zip, dicom, csv, n/a
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 2, 2025
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Abstract

    These resources comprise a large and diverse collection of multi-site, multi-modality, and multi-cancer clinical DICOM images from 538 subjects infused with synthetic PHI/PII in areas encountered by TCIA curation teams. Also provided is a TCIA-curated version of the synthetic dataset, along with mapping files for mapping identifiers between the two.

    This new MIDI data resource includes DICOM datasets used in the Medical Image De-Identification Benchmark (MIDI-B) challenge at MICCAI 2024. They are accompanied by ground truth answer keys and a validation script for evaluating the effectiveness of medical image de-identification workflows. The validation script systematically assesses de-identified data against an answer key outlining appropriate actions and values for proper de-identification of medical images, promoting safer and more consistent medical image sharing.
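
    To make the idea of checking de-identified data against an answer key concrete, here is a toy validator. It is not the MIDI-B validation script: the answer-key layout and the three action names ("remove", "keep", "replace") are hypothetical, and headers are modelled as plain dictionaries rather than DICOM files.

```python
def validate(deidentified: dict, answer_key: dict) -> list:
    """Return (element, problem) tuples; an empty list means every check passed."""
    problems = []
    for element, rule in answer_key.items():
        action = rule["action"]
        present = element in deidentified
        if action == "remove" and present:
            problems.append((element, "should have been removed"))
        elif action == "keep" and deidentified.get(element) != rule["value"]:
            problems.append((element, "should have been kept unchanged"))
        elif action == "replace" and (not present or deidentified[element] == rule["original"]):
            problems.append((element, "should have been replaced"))
    return problems


# Hypothetical answer key for three header elements.
answer_key = {
    "PatientName": {"action": "replace", "original": "DOE^JANE"},
    "PatientBirthDate": {"action": "remove"},
    "Modality": {"action": "keep", "value": "CT"},
}

# Output of an imagined pipeline that forgot to replace the patient name.
candidate = {"PatientName": "DOE^JANE", "Modality": "CT"}
print(validate(candidate, answer_key))
```

    Reporting every failed element, rather than stopping at the first, makes it easier to compare pipelines on the same data.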

    Introduction

    Medical imaging research increasingly relies on large-scale data sharing. However, reliable de-identification of DICOM images still presents significant challenges due to the wide variety of DICOM header elements and pixel data where identifiable information may be embedded. To address this, we have developed an openly accessible synthetic dataset containing artificially generated protected health information (PHI) and personally identifiable information (PII).

    These resources complement our earlier work (Pseudo-PHI-DICOM-data) hosted on The Cancer Imaging Archive. As an example of its use, we also provide a version curated by The Cancer Imaging Archive (TCIA) curation team. This resource builds upon best practices emphasized by the MIDI Task Group, which underscores the importance of transparency, documentation, and reproducibility in de-identification workflows, themes also featured at recent conferences (Synapse:syn53065760) and workshops (2024 MIDI-B Challenge Workshop).

    This framework enables objective benchmarking of de-identification performance, promotes transparency in compliance with regulatory standards, and supports the establishment of consistent best practices for sharing clinical imaging data. We encourage the research community to use these resources to enhance and standardize their medical image de-identification workflows.

    Methods

    Subject Inclusion and Exclusion Criteria

    The source data were selected from imaging already hosted in de-identified form on TCIA. Imaging containing faces was excluded, and no new human studies were performed for this project.

    Data Acquisition

    To build the synthetic dataset, image series were selected from TCIA’s curated datasets to represent a broad range of imaging modalities (CR, CT, DX, MG, MR, PT, SR, US), manufacturers (including GE, Siemens, Varian, Confirma, Agfa, Eigen, Elekta, Hologic, KONICA MINOLTA, and others), scan parameters, and regions of the body. These were processed to inject the synthetic PHI/PII as described.

    Data Analysis

    Synthetic pools of PHI, such as subject and scanning institution information, were generated using the Python package Faker (https://pypi.org/project/Faker/8.10.3/). These were inserted into the DICOM metadata of selected imaging files using a system of inheritable rule-based templates outlining re-identification functions for data insertion and logging for answer key creation. Text was also burned into the pixel data of a number of images. By systematically embedding realistic synthetic PHI into image headers and pixel data, accompanied by a detailed ground-truth answer key, our framework supports transparency, documentation, and reproducibility in de-identification practices, aligned with the HIPAA Safe Harbor method, DICOM PS3.15 Confidentiality Profiles, and TCIA best practices.
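
    The template-driven insertion and answer-key logging described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the name pools, the `TEMPLATE` layout, the action label, and the helper `inject_phi` are all hypothetical, a plain dictionary stands in for a DICOM header, and the standard-library `random` module stands in for Faker so the sketch has no third-party dependency.

```python
import random

# Hypothetical value pools; the dataset authors generated theirs with Faker.
FIRST = ["Alex", "Jordan", "Sam", "Taylor"]
LAST = ["Rivera", "Chen", "Okafor", "Novak"]

def synthetic_name(rng):
    # DICOM person names place the family name first, separated by "^".
    return f"{rng.choice(LAST)}^{rng.choice(FIRST)}"

def synthetic_institution(rng):
    return f"{rng.choice(LAST)} Memorial Hospital"

# Hypothetical template: header keyword -> (generator, expected de-id action).
TEMPLATE = {
    "PatientName": (synthetic_name, "remove_or_replace"),
    "InstitutionName": (synthetic_institution, "remove_or_replace"),
}

def inject_phi(header, template, rng):
    """Return a copy of the header with synthetic PHI, plus answer-key rows."""
    infused = dict(header)
    answer_key = []
    for keyword, (generate, action) in template.items():
        value = generate(rng)
        infused[keyword] = value
        # Log what was inserted so a validator can later check it was handled.
        answer_key.append({"keyword": keyword, "inserted": value, "action": action})
    return infused, answer_key

rng = random.Random(0)  # fixed seed keeps the synthetic values reproducible
clean = {"PatientName": "", "InstitutionName": "", "Modality": "CT"}
infused, key = inject_phi(clean, TEMPLATE, rng)
```

    Each answer-key row records the keyword that received synthetic PHI and the value inserted, which is the information a later validation step needs.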

    Usage Notes

    This DICOM collection is split into two datasets, synthetic and curated. The synthetic dataset is the PHI/PII-infused DICOM collection, accompanied by a validation script and answer keys for testing, refining, and benchmarking medical image de-identification pipelines. The curated dataset is a version of the synthetic dataset curated and de-identified by members of The Cancer Imaging Archive curation team. It can serve as a guide and an example of medical image curation best practices. For the purposes of the De-Identification challenge at MICCAI 2024, the synthetic and curated datasets each contain two subsets: one for Validation and one for Testing.

    To link a curated dataset to the original synthetic dataset and answer keys, a mapping between the unique identifiers (UIDs) and patient IDs must be provided in CSV format to the evaluation software. We include the mapping files associated with the TCIA-curated set as an example. Lastly, for both the Validation and Testing datasets, an answer key in sqlite.db format is provided. These components are for use with the Python validation script linked below (4). Combining these components, a user developing or evaluating de-identification methods can ensure they meet a specification for successfully de-identifying medical image data.
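
    As a rough illustration of how a mapping file and an SQLite answer key might be combined, the sketch below checks de-identified headers for surviving synthetic PHI. It is not the MIDI-B validation script: the CSV column names (`id_old`, `id_new`), the `answer_key` table schema, the sample values, and the helper `find_leaks` are all assumptions made for this example, and the real files may be laid out differently.

```python
import csv
import io
import sqlite3

# Hypothetical mapping file: synthetic patient ID -> curated patient ID.
MAPPING_CSV = "id_old,id_new\nSYN-001,CUR-9A\nSYN-002,CUR-7F\n"

def load_mapping(text):
    """Map each de-identified ID back to its synthetic source ID."""
    return {row["id_new"]: row["id_old"] for row in csv.DictReader(io.StringIO(text))}

def build_answer_key(conn):
    # Hypothetical schema; the real answer key layout may differ.
    conn.execute("CREATE TABLE answer_key (patient_id TEXT, keyword TEXT, phi_value TEXT)")
    conn.executemany(
        "INSERT INTO answer_key VALUES (?, ?, ?)",
        [("SYN-001", "PatientName", "Rivera^Alex"),
         ("SYN-002", "InstitutionName", "Chen Memorial Hospital")],
    )

def find_leaks(conn, mapping, deid_headers):
    """Return (new_id, keyword) pairs where a synthetic PHI value survived."""
    leaks = []
    for new_id, header in deid_headers.items():
        old_id = mapping[new_id]
        rows = conn.execute(
            "SELECT keyword, phi_value FROM answer_key WHERE patient_id = ?", (old_id,)
        )
        for keyword, phi_value in rows:
            if header.get(keyword) == phi_value:
                leaks.append((new_id, keyword))
    return leaks

conn = sqlite3.connect(":memory:")
build_answer_key(conn)
mapping = load_mapping(MAPPING_CSV)
deid = {
    "CUR-9A": {"PatientName": ""},  # PHI removed
    "CUR-7F": {"InstitutionName": "Chen Memorial Hospital"},  # PHI left in place
}
leaks = find_leaks(conn, mapping, deid)
```

    The mapping step is what lets a validator look up the correct answer-key rows even after patient IDs have been replaced during de-identification.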

  13. Hospital Inpatient Discharges (SPARCS De-Identified): 2015

    • data.wu.ac.at
    • healthdata.gov
    application/excel +5
    Updated Jun 7, 2018
    Cite
    Open Data NY - DOH (2018). Hospital Inpatient Discharges (SPARCS De-Identified): 2015 [Dataset]. https://data.wu.ac.at/schema/health_data_ny_gov/ODJ4bS15Nmc4
    Explore at:
    Available download formats: json, csv, application/xml+rdf, xml, application/excel, xlsx
    Dataset updated
    Jun 7, 2018
    Dataset provided by
    Open Data NY - DOH
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed.

  14. Hospital Inpatient Discharges (SPARCS De-Identified): 2016

    • health.data.ny.gov
    • healthdata.gov
    application/rdfxml +5
    Updated Sep 10, 2019
    Cite
    New York State Department of Health (2019). Hospital Inpatient Discharges (SPARCS De-Identified): 2016 [Dataset]. https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/gnzp-ekau
    Explore at:
    Available download formats: csv, application/rdfxml, json, tsv, xml, application/rssxml
    Dataset updated
    Sep 10, 2019
    Dataset authored and provided by
    New York State Department of Health
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed.

  15. Hospital Inpatient Discharges (SPARCS De-Identified): 2009

    • healthdata.gov
    • health.data.ny.gov
    application/rdfxml +5
    Updated Apr 8, 2025
    Cite
    health.data.ny.gov (2025). Hospital Inpatient Discharges (SPARCS De-Identified): 2009 [Dataset]. https://healthdata.gov/State/Hospital-Inpatient-Discharges-SPARCS-De-Identified/jis2-bx5r
    Explore at:
    Available download formats: application/rdfxml, application/rssxml, tsv, xml, json, csv
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    health.data.ny.gov
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified dataset contains discharge level detail on patient characteristics, diagnoses, treatments, services, charges and costs. This data contains basic record level detail regarding the discharge; however, the data does not contain protected health information (PHI) under the Health Insurance Portability and Accountability Act (HIPAA). The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed.

  16. A performance comparison of the de-identification game solving approaches.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Zhiyu Wan; Yevgeniy Vorobeychik; Weiyi Xia; Ellen Wright Clayton; Murat Kantarcioglu; Ranjit Ganta; Raymond Heatherly; Bradley A. Malin (2023). A performance comparison of the de-identification game solving approaches. [Dataset]. http://doi.org/10.1371/journal.pone.0120592.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Zhiyu Wan; Yevgeniy Vorobeychik; Weiyi Xia; Ellen Wright Clayton; Murat Kantarcioglu; Ranjit Ganta; Raymond Heatherly; Bradley A. Malin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BIS: Backward Induction Search. LBS: Lattice-Based Search. Payoff difference means the absolute difference of payoff for one record between a heuristic-driven approach and the baseline BIS approach. A performance comparison of the de-identification game solving approaches.

  17. under 17

    • health.data.ny.gov
    application/rdfxml +5
    Updated Sep 10, 2019
    Cite
    New York State Department of Health (2019). under 17 [Dataset]. https://health.data.ny.gov/Health/under-17/pssi-2j8i
    Explore at:
    Available download formats: csv, application/rdfxml, json, tsv, application/rssxml, xml
    Dataset updated
    Sep 10, 2019
    Authors
    New York State Department of Health
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed. A downloadable file of this dataset is available at: https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/mpue-vn67. For more information, including changes to the data from previous years, please visit http://www.health.ny.gov/statistics/sparcs/access/.

  18. A comparison of four de-identification policies for the case study on...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Zhiyu Wan; Yevgeniy Vorobeychik; Weiyi Xia; Ellen Wright Clayton; Murat Kantarcioglu; Ranjit Ganta; Raymond Heatherly; Bradley A. Malin (2023). A comparison of four de-identification policies for the case study on performance measures. [Dataset]. http://doi.org/10.1371/journal.pone.0120592.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Zhiyu Wan; Yevgeniy Vorobeychik; Weiyi Xia; Ellen Wright Clayton; Murat Kantarcioglu; Ranjit Ganta; Raymond Heatherly; Bradley A. Malin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SH: Safe Harbor. GI: Generalization Intensity. A comparison of four de-identification policies for the case study on performance measures.

  19. MIMIC-IV-ED

    • physionet.org
    Updated Jun 3, 2021
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Leo Anthony Celi; Roger Mark; Steven Horng (2021). MIMIC-IV-ED [Dataset]. http://doi.org/10.13026/77z6-9w59
    Explore at:
    Dataset updated
    Jun 3, 2021
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Leo Anthony Celi; Roger Mark; Steven Horng
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC-ED is a large, freely available database of emergency department (ED) admissions at the Beth Israel Deaconess Medical Center between 2011 and 2019. As of MIMIC-ED v1.0, the database contains 448,972 ED stays. Vital signs, triage information, medication reconciliation, medication administration, and discharge diagnoses are available. All data are deidentified to comply with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-ED is intended to support a diverse range of education initiatives and research studies.

  20. hospital14

    • health.data.ny.gov
    application/rdfxml +5
    Updated Sep 10, 2019
    Cite
    New York State Department of Health (2019). hospital14 [Dataset]. https://health.data.ny.gov/Health/hospital14/wdtv-ip3j
    Explore at:
    Available download formats: application/rdfxml, application/rssxml, xml, csv, tsv, json
    Dataset updated
    Sep 10, 2019
    Authors
    New York State Department of Health
    Description

    The Statewide Planning and Research Cooperative System (SPARCS) Inpatient De-identified File contains discharge level detail on patient characteristics, diagnoses, treatments, services, and charges. This data file contains basic record level detail for the discharge. The de-identified data file does not contain data that is protected health information (PHI) under HIPAA. The health information is not individually identifiable; all data elements considered identifiable have been redacted. For example, the direct identifiers regarding a date have the day and month portion of the date removed.
