15 datasets found
  1. Data De-identification Software Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data De-identification Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-software-market
    Available download formats
    csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data De-identification Software Market Outlook



    The global data de-identification software market size was valued at approximately USD 500 million in 2023 and is projected to reach around USD 1.5 billion by 2032, growing at a CAGR of 13.5% during the forecast period. The growth in this market is driven by the increasing need for data privacy and compliance with stringent regulatory requirements across various industries.
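
    The growth figures quoted above can be cross-checked with the standard compound annual growth rate formula. The sketch below is illustrative only: the function name `cagr` is hypothetical, and the 2023 to 2032 span (9 years) is an assumption about how the report counts the forecast period. With the rounded endpoints given in the text, the implied rate comes out near 13%, slightly below the quoted 13.5%, which is plausible for "approximately" and "around" figures.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.13 means 13%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Rounded figures quoted in the description: about USD 500 million in 2023
# and about USD 1.5 billion by 2032. The 9-year span is an assumption.
implied = cagr(500e6, 1.5e9, years=2032 - 2023)
print(f"Implied CAGR over 9 years: {implied:.1%}")
```

    A different choice of base year changes the result, so the check is only as precise as the rounded inputs.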



    The primary growth factor for the data de-identification software market is the rising awareness and concern regarding data privacy and security. With the advent of big data and the proliferation of digital services, organizations are increasingly recognizing the importance of protecting personal and sensitive information. Data breaches and cyber-attacks have led to significant financial and reputational damages, prompting businesses to invest in advanced data de-identification solutions to mitigate risks. Moreover, regulatory frameworks such as GDPR in Europe, CCPA in California, and HIPAA in the United States mandate strict compliance measures for data privacy, further propelling the demand for these software solutions.



    Another significant driver is the growing adoption of cloud-based services and data analytics. As organizations migrate their data to cloud platforms, the need for robust data protection mechanisms becomes paramount. De-identification software enables companies to anonymize sensitive information before storing it in the cloud, ensuring compliance with data protection regulations and reducing the risk of exposure. Additionally, the rise of data analytics for business intelligence and decision-making necessitates the use of de-identified data to maintain privacy while extracting valuable insights.



    The healthcare sector is particularly noteworthy for its substantial contribution to the market growth. The industry deals with large volumes of sensitive patient information that must be protected from unauthorized access. Data de-identification software plays a crucial role in enabling healthcare providers to share and analyze patient data for research and treatment purposes without compromising privacy. The COVID-19 pandemic has further accelerated the adoption of digital health solutions, increasing the demand for data de-identification tools to ensure compliance with privacy regulations and maintain patient trust.



    Data Masking Technology is becoming increasingly vital as organizations strive to protect sensitive information while maintaining data utility. This technology allows businesses to create a realistic but fictional version of their data, ensuring that sensitive information is not exposed during processes such as software testing, development, and analytics. By substituting sensitive data with anonymized values, data masking technology helps organizations comply with data protection regulations without hindering their operational efficiency. As data privacy concerns continue to rise, the adoption of data masking technology is expected to grow, offering a robust solution for safeguarding sensitive information across various sectors.



    Regionally, North America holds a significant share of the data de-identification software market, driven by the presence of key market players, stringent regulatory requirements, and a high level of digitalization across industries. The Asia Pacific region is expected to witness the fastest growth during the forecast period, attributed to the rapid adoption of digital technologies, increasing awareness of data privacy, and evolving regulatory landscape in countries like China, Japan, and India. Europe also plays a vital role due to the stringent data protection regulations enforced by the GDPR, which mandates rigorous data de-identification practices.



    Component Analysis



    By component, the data de-identification software market is segmented into software and services. The software segment is anticipated to dominate the market, driven by the increasing demand for advanced de-identification tools that can handle large volumes of data efficiently. Organizations are investing in sophisticated software solutions that offer automated and customizable de-identification processes to meet specific compliance requirements. These software solutions often come with features like encryption, tokenization, and data masking, enhancing their appeal to businesses across different sectors.




  2. Data De-Identification or Pseudonymity Software Market Report | Global...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data De-Identification or Pseudonymity Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-or-pseudonymity-software-market
    Available download formats
    pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data De-Identification or Pseudonymity Software Market Outlook



    As of 2023, the global Data De-Identification or Pseudonymity Software market is valued at approximately USD 1.5 billion and is projected to grow at a robust CAGR of 18% from 2024 to 2032, driven by increasing data privacy concerns and stringent regulatory requirements.



    The growth of the Data De-Identification or Pseudonymity Software market is primarily fueled by the exponential increase in data generation across industries. With the advent of IoT, AI, and digital transformation strategies, the volume of data generated has seen an unprecedented spike. Organizations are now more aware of the need to protect sensitive information to comply with global data privacy regulations such as GDPR in Europe and CCPA in California. The need to ensure that personal data is anonymized or de-identified before analysis or sharing has escalated, pushing the demand for these software solutions.



    Another significant growth factor is the rising number of cyber-attacks and data breaches. As data becomes more valuable, it also becomes a prime target for cybercriminals. In response, companies are investing heavily in data privacy and security measures, including de-identification and pseudonymity solutions, to mitigate risks associated with data breaches. This trend is more prevalent in sectors dealing with highly sensitive information like healthcare, finance, and government. Ensuring that data remains secure and private while being useful for analytics is a key driver for the adoption of these technologies.



    Moreover, the evolution of Big Data analytics and cloud computing is also spurring growth in this market. As organizations move their operations to the cloud and leverage big data for decision-making, the importance of maintaining data privacy while utilizing large datasets for analytics cannot be overstated. Cloud-based de-identification solutions offer scalability, flexibility, and cost-effectiveness, making them increasingly popular among enterprises of all sizes. This shift towards cloud deployments is expected to further boost market growth.



    Regionally, North America holds the largest market share due to its advanced technological infrastructure and stringent data protection laws. The presence of major technology companies and a high rate of adoption of advanced solutions in the U.S. and Canada contribute significantly to regional market growth. Europe follows closely, driven by rigorous GDPR compliance requirements. The Asia Pacific region is anticipated to witness the fastest growth, attributed to the increasing digitization and growing awareness about data privacy in countries like India and China.



    As organizations increasingly seek to protect their sensitive data, the concept of Data Protection on Demand is gaining traction. This model allows businesses to access data protection services as and when needed, providing flexibility and scalability. By leveraging cloud-based platforms, companies can implement robust data protection measures without the need for significant upfront investments in infrastructure. This approach not only ensures compliance with data privacy regulations but also offers a cost-effective solution for managing data security. As the demand for on-demand services continues to rise, Data Protection on Demand is poised to become a critical component of data management strategies across various industries.



    Component Analysis



    The Data De-Identification or Pseudonymity Software market by component is segmented into software and services. The software segment dominates the market, driven by the increasing need for automated solutions that ensure data privacy. These software solutions come with a variety of tools and features designed to anonymize or pseudonymize data efficiently, making them essential for organizations managing large volumes of sensitive information. The software market is expanding rapidly, with new innovations and improvements constantly being introduced to enhance functionality and user experience.



    The services segment, though smaller compared to software, plays a crucial role in the market. Services include consulting, implementation, and maintenance, which are essential for the successful deployment and operation of de-identification software. These services help organizations tailor the software to their specific needs, ensuring compliance with regional and industry-specific data protection regulations.

  3. Data De-identification & Pseudonymity Software Market Report | Global...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data De-identification & Pseudonymity Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-pseudonymity-software-market
    Available download formats
    pdf, csv, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data De-identification & Pseudonymity Software Market Outlook




    The global Data De-identification & Pseudonymity Software Market is projected to reach USD 3.5 billion by 2032, growing at a CAGR of 15.2% from 2024 to 2032. The rise in data privacy regulations and the increasing need for securing sensitive information are key factors driving this growth.




    The accelerating pace of digital transformation across various industries has led to an unprecedented surge in data generation. This voluminous data often contains sensitive information that needs robust protection. The growing awareness regarding data privacy and stringent regulations like GDPR in Europe, CCPA in California, and other data protection laws worldwide are compelling organizations to adopt advanced data de-identification and pseudonymity software. These solutions ensure that sensitive data is anonymized or pseudonymized, thus mitigating the risk of data breaches and ensuring compliance with regulations. Consequently, the adoption of data de-identification and pseudonymity software is rapidly increasing.




    Another significant growth factor is the increased focus on data security by industries such as healthcare, finance, and government. In healthcare, the protection of patient data is paramount, making the industry a significant consumer of de-identification software. Similarly, in the finance sector, protecting customer information is crucial to maintain trust and comply with regulatory requirements. Government agencies dealing with citizen data are also increasingly investing in these technologies to prevent unauthorized access and misuse of sensitive information. The demand for data de-identification and pseudonymity software is thus witnessing a steady rise across these critical sectors.




    Technological advancements and innovation in data security solutions are further propelling market growth. The integration of artificial intelligence and machine learning into de-identification and pseudonymity software has enhanced their effectiveness and efficiency. These advanced technologies enable more accurate and faster processing of large datasets, thereby offering robust data protection. Additionally, the rise of cloud computing and the increasing adoption of cloud-based solutions provide scalable and cost-effective options for organizations, further driving the market.



    In this context, the role of Identity Information Protection Service becomes increasingly crucial. As organizations strive to safeguard sensitive data, these services provide an essential layer of security by ensuring that identity-related information is protected from unauthorized access and misuse. Identity Information Protection Service helps organizations comply with data privacy regulations by offering robust solutions that secure personal identifiers, thus reducing the risk of identity theft and data breaches. By integrating these services, companies can enhance their data protection strategies, ensuring that identity information remains confidential and secure across various platforms and applications.




    Regionally, North America holds the largest market share, driven by stringent data protection regulations and high adoption rates of advanced technologies. Europe follows, with significant contributions from countries like Germany, the UK, and France, driven by GDPR compliance requirements. The Asia Pacific region is expected to witness the highest growth rate due to the rapid digitalization of economies like China and India, coupled with increasing awareness about data privacy. Latin America and the Middle East & Africa regions are also showing promising growth, albeit from a smaller base.



    Component Analysis




    The Data De-identification & Pseudonymity Software Market by component is segmented into software and services. The software segment includes standalone software solutions designed to de-identify or pseudonymize data. This segment is witnessing substantial growth due to the increasing demand for automated and scalable data protection solutions. The software solutions are enhanced with advanced algorithms and AI capabilities, providing accurate de-identification and pseudonymization of large datasets, which is crucial for organizations dealing with massive amounts of sensitive data.




  4. Data from: Trusted Research Environments: Analysis of Characteristics and...

    • researchdata.tuwien.ac.at
    bin, csv
    Updated Jun 25, 2024
    Cite
    Martin Weise; Andreas Rauber (2024). Trusted Research Environments: Analysis of Characteristics and Data Availability [Dataset]. http://doi.org/10.48436/cv20m-sg117
    Available download formats
    bin, csv
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Martin Weise; Andreas Rauber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data, through technical, organizational, and legal measures, from being leaked (even accidentally) outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices provide the majority of the sensitive data records included in this study.

    Methodology

    We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature focusing on retrieving the following source material:

    • Peer-reviewed articles where available,
    • TRE websites,
    • TRE metadata catalogs.

    The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, in order to give an overview of the available infrastructure for sensitive data research, as many European initiatives have emerged in recent months.

    Technical details

    This dataset consists of five comma-separated values (.csv) files describing our inventory:

    • countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-1 alpha-3 encoding, optional)
    • tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
    • access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
    • inclusion.csv: Table of TREs considered for the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
    • major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional).

    Additionally, a schema definition .sql file for MariaDB (10.5 or higher) is needed to model the database schema:

    • schema.sql: Schema definition file to create the tables and views used in the analysis.

    The analysis was done in a Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
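
    As an illustration of how the files described above relate to each other, the following minimal sketch joins TRE rows to their country through countryid and decodes the datalevel codes listed in the description. It is not the authors' analysis code. The sample rows are invented, and the presence of a header row with exactly these column names, the comma delimiter, and the helper names (`read_rows`, `join_tres_with_countries`) are all assumptions.

```python
import csv
import io

# Hypothetical sample rows; the real files ship with the dataset and may
# differ in delimiter, header names, or value encoding.
COUNTRIES_CSV = """id,name,code
1,Austria,AUT
2,Germany,DEU
"""

TRES_CSV = """id,name,countryid,datalevel
10,Example TRE A,1,2
11,Example TRE B,2,
"""

# Code-to-label mapping taken from the column description of tres.csv.
DATALEVEL_LABELS = {"1": "de-identified", "2": "pseudonymized", "3": "anonymized"}


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tres_with_countries(tres_rows, country_rows):
    """Attach the country name and a readable data level to each TRE."""
    country_by_id = {row["id"]: row["name"] for row in country_rows}
    joined = []
    for tre in tres_rows:
        joined.append({
            "tre": tre["name"],
            "country": country_by_id.get(tre["countryid"], "unknown"),
            # datalevel is optional, so an empty value maps to "not stated".
            "datalevel": DATALEVEL_LABELS.get(tre["datalevel"], "not stated"),
        })
    return joined


result = join_tres_with_countries(read_rows(TRES_CSV), read_rows(COUNTRIES_CSV))
for row in result:
    print(row)
```

    Reading from in-memory strings keeps the sketch self-contained; with the downloaded files, the same functions would take the file contents instead.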

  5. Optimum Patient Care Research Database (OPCRD)

    • healthdatagateway.org
    unknown
    Updated Sep 8, 2024
    Cite
    Optimum Patient Care (OPC) (2024). Optimum Patient Care Research Database (OPCRD) [Dataset]. http://doi.org/10.2147/POR.S395632
    Available download formats
    unknown
    Dataset updated
    Sep 8, 2024
    Dataset provided by
    Optimum Patient Care Limited
    Authors
    Optimum Patient Care (OPC)
    License

    https://opcrd.co.uk/our-database/data-requests/

    Description

    About OPCRD

    Optimum Patient Care Research Database (OPCRD) is a real-world, longitudinal, research database that provides anonymised data to support scientific, medical, public health and exploratory research. OPCRD is established, funded and maintained by Optimum Patient Care Limited (OPC) – which is a not-for-profit social enterprise that has been providing quality improvement programmes and research support services to general practices across the UK since 2005.

    Key Features of OPCRD

    OPCRD has been purposefully designed to facilitate real-world data collection and address the growing demand for observational and pragmatic medical research, both in the UK and internationally. Data held in OPCRD is representative of routine clinical care and thus enables the study of ‘real-world’ effectiveness and health care utilisation patterns for chronic health conditions.

    OPCRD has unique qualities that set it apart from other research data resources:

    • De-identified electronic medical records of more than 24.9 million patients
    • OPCRD covers all major UK primary care clinical systems
    • OPCRD covers approximately 35% of the UK population
    • One of the biggest primary care research networks in the world, with over 1,175 practices
    • Linked patient reported outcomes for over 68,000 patients including Covid-19 patient reported data
    • Linkage to secondary care data sources including Hospital Episode Statistics (HES)

    Data Available in OPCRD

    OPCRD has received data contributions from over 1,175 practices and currently holds de-identified research ready data for over 24.9 million patients or data subjects. This includes longitudinal primary care patient data and any data relevant to the management of patients in primary care, and thus covers all conditions. The data is derived from both electronic health records (EHR) data and patient reported data from patient questionnaires delivered as part of quality improvement. OPCRD currently holds over 68,000 patient reported questionnaire data on Covid-19, asthma, COPD and rare diseases.

    Approvals and Governance

    OPCRD has held NHS research ethics committee (REC) approval to provide anonymised data for scientific and medical research since 2010, with its most recent approval in 2020 (NHS HRA REC ref: 20/EM/0148). OPCRD is governed by the Anonymised Data Ethics and Protocols Transparency committee (ADEPT). All research conducted using anonymised data from OPCRD must gain prior approval from ADEPT. Proceeds from OPCRD data access fees and detailed feasibility assessments are re-invested into OPC services for the continued free provision of patient quality improvement programmes for contributing practices and patients.

    For more information on OPCRD please visit: https://opcrd.co.uk/

  6. Data from: MOESM1 of Legal and ethical framework for global health...

    • springernature.figshare.com
    xlsx
    Updated Feb 12, 2024
    Cite
    Lara Bernasconi; Selçuk Şen; Luca Angerame; Apolo Balyegisawa; Damien Hong Yew Hui; Maximilian Hotter; Chung Hsu; Tatsuya Ito; Francisca Jörger; Wolfgang Krassnitzer; Adam Phillips; Rui Li; Louise Stockley; Fabian Tay; Charlotte Heijne Widlund; Ming Wan; Creany Wong; Henry Yau; Thomas Hiemstra; Yagiz Uresin; Gabriela Senti (2024). MOESM1 of Legal and ethical framework for global health information and biospecimen exchange - an international perspective [Dataset]. http://doi.org/10.6084/m9.figshare.11686464.v1
    Available download formats
    xlsx
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lara Bernasconi; Selçuk Şen; Luca Angerame; Apolo Balyegisawa; Damien Hong Yew Hui; Maximilian Hotter; Chung Hsu; Tatsuya Ito; Francisca Jörger; Wolfgang Krassnitzer; Adam Phillips; Rui Li; Louise Stockley; Fabian Tay; Charlotte Heijne Widlund; Ming Wan; Creany Wong; Henry Yau; Thomas Hiemstra; Yagiz Uresin; Gabriela Senti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. List of definitions: the Excel file “List of definitions_data protection.xlsx” includes legal definitions for the terms “personal data”, “anonymized”, “de-identified”, “pseudonymized” and “encrypted”, as provided by the participating ICN countries/regions.

  7. Amazon Seller Contact Intent Sequence

    • registry.opendata.aws
    Updated Jun 21, 2023
    Cite
    Amazon (2023). Amazon Seller Contact Intent Sequence [Dataset]. https://registry.opendata.aws/amazon-seller-contact-intent-sequence/
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Amazon.com (http://amazon.com/)
    Description

    When sellers need help from Amazon, such as how to create a listing, they often reach out to Amazon seller support through email, chat or phone. For each contact, we assign an intent so that we can manage the request more easily. The data we present in this release includes 548k contacts with 118 intents from 70k sellers sampled from recent years. There are 3 columns: 1. De-identified seller id - seller_id_anon; 2. Noisy inter-arrival time in hours between contacts - interarrival_time_hr_noisy; 3. An integer that represents the contact intent - contact_intent. Note that, to balance data anonymization against usefulness, we randomly perturbed the inter-arrival times in an intricate way such that the temporal patterns are preserved and seller identities are anonymized to the largest extent possible. We also note that, for each seller_id_anon, the interarrival_time_hr_noisy values are already arranged in chronological order, the first contact_intent_id_anon is always the origin when the seller began to sell with us, and the interarrival_time_hr_noisy values for each seller_id_anon are all relative to the previous contact. A straightforward use case of the data is to predict the next timestamp and intent of a user given the user's history.
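
    Because each inter-arrival time is relative to the previous contact, a per-seller timeline can be rebuilt with a running sum. The sketch below shows this under stated assumptions: the rows are invented, they are assumed to already be in file order per seller (as the description says), the first gap of 0.0 for each seller is an assumption of the sample data, and the function name `relative_timelines` is hypothetical. The resulting times are still noisy, since the published gaps were perturbed.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical rows in file order:
# (seller_id_anon, interarrival_time_hr_noisy, contact_intent)
rows = [
    ("s1", 0.0, 5),
    ("s1", 12.5, 17),
    ("s1", 3.0, 5),
    ("s2", 0.0, 5),
    ("s2", 48.0, 42),
]


def relative_timelines(rows):
    """Turn per-contact gaps into cumulative elapsed hours for each seller."""
    gaps = defaultdict(list)
    intents = defaultdict(list)
    for seller, gap_hr, intent in rows:
        gaps[seller].append(gap_hr)
        intents[seller].append(intent)
    # accumulate() yields the running sum of gaps, i.e. elapsed hours
    # since the seller's first listed contact.
    return {
        seller: list(zip(accumulate(gaps[seller]), intents[seller]))
        for seller in gaps
    }


timelines = relative_timelines(rows)
for seller, events in timelines.items():
    print(seller, events)
```

    Sequences in this form (elapsed time paired with intent) are a natural input for the next-timestamp and next-intent prediction task mentioned above.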

  8. In-vitro efficacy of fluoroquinolones and carbapenems against...

    • datadryad.org
    zip
    Updated May 6, 2025
    Cite
    Aakash Parajuli; Shila Shrestha; Shiba Kumar Rai (2025). In-vitro efficacy of fluoroquinolones and carbapenems against biofilm-forming and non-forming non-fermenting gram-negative bacteria isolated from clinical specimens [Dataset]. http://doi.org/10.5061/dryad.ghx3ffc1f
    Available download formats
    zip
    Dataset updated
    May 6, 2025
    Dataset provided by
    Dryad
    Authors
    Aakash Parajuli; Shila Shrestha; Shiba Kumar Rai
    Description

    In-vitro efficacy of fluoroquinolones and carbapenems against biofilm-forming and non-forming non-fermenting gram-negative bacteria isolated from clinical specimens

    Dataset DOI: 10.5061/dryad.ghx3ffc1f

    Description of the data and file structure

    Comparative In-Vitro Efficacy of Fluoroquinolones and Carbapenems among Biofilm-Forming and Non-Forming Non-Fermenters Isolated from Clinical Specimens

    The dataset covers hospital-visiting individuals with infections caused by non-fermenter bacteria, i.e., Acinetobacter calcoaceticus-baumannii complex and Pseudomonas aeruginosa.

    The dataset comprises a single sheet. The sheet details demographic information, such as age group and gender of the infected patients; clinical information, including clinical samples; microbiological findings comprising bacterial genera, antimicrobial resistance patterns, biofilm formers or non-formers, inhibitory concentrations of fluoroquinolones (norfloxacin, ciprofloxacin, ofl...

  9. Updated PTSS dataset for the FORAS project

    • dataverse.nl
    csv, docx, xlsx
    Updated Feb 5, 2025
    Cite
    Bruno Coimbra; Rutger Neeleman; Elizabeth Grandfield; Mirjam van Zuiden; Rens van de Schoot (2025). Updated PTSS dataset for the FORAS project [Dataset]. http://doi.org/10.34894/CRE6ZC
    Available download formats
    docx(48426), xlsx(9398219), csv(21840732), xlsx(1199186)
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    DataverseNL
    Authors
    Bruno Coimbra; Rutger Neeleman; Elizabeth Grandfield; Mirjam van Zuiden; Rens van de Schoot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Dutch Research Council
    Description

    This updated labeled dataset builds upon the initial systematic review by van de Schoot et al. (2018; DOI: 10.1080/00273171.2017.1412293), which included studies on post-traumatic stress symptom (PTSS) trajectories up to 2016, sourced from the Open Science Framework (OSF). As part of the FORAS project - Framework for PTSS trajectORies: Analysis and Synthesis (funded by the Dutch Research Council, grant no. 406.22.GO.048, and pre-registered at PROSPERO under ID CRD42023494027), we extended this dataset to include publications between 2016 and 2023. In total, the search identified 10,594 de-duplicated records obtained via different search methods, each published with its own search query and results:

    • Exact replication of the initial search: OSF.IO/QABW3
    • Comprehensive database search: OSF.IO/D3UV5
    • Snowballing: OSF.IO/M32TS
    • Full-text search via Dimensions data: OSF.IO/7EXC5
    • Semantic search via OpenAlex: OSF.IO/M32TS

    Humans (BC, RN) and AI (Bron et al., 2024) screened the records, and disagreements were resolved (MvZ, BG, RvdS). Each record was screened separately for Title, Abstract, and Full-text inclusion and per inclusion criteria. A detailed screening logbook is available at OSF.IO/B9GD3, and the entire process is described in https://doi.org/10.31234/osf.io/p4xm5. A description of all columns/variables and full methodological details is available in the accompanying codebook.

    Important Notes:

    • Duplicates: To maintain consistency and transparency, duplicates are left in the dataset and are labeled with the same classification as the original records. A filter is provided to allow users to exclude these duplicates as needed.
    • Anonymized Data: The dataset "...._anonymous" excludes DOIs, OpenAlex IDs, titles, and abstracts to ensure data anonymization during the review process. The complete dataset, including all identifiers, is uploaded under embargo and will be publicly available on 01-10-2025.

    This dataset is a valuable resource both for researchers interested in systematic reviews of PTSS trajectories, for whom it facilitates reproducibility and transparency in the research process, and for data scientists who would like to mimic the screening process using different machine learning and AI models.

  10. Supplementary Data: Patient Level Data from IVIg Use Associated with...

    • aacr.figshare.com
    xlsx
    Updated Nov 1, 2023
    Cite
    Guido Lancman; Kian Parsa; Krzysztof Kotlarz; Lisa Avery; Alaina Lurie; Alex Lieberman-Cribbin; Hearn Jay Cho; Samir S. Parekh; Shambavi Richard; Joshua Richter; Cesar Rodriguez; Adriana Rossi; Larysa J. Sanchez; Santiago Thibaud; Sundar Jagannath; Ajai Chari (2023). Supplementary Data: Patient Level Data from IVIg Use Associated with Ten-Fold Reduction of Serious Infections in Multiple Myeloma Patients Treated with Anti-BCMA Bispecific Antibodies [Dataset]. http://doi.org/10.1158/2643-3230.24473944.v1
    Available download formats: xlsx
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    American Association for Cancer Research
    Authors
    Guido Lancman; Kian Parsa; Krzysztof Kotlarz; Lisa Avery; Alaina Lurie; Alex Lieberman-Cribbin; Hearn Jay Cho; Samir S. Parekh; Shambavi Richard; Joshua Richter; Cesar Rodriguez; Adriana Rossi; Larysa J. Sanchez; Santiago Thibaud; Sundar Jagannath; Ajai Chari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anonymized patient-level data. Drug names have been de-identified and will be made available upon request once all of the sponsors have published results from these trials.

  11. Pain Interventions in Dementia - Pain events dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 28, 2023
    Cite
    Koppitz Andrea (2023). Pain Interventions in Dementia - Pain events dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6359399
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    Volken Thomas
    Koppitz Andrea
    Spichiger Frank
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is data collected for a quasi-experimental nurse-led intervention trial based on a convenience sample of three nursing homes. It was collected in the Swiss cantons of Zurich and Thurgau and serves to examine the effects on dementia patients, the healthcare institution, and the qualification level of the healthcare workers using an event analysis and a multilevel analysis. Healthcare workers were individually trained on how to assess, intervene in, and evaluate acute and chronic pain with BESD and/or VAS. There are three data-monitoring cycles (T0, T1, T2) and two intervention cycles (I1, I2) with a total study duration of 425 days. The raw data has been cryptographically anonymized using an SSL stream and further de-identification techniques.

    Also see: 10.1186/s12904-017-0200-5

  12. Data from: RadCases Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Sep 26, 2024
    Cite
    Michael S. Yao; Allison Chae; Charles E. Kahn Jr.; Walter R. Witschey; James C. Gee; Hersh Sagreiya; Osbert Bastani (2024). RadCases Dataset [Dataset]. https://paperswithcode.com/dataset/radcases
    Dataset updated
    Sep 26, 2024
    Authors
    Michael S. Yao; Allison Chae; Charles E. Kahn Jr.; Walter R. Witschey; James C. Gee; Hersh Sagreiya; Osbert Bastani
    Description

    RadCases Dataset

    This HuggingFace (HF) dataset contains the raw case labels for input patient "one-liner" case summaries according to the ACR Appropriateness Criteria. Because many of the sources of data used to construct the RadCases dataset require credentialed access, we cannot publicly release the input patient case summaries. Instead, the "cases" included in this publicly available dataset are the cryptographically secure SHA-512 hashes of the original, "human-readable" cases. In this way, the hashes cannot be used to reconstruct the original RadCases dataset, but can instead be used as a lookup key to determine the ground-truth label for the dataset.

    Setup

    Prior to using this dataset, you need to first download the raw source of patient one-liners in compliance with each of the source-specific licenses and data usage agreements. The setup process is different for each of the dataset sources:

    - Synthetic: The Synthetic dataset is composed of patient one-liners synthetically generated by OpenAI's ChatGPT. You can find the raw dataset at this GitHub link. No additional setup steps are required for the Synthetic RadCases dataset.
    - USMLE: The USMLE dataset is comprised of practice USMLE Step 2 and 3 cases from Medbullets that are made available by Chen et al. (2024). The dataset is made publicly available by the cited authors at this GitHub link - we extract the first sentence of each question stem to use as an input patient one-liner in the RadCases dataset.
    - JAMA: The JAMA dataset is comprised of challenging patient one-liners derived from the JAMA Clinical Challenges from the Journal of the American Medical Association (JAMA). Please follow the instructions from @HanjieChen here to first download the dataset. We extract the first sentence of each clinical challenge to use as the input patient one-liner in the RadCases dataset.
    - NEJM: The NEJM dataset is comprised of challenging patient one-liners derived from the NEJM Case Records of the Massachusetts General Hospital from the New England Journal of Medicine (NEJM). We provide a script build_nejm_dataset.py to scrape the case records from the DOIs listed here, which are the same as those used by Savage et al. (2024). The resulting nejm.jsonl file generated by the script should then be added to the radGPT home directory.
    - BIDMC: The Beth Israel Deaconess Medical Center (BIDMC) dataset is comprised of real anonymized, de-identified patient one-liners derived from the MIMIC-IV Dataset. Please request access to the MIMIC-IV dataset here. The discharge.csv.gz file should then be added to the radGPT/radgpt/data directory.

    Dataset Structure

    Each row of the dataset is a (SHA-512 hash of a) patient "one-liner" case mapping to an ACR Appropriateness Criteria topic, and also the parent panel of that topic.

    - case: the SHA-512 hash of the patient one-liner
    - panel: the ACR Appropriateness Criteria panel label of the patient one-liner
    - topic: the ACR Appropriateness Criteria topic label of the patient one-liner

    Retrieving A Label

    To retrieve a ground-truth ACR label from this dataset, you can use the following source code:

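    import hashlib

    # Read the patient one-liner and compute its SHA-512 hex digest.
    prompt = input("Patient One-Liner Case: ")
    hash_gen = hashlib.sha512()
    hash_gen.update(prompt.encode())  # str.encode() defaults to UTF-8
    hash_val = hash_gen.hexdigest()  # hexdigest() already returns a str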
    

    The resulting hash_val variable can then be used to look up the corresponding panel or topic by matching hash_val with the case value in the RadCases dataset.
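
    The matching step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (sha512_hex, lookup_label), the in-memory rows list, and the example panel and topic values are all hypothetical stand-ins, and only the column names case, panel, and topic come from the dataset description.

```python
import hashlib


def sha512_hex(one_liner: str) -> str:
    """Return the SHA-512 hex digest of a one-liner (UTF-8 encoded)."""
    return hashlib.sha512(one_liner.encode()).hexdigest()


# Hypothetical stand-in rows; real rows would be loaded from the RadCases dataset.
example_text = "Example patient one-liner (made up for illustration)."
rows = [
    {
        "case": sha512_hex(example_text),
        "panel": "Example panel",
        "topic": "Example topic",
    },
]

# Index the rows by hash so that each lookup is a single dictionary access.
labels_by_hash = {row["case"]: (row["panel"], row["topic"]) for row in rows}


def lookup_label(one_liner: str):
    """Return (panel, topic) for a one-liner, or None if its hash is absent."""
    return labels_by_hash.get(sha512_hex(one_liner))
```

    Because the match is on an exact hash, any difference in whitespace, casing, or punctuation between the input and the original one-liner yields a different digest, in which case lookup_label returns None.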

    Direct Dataset Usage

    You can download the contents of this dataset using the following terminal command:

    git clone https://huggingface.co/datasets/michaelsyao/RadCases

  13. Data for: An instantaneous voice synthesis neuroprosthesis

    • datadryad.org
    zip
    Updated May 14, 2025
    Cite
    Maitreyee Wairagkar; Nicholas Card; Tyler Singer-Clark; Xianda Hou; Carrina Iacobacci; Lee Miller; Leigh Hochberg; David Brandman; Sergey Stavisky (2025). Data for: An instantaneous voice synthesis neuroprosthesis [Dataset]. http://doi.org/10.5061/dryad.2280gb64f
    Available download formats: zip
    Dataset updated
    May 14, 2025
    Dataset provided by
    Dryad
    Authors
    Maitreyee Wairagkar; Nicholas Card; Tyler Singer-Clark; Xianda Hou; Carrina Iacobacci; Lee Miller; Leigh Hochberg; David Brandman; Sergey Stavisky
    Description

    Data for: An instantaneous voice synthesis neuroprosthesis

    An instantaneous voice synthesis neuroprosthesis

    Maitreyee Wairagkar, Nicholas S. Card, Tyler Singer-Clark, Xianda Hou, Carrina Iacobacci, Lee M. Miller, Leigh R. Hochberg, David M. Brandman#, Sergey D. Stavisky#

    # Co-senior authors

    preprint: https://doi.org/10.1101/2024.08.14.607690

    Overview

    This repository contains the neural data recorded during speech tasks described in Wairagkar et al., “An instantaneous voice synthesis neuroprosthesis” (see Related works) and associated metadata (e.g., task identifier, task event times, what the prompted text was, behavioral measurements).

    The participant was instructed to attempt to speak the sentences cued on the screen in front of him at his own pace. The data are segmented into individual trials covering the “go” period in which the participant attempted to speak each sentence. Data are organized into bl...

  14. SIMPATICO Second Evaluation Galicia Dataset v1.0

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 21, 2020
    Cite
    Raúl Santos de la Cámara; Diego López-de-Ipiña; Koldo Zabaleta; Pablo Aubert; Enrique Sanz (2020). SIMPATICO Second Evaluation Galicia Dataset v1.0 [Dataset]. http://doi.org/10.5281/zenodo.2244751
    Available download formats: zip
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Raúl Santos de la Cámara; Diego López-de-Ipiña; Koldo Zabaleta; Pablo Aubert; Enrique Sanz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SIMPATICO logs for the user evaluation of Galicia in project iteration 2

    The current package contains the interaction log data captured in the Galicia evaluation of the results of the H2020 project SIMPATICO, which was undertaken between September 24th 2018 and October 15th 2018. It contains a total of 290 user tests. The data is exported from the Elasticsearch instance that was used to log all of the interaction data. The data model can be found in project deliverable "D3.3 Advanced Methods And Tools For User Interaction Automation". For more information about the setup for conducting the tests and the results achieved, please consult project deliverable "D6.6 SIMPATICO Evaluation Report v2". All project deliverables, except where noted, are public and are available at the Zenodo community reachable at https://zenodo.org/communities/h2020-simpatico-692819.

    The following caveats need to be highlighted for this data set:
    - Due to limitations in the dumping mechanisms of Elasticsearch, the data has been divided into two sub-logs (24th September to 7th October, 8th October to 15th November). All of the records in the data set are nonetheless identified by date and time, so the data set as a whole is a continuous account of the captured data.

    - The format is JSON (JavaScript objects) as provided by Elasticsearch.

    - Data is completely anonymized: no traces of personal data for any of the 374 participants can be found in this file. Individual user logs can be traced through the stored "userID" field, which contains either a unique identifier tied to a logged-in user (in the cases in which just a number is stored) or an identifier for a user who is interacting but has not yet logged in (signified by the "no_user_logged_" prefix, followed by another unique identifier that can trace users interacting before login).
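
    The "userID" convention in the last caveat can be sketched as follows. This is a minimal illustration under stated assumptions: the function name classify_user_id, the returned labels, and the sample records are hypothetical, and the real log entries carry more fields, which are defined in deliverable D3.3. Only the "userID" field name and the "no_user_logged_" prefix come from the description.

```python
import json

ANON_PREFIX = "no_user_logged_"  # prefix quoted in the dataset description


def classify_user_id(user_id) -> tuple:
    """Split a userID into (kind, identifier) per the convention described above."""
    text = str(user_id)  # tolerate IDs stored as JSON numbers
    if text.startswith(ANON_PREFIX):
        return ("anonymous", text[len(ANON_PREFIX):])
    if text.isdigit():
        return ("logged_in", text)
    return ("unknown", text)


# Hypothetical records: only the "userID" field name comes from the description.
sample = '[{"userID": "1234"}, {"userID": "no_user_logged_ab12"}]'
kinds = [classify_user_id(record["userID"]) for record in json.loads(sample)]
```

    Grouping log entries by the returned identifier would then reconstruct one interaction trace per user; the "unknown" branch is a defensive assumption for values that fit neither documented form.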

    Change Log

    v1.0 - 2018-12-13 - Initial release

    Disclaimer: this data set was created by the consortium of project SIMPATICO (GA 692819, http://www.simpatico-project.eu). The data is provided here as-is, with no liability on the part of the authors for its completeness or validity. Recipients of this data can freely use it in any form following due contact with the project administration at info@simpatico-project.eu.

    (c) SIMPATICO - 692819 This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement number 692819.

  15. Participants anonymised data.

    • figshare.com
    xlsx
    Updated Jan 24, 2025
    Cite
    Narendran Gopalan; Vinod Kumar Viswanathan; Vignes Anand Srinivasalu; Saranya Arumugam; Adhin Bhaskar; Tamizhselvan Manoharan; Santosh Kishor Chandrasekar; Divya Bujagaruban; Ramya Arumugham; Gopi Jagadeeswaran; Saravanan Madurai Pandian; Arunalatha Ponniah; Thirumaran Senguttuvan; Ponnuraja Chinnaiyan; Baskaran Dhanraj; Vineet Kumar Chadha; Balaji Purushotham; Manoj Vasanth Murhekar (2025). Participants anonymised data. [Dataset]. http://doi.org/10.1371/journal.pone.0312993.s006
    Available download formats: xlsx
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Narendran Gopalan; Vinod Kumar Viswanathan; Vignes Anand Srinivasalu; Saranya Arumugam; Adhin Bhaskar; Tamizhselvan Manoharan; Santosh Kishor Chandrasekar; Divya Bujagaruban; Ramya Arumugham; Gopi Jagadeeswaran; Saravanan Madurai Pandian; Arunalatha Ponniah; Thirumaran Senguttuvan; Ponnuraja Chinnaiyan; Baskaran Dhanraj; Vineet Kumar Chadha; Balaji Purushotham; Manoj Vasanth Murhekar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anonymised data attached in MS Excel format (XLSX).


Cite
Dataintelo (2025). Data De-identification Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-software-market

Data De-identification Software Market Report | Global Forecast From 2025 To 2033

Available download formats: csv, pdf, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License

https://dataintelo.com/privacy-and-policy

Time period covered
2024 - 2032
Area covered
Global
Description

Data De-identification Software Market Outlook



The global data de-identification software market size was valued at approximately USD 500 million in 2023 and is projected to reach around USD 1.5 billion by 2032, growing at a CAGR of 13.5% during the forecast period. The growth in this market is driven by the increasing need for data privacy and compliance with stringent regulatory requirements across various industries.



The primary growth factor for the data de-identification software market is the rising awareness and concern regarding data privacy and security. With the advent of big data and the proliferation of digital services, organizations are increasingly recognizing the importance of protecting personal and sensitive information. Data breaches and cyber-attacks have led to significant financial and reputational damages, prompting businesses to invest in advanced data de-identification solutions to mitigate risks. Moreover, regulatory frameworks such as GDPR in Europe, CCPA in California, and HIPAA in the United States mandate strict compliance measures for data privacy, further propelling the demand for these software solutions.



Another significant driver is the growing adoption of cloud-based services and data analytics. As organizations migrate their data to cloud platforms, the need for robust data protection mechanisms becomes paramount. De-identification software enables companies to anonymize sensitive information before storing it in the cloud, ensuring compliance with data protection regulations and reducing the risk of exposure. Additionally, the rise of data analytics for business intelligence and decision-making necessitates the use of de-identified data to maintain privacy while extracting valuable insights.



The healthcare sector is particularly noteworthy for its substantial contribution to the market growth. The industry deals with large volumes of sensitive patient information that must be protected from unauthorized access. Data de-identification software plays a crucial role in enabling healthcare providers to share and analyze patient data for research and treatment purposes without compromising privacy. The COVID-19 pandemic has further accelerated the adoption of digital health solutions, increasing the demand for data de-identification tools to ensure compliance with privacy regulations and maintain patient trust.



Data Masking Technology is becoming increasingly vital as organizations strive to protect sensitive information while maintaining data utility. This technology allows businesses to create a realistic but fictional version of their data, ensuring that sensitive information is not exposed during processes such as software testing, development, and analytics. By substituting sensitive data with anonymized values, data masking technology helps organizations comply with data protection regulations without hindering their operational efficiency. As data privacy concerns continue to rise, the adoption of data masking technology is expected to grow, offering a robust solution for safeguarding sensitive information across various sectors.



Regionally, North America holds a significant share of the data de-identification software market, driven by the presence of key market players, stringent regulatory requirements, and a high level of digitalization across industries. The Asia Pacific region is expected to witness the fastest growth during the forecast period, attributed to the rapid adoption of digital technologies, increasing awareness of data privacy, and evolving regulatory landscape in countries like China, Japan, and India. Europe also plays a vital role due to the stringent data protection regulations enforced by the GDPR, which mandates rigorous data de-identification practices.



Component Analysis



By component, the data de-identification software market is segmented into software and services. The software segment is anticipated to dominate the market, driven by the increasing demand for advanced de-identification tools that can handle large volumes of data efficiently. Organizations are investing in sophisticated software solutions that offer automated and customizable de-identification processes to meet specific compliance requirements. These software solutions often come with features like encryption, tokenization, and data masking, enhancing their appeal to businesses across different sectors.



