25 datasets found
  1. Healthcare Data Anonymization Services Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 27, 2025
    Cite
    Growth Market Reports (2025). Healthcare Data Anonymization Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/healthcare-data-anonymization-services-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Healthcare Data Anonymization Services Market Outlook



    According to our latest research, the global healthcare data anonymization services market size reached USD 1.42 billion in 2024, reflecting a robust expansion driven by increasing regulatory demands and heightened focus on patient privacy. The market is projected to grow at a CAGR of 15.8% from 2025 to 2033, with the total market value expected to reach USD 5.44 billion by 2033. This impressive growth trajectory is underpinned by the rising adoption of digital health solutions, stringent data protection laws, and the ongoing digitalization of healthcare records worldwide.
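The growth figures above can be cross-checked with the standard compound-growth formula. The sketch below is illustrative only: it assumes nine yearly compounding periods (2024 to 2033), which the report does not state explicitly, and the function names are hypothetical.

```python
def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a period count."""
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value: float, rate: float, periods: int) -> float:
    """Value reached after compounding start_value at a fixed yearly rate."""
    return start_value * (1 + rate) ** periods


start_2024 = 1.42      # USD billion, from the report
end_2033 = 5.44        # USD billion, from the report
years = 2033 - 2024    # assumption: compounding starts from the 2024 base

rate = implied_cagr(start_2024, end_2033, years)
projected = project(start_2024, 0.158, years)

print(f"Implied CAGR over {years} years: {rate:.1%}")
print(f"USD 1.42B compounded at 15.8% for {years} years: {projected:.2f}B")
```

Under this assumption the implied rate and the projected value land close to, but not exactly on, the reported 15.8% and USD 5.44 billion. The gap may come from rounding or from a different base year in the report's own model.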




    The primary growth factor fueling the healthcare data anonymization services market is the proliferation of electronic health records (EHRs) and the expanding use of big data analytics in healthcare. As healthcare providers and organizations increasingly leverage advanced analytics for improving patient outcomes, there is a corresponding surge in data generation. However, these vast datasets often contain sensitive patient information, making data anonymization essential to ensure compliance with regulations such as HIPAA, GDPR, and other regional privacy laws. The increasing frequency of data breaches and cyberattacks has further highlighted the importance of robust anonymization services, prompting healthcare organizations to prioritize investments in data privacy and security solutions. As a result, demand for both software and service-based anonymization solutions continues to rise, contributing significantly to market growth.




    Another key driver for the healthcare data anonymization services market is the growing emphasis on research and clinical trials, which require the sharing and analysis of large volumes of patient data. Pharmaceutical and biotechnology companies, as well as research organizations, are increasingly collaborating across borders, necessitating the anonymization of datasets to protect patient identities and comply with international data protection standards. The adoption of cloud-based healthcare solutions has also facilitated the secure and efficient sharing of anonymized data, supporting advancements in personalized medicine and population health management. As organizations seek to balance innovation with compliance, the demand for advanced anonymization technologies that offer high accuracy and scalability is expected to accelerate further.




    Technological advancements in artificial intelligence (AI) and machine learning (ML) are also shaping the future of the healthcare data anonymization services market. These technologies are enabling more sophisticated and automated anonymization processes, reducing the risk of re-identification while maintaining data utility for research and analytics. The integration of AI-driven tools into anonymization workflows is helping organizations streamline operations, minimize human error, and achieve greater compliance with evolving regulatory requirements. Additionally, the increasing availability of customizable and interoperable anonymization solutions is making it easier for healthcare organizations of all sizes to adopt and scale these services, thereby broadening the market’s reach and impact.




    From a regional perspective, North America continues to dominate the healthcare data anonymization services market, accounting for the largest share in 2024. This leadership position is attributed to the presence of advanced healthcare infrastructure, widespread adoption of EHRs, and strict regulatory frameworks governing patient data privacy. Europe follows closely, driven by the enforcement of the General Data Protection Regulation (GDPR) and a strong culture of data protection. The Asia Pacific region is witnessing the fastest growth, propelled by increasing healthcare digitalization, government initiatives to modernize healthcare systems, and rising awareness of data privacy among patients and providers. Latin America and the Middle East & Africa are also experiencing steady growth, albeit from a smaller base, as healthcare organizations in these regions begin to prioritize data security and compliance.




  2. De-identification - anonymization

    • figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Francisco H C Felix (2023). De-identification - anonymization [Dataset]. http://doi.org/10.6084/m9.figshare.3545471.v1
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    figshare
    Authors
    Francisco H C Felix
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    De-identification, anonymization, pseudonymization, re-identification. National Institute of Standards and Technology (NIST) documentation states that the use of these terms is still unclear. The words de-identification, anonymization, and pseudonymization are sometimes used interchangeably and sometimes carry subtly different meanings. To mitigate ambiguity, NIST uses definitions from ISO/TS 25237:2008: de-identification: “general term for any process of removing the association between a set of identifying data and the data subject.” [p. 3] anonymization: “process that removes the association between the identifying dataset and the data subject.” [p. 2] pseudonymization: “particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms.” [p. 5] Brazilian Portuguese literature largely lacks this terminology, and the terms are more often used in law or information technology. In health care and research, these concepts have a specific meaning. HIPAA (Health Insurance Portability and Accountability Act), the US regulation protecting health data privacy, establishes standards for the handling of patients' personal information (protected health information, PHI) by health care providers (covered entities).
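The difference between the quoted definitions can be illustrated with a toy example. This is a minimal sketch, not a compliant implementation: the record layout, the field name `patient_id`, and both function names are hypothetical. Dropping a direct identifier alone does not guarantee that the remaining fields cannot re-identify someone.

```python
import secrets


def pseudonymize(records, id_field="patient_id"):
    """Replace each identifier with a random pseudonym, reused for repeat occurrences.

    Returns the new records plus the identifier-to-pseudonym table, which would
    have to be stored separately and protected.
    """
    key_table = {}
    out = []
    for rec in records:
        ident = rec[id_field]
        if ident not in key_table:
            key_table[ident] = "P-" + secrets.token_hex(8)
        new_rec = dict(rec)  # copy so the input records are not modified
        new_rec[id_field] = key_table[ident]
        out.append(new_rec)
    return out, key_table


def anonymize(records, id_field="patient_id"):
    """Remove the identifier entirely; no link to the data subject is kept."""
    return [{k: v for k, v in rec.items() if k != id_field} for rec in records]


# Hypothetical example records
records = [
    {"patient_id": "A-001", "diagnosis": "I50"},
    {"patient_id": "B-002", "diagnosis": "E11"},
    {"patient_id": "A-001", "diagnosis": "N18"},
]

pseudo_records, key_table = pseudonymize(records)
anon_records = anonymize(records)
```

In the pseudonymized output the two records of the same patient still share one pseudonym, so they can be linked to each other. In the anonymized output that link is gone.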

  3. Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure...

    • heidata.uni-heidelberg.de
    pdf, tsv, txt
    Updated Nov 20, 2024
    Cite
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich (2024). Anonymize or Synthesize? – Privacy-Preserving Methods for Heart Failure Score Analytics [data] [Dataset]. http://doi.org/10.11588/DATA/MXM0Q2
    Available download formats: txt(3421), tsv(191831), tsv(106632), tsv(286102), tsv(107100), tsv(190296), tsv(197975), pdf(640128)
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    heiDATA
    Authors
    Tim Ingo Johann; Karen Otte; Fabian Prasser; Christoph Dieterich
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/MXM0Q2

    Description

    In the publication [1] we implemented anonymization and synthetization techniques for a structured data set, which was collected during the HiGHmed Use Case Cardiology study [2]. We employed the data anonymization tool ARX [3] and the data synthetization framework ASyH [4] individually and in combination. We evaluated the utility and shortcomings of the different approaches by statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores (Barcelona BioHF [5] and MAGGIC [6]) on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats. We could demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches and a combination thereof introduce only minimal deviations from the original data set over all features. While data synthesis techniques produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We hereby share all generated data sets with the scientific community through a use and access agreement.

    [1] Johann TI, Otte K, Prasser F, Dieterich C: Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics. Eur Heart J 2024. doi:10.1093/ehjdh/ztae083
    [2] Sommer KK, Amr A, Bavendiek, Beierle F, Brunecker P, Dathe H et al. Structured, harmonized, and interoperable integration of clinical routine data to compute heart failure risk scores. Life (Basel) 2022;12:749.
    [3] Prasser F, Eicher J, Spengler H, Bild R, Kuhn KA. Flexible data anonymization using ARX—current status and challenges ahead. Softw Pract Exper 2020;50:1277–1304.
    [4] Johann TI, Wilhelmi H. ASyH—anonymous synthesizer for health data, GitHub, 2023. Available at: https://github.com/dieterich-lab/ASyH.
    [5] Lupón J, de Antonio M, Vila J, Peñafiel J, Galán A, Zamora E, et al. Development of a novel heart failure risk tool: the Barcelona bio-heart failure risk calculator (BCN Bio-HF calculator). PLoS One 2014;9:e85466.
    [6] Pocock SJ, Ariti CA, McMurray JJV, Maggioni A, Køber L, Squire IB, et al. Predicting survival in heart failure: a risk score based on 39 372 patients from 30 studies. Eur Heart J 2013;34:1404–1413.

  4. Cloud Data Desensitization Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    + more versions
    Cite
    Market Research Forecast (2025). Cloud Data Desensitization Report [Dataset]. https://www.marketresearchforecast.com/reports/cloud-data-desensitization-30077
    Available download formats: doc, pdf, ppt
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The cloud data desensitization market is experiencing robust growth, driven by increasing concerns over data privacy regulations like GDPR and CCPA, coupled with the rising adoption of cloud computing. The market's expansion is fueled by the need to protect sensitive data across various sectors, including healthcare, finance, and government, while maintaining data usability for analytics and other business purposes. The report assumes a compound annual growth rate (CAGR) of about 15% from 2025 to 2033, which it describes as a conservative estimate, and this suggests a significant market opportunity. This growth is further propelled by the evolving sophistication of data masking and anonymization techniques, enabling organizations to effectively balance data security with operational efficiency. Key players are continuously innovating, introducing advanced solutions that cater to specific industry needs and comply with stringent regulatory requirements. The cloud deployment model dominates due to its scalability, cost-effectiveness, and ease of implementation compared to on-premise solutions. Segments within the market show varied growth trajectories. Medical research data desensitization is likely experiencing high growth due to the sensitive nature of patient information and increasing research collaborations. Financial risk assessment and government statistics segments are also witnessing strong adoption, driven by the need for robust data protection and compliance. While on-premise solutions still hold a market share, the cloud segment is projected to capture a larger portion in the coming years, reflecting the overall shift towards cloud-based infrastructure and services. Geographic distribution demonstrates a strong presence in North America and Europe, reflecting early adoption and stringent data protection regulations in these regions. However, growth is anticipated in Asia Pacific and other developing economies as cloud adoption and data privacy awareness increase.

  5. Synthetic version of anonymized Norway Registry data containing...

    • dataverse.no
    • dataverse.azure.uit.no
    • +1more
    Updated Sep 5, 2024
    Cite
    Pavitra Chauhan (2024). Synthetic version of anonymized Norway Registry data containing prescriptions and hospitalization of the patients [Dataset]. http://doi.org/10.18710/YABAGM
    Available download formats: txt(8709), text/comma-separated-values(34718547)
    Dataset updated
    Sep 5, 2024
    Dataset provided by
    DataverseNO
    Authors
    Pavitra Chauhan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2011 - Jan 1, 2013
    Area covered
    Norway
    Description

    This dataset represents synthetic data derived from anonymized Norwegian Registry Data of patients aged 65 and above from 2011 to 2013. It includes the Norwegian Patient Registry (NPR), which contains hospitalization details, and the Norwegian Prescription Database (NorPD), which contains prescription details. The NPR and NorPD datasets are combined into a single CSV file. The real dataset was part of a project to study medication use in the elderly and its association with hospitalization. The project has ethical approval from the Regional Committees for Medical and Health Research Ethics in Norway (REK-Nord number: 2014/2182). The dataset was anonymized to ensure that the synthetic version could not reasonably be identical to any real-life individuals. The anonymization process was done as follows: first, only relevant information was kept from the original data set. Second, individuals' birth year and gender were replaced with randomly generated values within a plausible range of values. And last, all dates were replaced with randomly generated dates. This dataset was sufficiently scrambled to generate a synthetic dataset and was only used for the current study. The dataset has details related to Patient, Prescriber, Hospitalization, Diagnosis, Location, Medications, Prescriptions, and Prescriptions dispatched. A publication using this data to create a machine learning model for predicting hospitalization risk is under review.
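The three steps described above (keep relevant fields, randomize birth year and gender, randomize dates) can be sketched as follows. This is not the authors' code. The column names, the gender coding, the birth-year bounds, and the date window are all assumptions chosen to fit the stated cohort (age 65 and above, 2011 to 2013).

```python
import random
from datetime import date, timedelta

# Hypothetical column names; the real registry schema is not given in the description.
KEEP = ("birth_year", "gender", "admission_date", "diagnosis")


def random_date(rng, start, end):
    """Pick a uniformly random date between start and end, inclusive."""
    return start + timedelta(days=rng.randrange((end - start).days + 1))


def scramble(record, rng):
    # Step 1: keep only the fields considered relevant.
    kept = {k: record[k] for k in KEEP}
    # Step 2: replace birth year and gender with random plausible values.
    # Assumed range: born 1900-1948, i.e. at least 65 by 2013.
    kept["birth_year"] = rng.randint(1900, 1948)
    kept["gender"] = rng.choice(["F", "M"])
    # Step 3: replace every date with a random date in the assumed study window.
    kept["admission_date"] = random_date(rng, date(2011, 1, 1), date(2013, 1, 1))
    return kept


# Hypothetical input record
sample = {
    "name": "Example Person",
    "birth_year": 1940,
    "gender": "F",
    "admission_date": date(2012, 3, 14),
    "diagnosis": "I50",
}
scrambled = scramble(sample, random.Random(0))
```

Fields not listed in `KEEP`, such as `name` here, never reach the output. The clinical field `diagnosis` passes through unchanged.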

  6. CARMEN-I: A resource of anonymized electronic health records in Spanish and...

    • physionet.org
    Updated Apr 20, 2024
    Cite
    Eulalia Farre Maduell; Salvador Lima-Lopez; Santiago Andres Frid; Artur Conesa; Elisa Asensio; Antonio Lopez-Rueda; Helena Arino; Elena Calvo; Maria Jesús Bertran; Maria Angeles Marcos; Montserrat Nofre Maiz; Laura Tañá Velasco; Antonia Marti; Ricardo Farreres; Xavier Pastor; Xavier Borrat Frigola; Martin Krallinger (2024). CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools [Dataset]. http://doi.org/10.13026/x7ed-9r91
    Dataset updated
    Apr 20, 2024
    Authors
    Eulalia Farre Maduell; Salvador Lima-Lopez; Santiago Andres Frid; Artur Conesa; Elisa Asensio; Antonio Lopez-Rueda; Helena Arino; Elena Calvo; Maria Jesús Bertran; Maria Angeles Marcos; Montserrat Nofre Maiz; Laura Tañá Velasco; Antonia Marti; Ricardo Farreres; Xavier Pastor; Xavier Borrat Frigola; Martin Krallinger
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    The CARMEN-I corpus comprises 2,000 clinical records, encompassing discharge letters, referrals, and radiology reports from Hospital Clínic of Barcelona between March 2020 and March 2022. These reports, primarily in Spanish with some Catalan sections, cover COVID-19 patients with diverse comorbidities like kidney failure, cardiovascular diseases, malignancies, and immunosuppression. The corpus underwent thorough anonymization, validation, and expert annotation, replacing sensitive data with synthetic equivalents. A subset of the corpus features annotations of medical concepts by specialists, encompassing symptoms, diseases, procedures, medications, species, and humans (including family members). CARMEN-I serves as a valuable resource for training and assessing clinical NLP techniques and language models, aiding tasks like de-identification, concept detection, linguistic modifier extraction, document classification, and more. It also facilitates training researchers in clinical NLP and is a collaborative effort involving Barcelona Supercomputing Center's NLP4BIA team, Hospital Clínic, and Universitat de Barcelona's CLiC group.

  7. pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jun 26, 2021
    Cite
    Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie (2021). pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs [Dataset]. http://doi.org/10.5281/zenodo.5031881
    Available download formats: bin
    Dataset updated
    Jun 26, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic dataset for A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

    Dataset specification:

    • MRI images of Vertebral Units labelled based on region
    • Dataset is comprised of 10000 pairs of images and labels
    • Image and label pair number k can be selected by: synthetic_dataset['images'][k] and synthetic_dataset['regions'][k]
    • Images are 3D of size (9, 64, 64)
    • Regions are stored as an integer. Mapping is 0: cervical, 1: thoracic, 2: lumbar
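The indexing convention in the specification above can be sketched with a small stand-in object. The description does not say how the `bin` file is loaded, so the example builds a placeholder dictionary with the same keys and image shape instead of reading the real data. `make_stand_in` and `get_pair` are hypothetical helper names.

```python
# Region codes as given in the dataset specification.
REGION_NAMES = {0: "cervical", 1: "thoracic", 2: "lumbar"}


def blank_image():
    """Placeholder 3D image of size (9, 64, 64), filled with zeros."""
    return [[[0.0] * 64 for _ in range(64)] for _ in range(9)]


def make_stand_in(n_pairs):
    """Build a placeholder with the described layout. This is NOT the real dataset."""
    return {
        "images": [blank_image() for _ in range(n_pairs)],
        "regions": [k % 3 for k in range(n_pairs)],
    }


def get_pair(dataset, k):
    """Return image number k together with its region name."""
    image = dataset["images"][k]
    region = REGION_NAMES[dataset["regions"][k]]
    return image, region


synthetic_dataset = make_stand_in(3)  # the real dataset has 10000 pairs
image, region = get_pair(synthetic_dataset, 1)
shape = (len(image), len(image[0]), len(image[0][0]))
```

With the real file, only the construction of `synthetic_dataset` would change. The lookup by `['images'][k]` and `['regions'][k]` follows the specification as written.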

    Arxiv paper: https://arxiv.org/abs/2106.13199
    Github code: https://github.com/tcoroller/pGAN/

    Abstract:

    Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.

  8. Healthcare NLP Solution Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 15, 2025
    Cite
    Data Insights Market (2025). Healthcare NLP Solution Report [Dataset]. https://www.datainsightsmarket.com/reports/healthcare-nlp-solution-1431685
    Available download formats: ppt, pdf, doc
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Healthcare Natural Language Processing (NLP) solutions market, valued at $871 million in 2025, is projected to experience steady growth with a Compound Annual Growth Rate (CAGR) of 2.5% from 2025 to 2033. This growth is fueled by several key drivers. The increasing volume of unstructured healthcare data, including medical records, research papers, and clinical trial reports, necessitates efficient and accurate analysis. NLP solutions offer a powerful means to extract meaningful insights from this data, improving diagnostic accuracy, accelerating drug discovery, and personalizing patient care. Furthermore, the rising adoption of cloud-based solutions and the increasing demand for improved operational efficiency within healthcare organizations are contributing to market expansion. While data privacy and security concerns represent a significant restraint, advancements in data anonymization techniques and robust security protocols are mitigating these risks. The market is segmented by application (drug discovery, clinical trials, etc.) and by type of NLP solution (statistical NLP, hybrid NLP). The strong presence of major technology companies like IBM, Microsoft, Google, and AWS, alongside specialized healthcare IT firms, ensures a competitive landscape fostering innovation and accessibility of these solutions. The North American market currently holds a significant share, driven by advanced healthcare infrastructure and high adoption rates. However, growth is expected in other regions, particularly Asia Pacific, spurred by increasing investments in healthcare IT and expanding digital health initiatives. The forecast period (2025-2033) suggests continued, albeit moderate, growth. This is partially attributed to the inherent complexities and regulatory hurdles associated with implementing NLP solutions in sensitive healthcare settings. 
    However, the long-term potential remains significant, driven by ongoing technological advancements, the continued growth of electronic health records (EHRs), and the increasing focus on data-driven decision-making in healthcare. The segment focusing on drug discovery is expected to exhibit robust growth due to the potential of NLP to accelerate the identification and development of new therapies. Similarly, the use of hybrid NLP approaches, combining statistical and rule-based methods, is likely to gain traction due to their ability to handle the nuanced nature of clinical language.

  9. Trial Characteristics.

    • plos.figshare.com
    xls
    Updated May 16, 2025
    + more versions
    Cite
    Nicholas Kendall; Julian S. Rechberger; Abdelrahman M. Hamouda; Mark Cwajna; Sherief Ghozy; Kogulavadanan Arumaithurai; David F. Kallmes (2025). Trial Characteristics. [Dataset]. http://doi.org/10.1371/journal.pone.0323109.t001
    Available download formats: xls
    Dataset updated
    May 16, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Nicholas Kendall; Julian S. Rechberger; Abdelrahman M. Hamouda; Mark Cwajna; Sherief Ghozy; Kogulavadanan Arumaithurai; David F. Kallmes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Accurate and timely reporting of scientific knowledge is crucial to clinical research ethics. ClinicalTrials.gov allows researchers to register trials and report results to the public and scientific community. Despite FDA reporting mandates, compliance with the required 12-month window remains low. Given glioblastoma’s (GBM) aggressive nature, timely reporting is especially important for advancing research and benefiting patients. This study aimed to assess GBM trial reporting rates on ClinicalTrials.gov and identify factors related to non-compliance.

    Methods: We utilized a previously published algorithm to identify studies on ClinicalTrials.gov likely mandated to report. We obtained the titles, status, results, phases, funding type, intervention type, study design and type, location, and all available trial dates. Kaplan-Meier analysis evaluated reporting times, and Cox regression models identified factors associated with reporting within five years.

    Results: We identified 255 GBM-related trials likely mandated to report. 13% reported results within the 12-month deadline, while 82.7% reported within five years. Factors significantly associated with lower reporting rates at five years were biological interventions (HR 0.61, 95% CI: 0.37–1.00, p = 0.049), Phase 1–2 trials (HR 0.65, 95% CI: 0.46–0.91, p = 0.014), and studies with quadruple masking (HR 0.19, 95% CI: 0.04–0.93, p = 0.040).

    Conclusion: For GBM-related trials, noncompliance with reporting mandates remains a major issue. Reporting within 12 months was only 13%. No factors influenced reporting by 12 months, but multiple factors influenced five-year reporting. Further research is needed to understand these associations and create targeted incentives to increase transparency through timely reporting of GBM-related trials.

  10. Pediatric COVID-19 Dataset

    • figshare.com
    application/csv
    Updated Feb 13, 2024
    Cite
    Chuin-Hen Liew; David Chun-Ern Ng (2024). Pediatric COVID-19 Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25209818.v1
    Available download formats: application/csv
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    figshare
    Authors
    Chuin-Hen Liew; David Chun-Ern Ng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data extracted from the pediatric infectious disease case registration system of the Negeri Sembilan state of Malaysia. These were secondary data that underwent data cleaning and preprocessing (anonymization, imputation of missing values, categorical variable encoding, and dimension reduction) for clinical research.

    a. dataset_pediatricCOVID19_cleanedData_1495rows.csv consists of clinical data collected between 1st February 2020 and 31st December 2021.
    b. dataset2_pediatricCOVID19_cleanedData_500rows.csv consists of clinical data collected between 1st January 2022 and 31st March 2022.

    Outcome variable: 1 = requires ambulatory outpatient care, 2 = requires hospital care

  11. Likelihood That Results of Clinical Trials Were Reported by 5 Years after...

    • plos.figshare.com
    xls
    Updated May 16, 2025
    Cite
    Nicholas Kendall; Julian S. Rechberger; Abdelrahman M. Hamouda; Mark Cwajna; Sherief Ghozy; Kogulavadanan Arumaithurai; David F. Kallmes (2025). Likelihood That Results of Clinical Trials Were Reported by 5 Years after the Primary Completion Date, According to Trial Characteristics. [Dataset]. http://doi.org/10.1371/journal.pone.0323109.t002
    Available download formats: xls
    Dataset updated
    May 16, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicholas Kendall; Julian S. Rechberger; Abdelrahman M. Hamouda; Mark Cwajna; Sherief Ghozy; Kogulavadanan Arumaithurai; David F. Kallmes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Likelihood That Results of Clinical Trials Were Reported by 5 Years after the Primary Completion Date, According to Trial Characteristics.

  12. Healthcare Synthetic-Data Governance Services Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jul 5, 2025
    Cite
    Growth Market Reports (2025). Healthcare Synthetic-Data Governance Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/healthcare-synthetic-data-governance-services-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Healthcare Synthetic-Data Governance Services Market Outlook




    As per our latest research, the global healthcare synthetic-data governance services market size reached USD 1.14 billion in 2024, demonstrating a robust momentum in the adoption of synthetic data solutions across the healthcare sector. The industry is expanding at a CAGR of 29.3% and is forecasted to attain a value of USD 8.71 billion by 2033. This exceptional growth is primarily driven by the increasing demand for privacy-preserving data solutions, escalating regulatory pressures, and the need for high-quality data to fuel advanced healthcare analytics and artificial intelligence (AI) applications.




    The healthcare synthetic-data governance services market is experiencing exponential growth due to the growing emphasis on data privacy and security in healthcare environments. As healthcare organizations increasingly integrate digital technologies and electronic health records (EHRs), there is a concurrent rise in concerns around patient data confidentiality and compliance with global data protection regulations such as HIPAA, GDPR, and others. Synthetic data, which mimics real patient data without exposing sensitive information, is becoming a preferred solution for training AI models, conducting clinical research, and enabling data sharing across organizations. The market is further propelled by the rising adoption of AI and machine learning in healthcare, which necessitates vast, high-quality datasets that can be safely used without breaching patient privacy. This has led to a surge in demand for robust governance frameworks and services that ensure the ethical and compliant use of synthetic data throughout its lifecycle.




    Another significant growth factor is the increasing complexity and volume of healthcare data, which is making traditional data anonymization techniques less effective. As healthcare providers, pharmaceutical companies, and research institutes seek to leverage big data analytics and advanced modeling, they are turning to synthetic data to overcome data scarcity and bias issues. Synthetic-data governance services play a crucial role in standardizing processes, ensuring data quality, and maintaining regulatory compliance while facilitating seamless data sharing and collaboration. The market is also witnessing an upsurge in partnerships between healthcare organizations and technology vendors, aiming to co-develop tailored governance solutions that address specific clinical, operational, and research needs. This collaborative ecosystem is fostering innovation and accelerating the deployment of synthetic-data governance frameworks globally.




    Furthermore, the healthcare synthetic-data governance services market is benefiting from increased investments by both public and private sectors in digital health infrastructure. Governments and regulatory bodies are actively supporting initiatives that promote data-driven healthcare innovation while safeguarding patient rights. The proliferation of cloud computing and the emergence of interoperable health information systems are making it easier for organizations to implement synthetic-data governance solutions at scale. Additionally, the COVID-19 pandemic has highlighted the critical need for secure, accessible, and compliant data management practices, further intensifying demand for synthetic-data governance services. These factors collectively position the market for sustained long-term growth.




    Regionally, North America continues to dominate the healthcare synthetic-data governance services market, owing to its advanced healthcare IT ecosystem, strong regulatory frameworks, and high adoption of AI-driven healthcare solutions. Europe follows closely, with stringent data privacy laws and a growing emphasis on cross-border healthcare data sharing. The Asia Pacific region is emerging as a high-growth market, driven by rapid digitalization of healthcare systems, government initiatives to promote health IT, and increasing investments in research and development. Latin America and the Middle East & Africa are gradually catching up, supported by improving healthcare infrastructure and rising awareness about the benefits of synthetic data in healthcare. Overall, the market is characterized by dynamic regional trends, with each region presenting unique opportunities and challenges for stakeholders.



  13. Data from: Dataset of anonymized discharge summaries of sepsis patients from...

    • dataverse.harvard.edu
    Updated May 2, 2025
    Cite
    Rildo Pinto da Silva; Antonio Pazin-Filho (2025). Dataset of anonymized discharge summaries of sepsis patients from a Brazilian tertiary hospital for NLP applications [Dataset]. http://doi.org/10.7910/DVN/GWNBQQ
    Explore at:
    Dataset updated
    May 2, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Rildo Pinto da Silva; Antonio Pazin-Filho
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Background: Publicly available clinical text datasets in Brazilian Portuguese for Natural Language Processing (NLP) research and education are scarce, largely due to challenges in ensuring robust anonymization of sensitive patient data, especially within long clinical notes. Objective: To address this gap, we created and describe a new dataset of anonymized discharge summaries from sepsis patients treated at a Brazilian tertiary teaching hospital. Methods: Discharge summaries for adult sepsis patients (identified via ICD-10 codes) were extracted from the hospital's Electronic Health Record (EHR) system. Following manual physician review to ensure text quality and relevance (N=387), the summaries underwent processing including cleaning, abbreviation expansion using a custom dictionary, and a two-stage automated anonymization process (unsupervised GLiNER followed by a supervised custom spaCy NER model). A final manual review ensured confidentiality and excluded summaries unsuitable for NLP educational tasks. Key structured clinical variables (length of stay, ICU admission, palliative care status, number of specialties, outcome) were also extracted and linked to each summary. Results: The resulting dataset comprises 200 anonymized discharge summaries in Brazilian Portuguese, presented in tabular format (.xlsx file) alongside the linked structured clinical variables, relevant ICD-10 codes, and the abbreviation dictionary. An accompanying Jupyter Notebook details the processing steps. Conclusion: This dataset provides a valuable and accessible resource of real-world, anonymized Brazilian Portuguese clinical text, suitable for educational purposes and research in NLP. It facilitates training and experimentation with tasks such as text preprocessing, named entity recognition, classification, and topic modeling, and enables the exploration of integrating textual data with structured clinical variables.

  14. Generative AI In Healthcare Market Analysis, Size, and Forecast 2025-2029:...

    • technavio.com
    Updated Jul 28, 2025
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2021 - 2025
    Area covered
    Canada, United Kingdom, United States, Global
    Description

    Generative AI In Healthcare Market Size 2025-2029

    The generative AI in healthcare market size is forecast to increase by USD 9.38 billion, at a CAGR of 38.7% between 2024 and 2029.

    The Generative AI market in healthcare is experiencing significant growth, driven by the pressing need to enhance operational efficiency and alleviate clinician burnout. This demand is fueled by the increasing recognition of AI's potential to streamline processes, reduce workload, and improve patient outcomes. A key trend in this market is the ascendance of multimodal AI models, which can analyze various data types and provide more accurate and comprehensive insights. However, the regulatory landscape presents substantial challenges. As AI systems become more sophisticated, ensuring their safety, efficacy, and transparency becomes increasingly complex. Drug repurposing is another area of focus, with AI-driven therapeutics offering new possibilities for treating diseases. Regulators are demanding clear explanations of how AI systems arrive at their decisions, making it essential for companies to invest in explainable AI technologies. Navigating these challenges will require a strategic approach, including robust regulatory compliance frameworks, transparent reporting, and ongoing research and development to improve AI explainability. Companies that can effectively address these challenges will be well-positioned to capitalize on the significant opportunities in the generative AI market in healthcare.

    What will be the Size of the Generative AI In Healthcare Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    The market for generative AI in healthcare continues to evolve, with new applications emerging across various sectors. Predictive analytics is increasingly being used to identify patient risk factors and improve treatment outcomes, while AI bias mitigation ensures fairness and accuracy in medical decision-making. Data security protocols remain a priority, with medical device regulation becoming more stringent to address potential cybersecurity threats. Adverse event detection is a critical application, with AI models able to analyze vast amounts of data to identify patterns and potential risks.

    Personalized cancer treatment is also advancing, with explainable AI models enabling doctors to understand the underlying causes of cancer and tailor treatments accordingly. AI in radiology is transforming diagnostic accuracy, with medical image processing becoming more precise and efficient. Real-world evidence is also gaining importance, with federated learning enabling the analysis of data from multiple sources without compromising patient privacy. Generative AI models are being used to create realistic medical simulations for training purposes, improving healthcare workflow automation, and patient engagement. According to a recent industry report, the market is expected to grow by over 30% in the next five years, driven by the increasing demand for improved patient outcomes and cost reduction.

    For instance, a study found that AI-powered diagnostic tools led to a 25% reduction in diagnostic errors, resulting in significant cost savings for healthcare providers. Regulatory compliance, disease pathway analysis, clinical trial design, and NLP for healthcare are other areas where generative AI is making a significant impact.

    How is this Generative AI In Healthcare Market segmented?

    The generative AI in healthcare market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Component: On-premises; Cloud

    Application: Drug discovery and development; Medical imaging and diagnostics; Personalized medicine; Virtual health assistants; Others

    End-user: Hospitals and clinics; Pharmaceuticals companies; Research institutes; Health insurance companies

    Geography:
      North America: US; Canada
      Europe: France; Germany; UK
      APAC: China; India; Japan; South Korea
      South America: Brazil
      Rest of World (ROW)

    By Component Insights

    The On-premises segment is estimated to witness significant growth during the forecast period. Generative AI is revolutionizing healthcare by enhancing precision in oncology through large language models and machine learning algorithms. Electronic health records are being leveraged to power AI-driven diagnostics, while patient data privacy is ensured through healthcare data anonymization. Health outcome prediction and clinical decision support are improved with the help of medical image segmentation and remote patient monitoring. Synthetic data generation and medical text summarization streamline research processes, enabling advancements in radiation therapy planni

  15. AI Medical Image Analytics Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 28, 2025
    Cite
    Data Insights Market (2025). AI Medical Image Analytics Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-medical-image-analytics-1986104
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI medical image analytics market is experiencing rapid growth, driven by the increasing volume of medical images generated globally, advancements in artificial intelligence and machine learning algorithms, and the rising demand for accurate and efficient diagnostic tools. The market's expansion is fueled by the ability of AI to automate image analysis, improve diagnostic accuracy, reduce human error, and accelerate the diagnostic process, ultimately leading to better patient outcomes and reduced healthcare costs. While precise figures for market size and CAGR are not provided, considering the rapid technological advancements and widespread adoption across various medical specialties, a conservative estimate would place the 2025 market size at approximately $2 billion, with a CAGR of 20-25% projected through 2033. This growth is further propelled by the increasing availability of large, annotated datasets for training AI algorithms and the growing collaboration between technology companies and healthcare providers. Key restraints include regulatory hurdles surrounding AI-based medical devices, concerns about data privacy and security, and the need for robust validation and clinical trials to ensure the reliability and safety of AI-powered diagnostic tools. However, these challenges are being actively addressed through regulatory frameworks, improved data anonymization techniques, and rigorous testing protocols. Market segmentation reveals strong performance across various applications, including radiology, pathology, oncology, and ophthalmology, with leading companies like Lunit Inc, Infervision, and NVIDIA contributing significantly to innovation and market share. The geographic distribution is expected to be heavily influenced by factors such as healthcare infrastructure, regulatory environments, and technological adoption rates. 
    North America and Europe are anticipated to hold a significant share, while the Asia-Pacific region is poised for substantial growth in the coming years.

  16. Data from "Auditory tests for characterizing hearing deficits in listeners...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin, pdf, zip
    Updated Jul 19, 2024
    Cite
    Data from "Auditory tests for characterizing hearing deficits in listeners with various hearing abilities: The BEAR test battery" [Dataset]. https://zenodo.org/records/4923009
    Explore at:
    Available download formats: bin, pdf, zip
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Raul Sanchez-Lopez; Michal Fereczkowski; Mouhamad El-Haj-Ali; Federica Bianchi; Oscar Cañete; Mengfan Wu; Tobias Neher; Torsten Dau; Sébastien Santurette
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains raw and processed data used and described in:

    R. Sanchez-Lopez, S.G. Nielsen, M. El-Haj-Ali, F. Bianchi, M. Fereczkowski, O. Cañete, M. Wu, T. Neher, T. Dau and S. Santurette (under review). "Auditory tests for characterizing hearing deficits in listeners with various hearing abilities: The BEAR test battery." Submitted to Frontiers in Neuroscience.

    [Preprint available in medRxiv: https://doi.org/10.1101/2020.02.17.20021949]

    One aim of the Better hEAring Rehabilitation (BEAR) project is to define a new clinical profiling tool, a test-battery, for individualized hearing loss characterization. Whereas the loss of sensitivity can be efficiently assessed by pure-tone audiometry, it still remains a challenge to address supra-threshold hearing deficits using appropriate clinical diagnostic tools. In contrast to the classical attenuation-distortion model (Plomp, 1986), the proposed BEAR approach is based on the hypothesis that any listener’s hearing can be characterized along two dimensions reflecting largely independent types of perceptual distortions. Recently, a data-driven approach (Sanchez-Lopez et al., 2018) provided evidence consistent with the existence of two independent sources of distortion, and thus different auditory profiles. Eleven tests were selected for the clinical test battery, based on their feasibility, time efficiency and related evidence from the literature. The proposed tests were divided into five categories: audibility, speech perception, binaural-processing abilities, loudness perception, and spectro-temporal resolution. Seventy-five listeners with symmetric, mild-to-severe sensorineural hearing loss were selected from a clinical population of hearing-aid users. The participants completed all tests in a clinical environment and did not receive systematic training for any of the tasks. The analysis of the results focused on the ability of each test to pinpoint individual differences among the participants, relationships among the different tests, and determining their potential use in clinical settings. The results might be valuable for hearing-aid fitting and clinical auditory profiling.

    Please cite this article when using the data

    The Dataset BEAR3 has also been used in:

    Sanchez-Lopez R, Fereczkowski M, Neher T, Santurette S, Dau T. Robust Data-Driven Auditory Profiling Towards Precision Audiology. Trends in Hearing. January 2020. doi:10.1177/2331216520973539

    Sanchez-Lopez, R., Fereczkowski, M., Neher, T., Santurette, S., & Dau, T. (2020). Robust auditory profiling: Improved data-driven method and profile definitions for better hearing rehabilitation. Proceedings of the International Symposium on Auditory and Audiological Research, 7, 281-288. Retrieved from https://proceedings.isaar.eu/index.php/isaarproc/article/view/2019-32

    and

    Sanchez Lopez, R., Nielsen, S. G., Cañete, O., Fereczkowski, M., Wu, M., Neher, T., Dau, T., & Santurette, S. (2019). A clinical test battery for Better hEAring Rehabilitation (BEAR): Towards the prediction of individual auditory deficits and hearing-aid benefit. In Proceedings of the 23rd International Congress on Acoustics (pp. 3841-3848). Deutsche Gesellschaft für Akustik e.V.. https://doi.org/10.18154/RWTH-CONV-239177

    Description of the files:

    • BEAR2.xlsx: Anonymized raw data obtained using the BEAR test battery.
    • BEAR2_YNH.xlsx: Additional anonymized raw data obtained using the BEAR test battery with young normal-hearing listeners.
    • BEAR3.xlsx: Anonymized processed data for statistical data analysis.
    • BEAR3_Results_AProfiling.xlsx: BEAR3 dataset including the profiles, probabilities to belong to each of the four profiles and estimated degree of Distortion type-I and Distortion type-II.
    • BEAR_Reliability.xlsx: Anonymized raw data similar to BEAR2 for the reliability study.
    • DataParticipants.xlsx: Anonymized basic data associated with the participants: Gender, Age, PTA, etc.
    • TestBatteryMethods_v1.1.pdf: Documentation of the test methods. Protocol included and corrections.
    • Reliability_v1.0.pdf: Detailed explanation about the test-retest reliability study carried out with a subset of the participants.

    * The participant IDs in each of the files have been assigned randomly to ensure the anonymization of the data. The pseudo-anonymized data may be shared upon request by direct correspondence with the authors.
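
    The random ID assignment mentioned in the note above could be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the "P001"-style code format, and the example names are all hypothetical.

```python
import random


def assign_random_ids(participants, seed=None):
    """Map each participant to a unique code in shuffled order.

    The code format ("P001", "P002", ...) is a hypothetical choice.
    """
    rng = random.Random(seed)
    codes = [f"P{n:03d}" for n in range(1, len(participants) + 1)]
    rng.shuffle(codes)  # break any link between input order and code
    return dict(zip(participants, codes))


# Hypothetical identifiers standing in for real participant records.
original = ["alice", "bob", "carol", "dave"]
mapping = assign_random_ids(original, seed=0)
print(mapping)
```

    Calling the function separately for each file, without reusing a seed, would give each file its own unlinked set of IDs. A fixed seed is used here only to keep the example repeatable.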

  17. Identifying Clinical Skill Gaps of Healthcare Workers Using a Decision...

    • data.unisante.ch
    Updated Apr 1, 2025
    Cite
    Haykel Karoui (2025). Identifying Clinical Skill Gaps of Healthcare Workers Using a Decision Support Algorithm in Rwanda - Rwanda [Dataset]. https://data.unisante.ch/catalog/58
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset authored and provided by
    Haykel Karoui
    Time period covered
    2021 - 2023
    Area covered
    Rwanda
    Description

    Abstract

    Digital clinical decision support algorithms (CDSAs) that guide healthcare workers during consultations can enhance adherence to guidelines and the resulting quality of care. However, this improvement depends on the accuracy of inputs (symptoms and signs) entered by healthcare workers into the digital tool, which relies mainly on their clinical skills, that are often limited, especially in resource-constrained primary care settings. This study aimed to identify and characterize potential clinical skill gaps based on CDSA data patterns and clinical observations. We retrospectively analyzed data from 20,085 pediatric consultations conducted using an IMCI-based CDSA in 16 primary health centers in Rwanda. We focused on clinical signs with numerical values: temperature, mid-upper arm circumference (MUAC), weight, height, z-scores (MUAC for age, weight for age, and weight for height), heart rate, respiratory rate and blood oxygen saturation. Statistical summary measures (frequency of skipped measurements, frequent plausible and implausible values) and their variation in individual health centers compared to the overall average were used to identify 10 health centers with irregular data patterns signaling potential clinical skill gaps. We subsequently observed 188 consultations in these health centers and interviewed healthcare workers to understand potential error causes. Observations indicated basic measurements not being assessed correctly in most children; weight (70%), MUAC (69%), temperature (67%), height (54%). These measures were predominantly conducted by minimally trained non-clinical staff in the registration area. More complex measures, done mostly by healthcare workers in the consultation room, were often skipped: respiratory rate (43%), heart rate (37%), blood oxygen saturation (33%). This was linked to underestimating the importance of these signs in child management, especially in the context of high patient loads typical at primary care level. 
    Addressing clinical skill gaps through in-person training, eLearning and regular personalized mentoring tailored to specific health center needs is imperative to improve quality of care and enhance the benefits of CDSAs.
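
    The abstract describes comparing each health center's frequency of skipped measurements with the overall average. A minimal sketch of that idea is given below. It is not the study's code (the authors report using R); the record layout, field names, center codes, and the 0.15 margin are all hypothetical.

```python
from collections import defaultdict


def skip_rates(records, sign):
    """Fraction of consultations per center where `sign` is missing (None)."""
    totals = defaultdict(int)
    skipped = defaultdict(int)
    for rec in records:
        center = rec["center"]
        totals[center] += 1
        if rec.get(sign) is None:
            skipped[center] += 1
    return {c: skipped[c] / totals[c] for c in totals}


def flag_centers(records, sign, margin=0.15):
    """Return centers whose skip rate exceeds the overall rate by `margin`."""
    if not records:
        return []
    overall = sum(1 for r in records if r.get(sign) is None) / len(records)
    rates = skip_rates(records, sign)
    return sorted(c for c, rate in rates.items() if rate - overall > margin)


# Hypothetical consultations: None marks a skipped measurement.
records = [
    {"center": "HC01", "respiratory_rate": 32},
    {"center": "HC01", "respiratory_rate": 40},
    {"center": "HC01", "respiratory_rate": None},
    {"center": "HC01", "respiratory_rate": 28},
    {"center": "HC02", "respiratory_rate": None},
    {"center": "HC02", "respiratory_rate": None},
    {"center": "HC02", "respiratory_rate": None},
    {"center": "HC02", "respiratory_rate": 36},
]

rates = skip_rates(records, "respiratory_rate")
flagged = flag_centers(records, "respiratory_rate")
print(rates, flagged)
```

    The same comparison could be repeated for each numerical sign (temperature, MUAC, weight, and so on) to build a per-center picture of irregular data patterns.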

    Geographic coverage

    16 primary healthcare centers (HCs) of Rusizi and Nyamasheke districts in Rwanda.

    Analysis unit

    The first dataset was collected directly by the ePOCT+ CDSA during 20,085 pediatric consultations across 16 primary health centers in Rwanda. It includes anonymized patient, health facility, and consultation data with key clinical measurements (temperature, mid-upper arm circumference (MUAC), weight, height, MUAC-for-age z-score, weight-for-age z-score, weight-for-height z-score, heart rate, respiratory rate, and blood oxygen saturation (SpO2)). The second dataset results from structured observations of 188 routine pediatric consultations at a subset of 10 health facilities. Clinicians used a standardized evaluation form to record clinical measurements, mirroring variables in the first dataset. This dataset deepens the analysis of the first dataset by explaining the reasons for the patterns that emerged from its quantitative analysis.

    Universe

    Children aged 1 day to 14 years with an acute condition, in the 16 HCs where the intervention was deployed.

    Kind of data

    Clinical data [cli]

    Sampling procedure

    First dataset: ePOCT+ stores all the information (date of consultation, anthropometric measures, vitals, presence/absence of specific symptoms and signs prompted by the algorithm, diagnoses, medicines, managements, etc.) entered by the healthcare worker (HW) in the tablet during consultations. We retrospectively analyzed data from 20,085 outpatient consultations conducted between November 2021 and October 2022 with children aged 1 day to 14 years with an acute condition, in the 16 HCs where the intervention was deployed. Data cleaning, management, and analyses were conducted using R software (version 4.2.1).

    Second dataset: Based on the results of the retrospective analysis, we observed 188 routine consultations in a subset of 10 of the 16 HCs (approximately 19 observations per HC), from 20 December 2022 to 09 March 2023. The selection of HCs was guided by the retrospective analysis, ensuring that the 10 HCs chosen were those showing the most critical results. The observing study clinician obtained oral consent from the HWs and was instructed not to interfere with the consultation, to avoid introducing any bias beyond the observer effect. To ensure a standardized and consistent evaluation, a digital evaluation form (Google Sheets) was used. These observations were conducted over 3 days per HC, with efforts made to separate them by a few days to increase the chance of observing several different HWs and to minimize potential bias. At the end of each day of observation in an HC (and not after each consultation, to avoid any influence on subsequent consultations), the observing study clinician conducted an interview with the HW to understand why the assessment of some signs was skipped. Data were exported to Microsoft Excel (Version 16.77.1) for further simple descriptive analysis.

    Sampling deviation

    Second dataset: Most of the time, there was only one HW attending to children in the HC on a given day. On the rare occasions when two HW were present, each was observed by one of the two study clinicians.

    Mode of data collection

    Other [oth]

    Research instrument

    The second dataset for this study was derived from structured observations of 188 routine pediatric consultations conducted across a subset of 10 health facilities. Clinicians utilized a standardized evaluation form that included variables aligning with those in the first dataset. This secondary dataset was designed to provide deeper insights into patterns observed in the primary dataset through the quantitative analysis.

      The data collection focused on various clinical measurements and observations, categorized as follows:

      General Information:
      • Date of the consultation.
      • Health facility (coded for anonymity).
      • Clinical measurements taken at the reception and during the consultation.
      • Presence of a conducting line.
      • Additional remarks related to the consultation.

      Clinical Measurements: For each of the following, the dataset records whether the measurement was assessed or skipped, the quality of assessment (sufficient/insufficient), reasons for skipping or insufficient assessments, and any extra remarks:
      • Temperature (T°).
      • MUAC (Mid-Upper Arm Circumference).
      • Weight.
      • Height.
      • Respiratory Rate (RR).
      • Blood Oxygen Saturation (Sat).
      • Heart Rate (HR).

      Additional Observations: Remarks on other signs and symptoms assessed during the consultation. The structured nature of this dataset ensures consistency in evaluating the reasons behind clinical decisions and the quality of care provided in routine pediatric consultations.
    

    Cleaning operations

    Data editing was conducted as follows:

      First dataset:
      • Data Extraction: The dataset was extracted from the larger ePOCT+ storage system, which records all consultation-related information entered by healthcare workers (HWs) in tablets during consultations. This includes details such as the date of consultation, anthropometric measures, vital signs, the presence or absence of specific symptoms and signs prompted by the algorithm, diagnoses, medicines, and managements.

      • Data Cleaning: 
      The extracted data were systematically cleaned to focus solely on the variables of interest for this analysis. Irrelevant variables and incomplete records were excluded to ensure a streamlined and accurate dataset. 
    
      • Anonymization: 
      To protect patient and health facilities confidentiality, the data were anonymized prior to analysis. All personal identifiers were removed, and only aggregated or coded information was retained. 
    
      • Analysis Preparation: 
      After cleaning and anonymization, the dataset was reviewed for consistency and coherence. Specific patterns of data were analyzed for the selected variables of interest, ensuring alignment with the study objectives. 
    
      • Software Used: Data cleaning, management, and analyses were conducted using R software (version 4.2.1). All processes, including extraction, cleaning, and anonymization, were documented to maintain transparency and reproducibility. 
    
      Second dataset:
      • Data Collection: Data were collected directly from respondents through a Google Forms questionnaire. The structured format ensured standardized responses across all participants, facilitating subsequent data processing and analysis. 
    
      • Data Export: 
      Upon completion of data collection, the dataset was exported from Google Forms to Microsoft Excel (Version 16.77.1). This provided a structured and organized format for further data handling. 
    
      • Anonymization: 
      All personally identifiable information was removed during the data processing phase to protect participant confidentiality. Anonymization measures included replacing personal identifiers with unique codes and omitting any information that could reveal the identity of respondents. 
    
      • Data Cleaning and Descriptive Analysis: 
      The dataset was reviewed in Microsoft Excel to ensure consistency and completeness. Responses were screened for missing or inconsistent data, and necessary corrections were made where appropriate. Simple descriptive analyses were conducted within Excel to summarize key variables and identify initial patterns in the data.
    
  18. CMT1A-BioStampNPoint2023: Charcot-Marie-Tooth disease type 1A accelerometry...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Jun 8, 2023
    Cite
    Karthik Dinesh; Nicole White; Lindsay Baker; Janet Sowden; Steffen Behrens-Spraggins; Elizabeth P Wood; Julie L Charles; David Herrmann; Gaurav Sharma; Katy Eichinger (2023). CMT1A-BioStampNPoint2023: Charcot-Marie-Tooth disease type 1A accelerometry dataset from three wearable sensor study [Dataset]. http://doi.org/10.5061/dryad.p5hqbzktr
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    University of Rochester Medical Center
    University of Rochester
    Authors
    Karthik Dinesh; Nicole White; Lindsay Baker; Janet Sowden; Steffen Behrens-Spraggins; Elizabeth P Wood; Julie L Charles; David Herrmann; Gaurav Sharma; Katy Eichinger
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The CMT1A-BioStampNPoint2023 dataset provides data from a wearable sensor accelerometry study of gait, balance, and activity in 15 individuals with Charcot-Marie-Tooth disease Type 1A (CMT1A). The dataset also includes data for 15 controls who went through the same in-clinic study protocol as the CMT1A participants, with a substantial fraction (9) of the controls also participating in the in-home study protocol. For the CMT1A participants, data is provided for 15 participants for the baseline visit and associated home recording duration; for a subset of 12 of these participants, data is also provided for a 12-month longitudinal visit and associated home recording duration. For controls, no longitudinal data is provided as none was recorded. The data were acquired using lightweight MC 10 BioStamp NPoint sensors (MC 10 Inc, Lexington, MA), three of which were attached to each participant for gathering data over a roughly one-day interval. For additional details, see the description in the "README.md" included with the dataset.

    Methods: The dataset contains data from wearable sensors and clinical data. The wearable sensor data was acquired using wearable sensors and the clinical data was extracted from the clinical record. The sensor data has not been processed per se, but the start of the recording time has been anonymized to comply with HIPAA requirements. Both the sensor data and the clinical data passed through a Python program for this time anonymization and for standard formatting. Additional details of the time anonymization are provided in the file "README.md" included with the dataset.
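
    One common way to anonymize a recording start time is to shift every timestamp by the same offset, so that the true start date is hidden while the intervals between samples are preserved. The sketch below illustrates that general idea only. It is not the dataset's actual procedure, which is documented in its "README.md"; the function name, the reference date, and the sample timestamps are hypothetical.

```python
from datetime import datetime

# Hypothetical placeholder start; any fixed, non-identifying date would do.
REFERENCE_START = datetime(2000, 1, 1)


def anonymize_start(timestamps):
    """Shift timestamps so the earliest one lands on REFERENCE_START.

    Relative spacing between samples is unchanged.
    """
    if not timestamps:
        return []
    first = min(timestamps)
    return [REFERENCE_START + (t - first) for t in timestamps]


# Hypothetical sensor timestamps.
samples = [
    datetime(2021, 3, 14, 9, 30, 0),
    datetime(2021, 3, 14, 9, 30, 1),
    datetime(2021, 3, 15, 8, 0, 0),
]
shifted = anonymize_start(samples)
for before, after in zip(samples, shifted):
    print(before, "->", after)
```

    Note that this simple shift also discards the time of day at which recording began. A variant that keeps the time of day and replaces only the calendar date would suit analyses that depend on daily activity patterns.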

  19. Generator Market In The Healthcare Industry Analysis APAC, Europe, North...

    • technavio.com
    Updated Mar 10, 2022
    Cite
    Technavio (2022). Generator Market In The Healthcare Industry Analysis APAC, Europe, North America, Middle East and Africa, South America - China, US, Germany, India, UK - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/generator-market-in-the-healthcare-sector-industry-analysis
    Dataset updated
    Mar 10, 2022
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description


    Generator Market In The Healthcare Industry Size 2024-2028

    The generator market in the healthcare industry size is forecast to increase by USD 1.11 billion, at a CAGR of 3.2% between 2023 and 2028.

    The market is driven by the unreliable power grid infrastructure in developing countries, necessitating the use of backup power solutions. This trend is particularly prevalent in regions with limited access to stable electricity, where healthcare facilities require uninterrupted power supply for critical operations. Technological advances in generator technology offer opportunities for market growth, with innovations such as fuel efficiency, remote monitoring, and automation enhancing the reliability and efficiency of power generation. However, the market faces challenges in the form of stringent emission regulations. Compliance with these regulations adds to the cost of generator production and maintenance, potentially limiting profitability for market players.
    Navigating these regulatory requirements while maintaining affordability and reliability will be a key challenge for companies seeking to capitalize on market opportunities in the healthcare industry. Additionally, the increasing demand for renewable energy sources may impact the demand for traditional generators, necessitating continuous innovation and adaptation to remain competitive.
    

    What will be the Size of the Generator Market In The Healthcare Industry during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.

    The market continues to evolve, driven by advancements in technology and the increasing demand for personalized and efficient healthcare solutions. Entities such as synthetic patient data, drug efficacy modeling, radiation therapy planning, virtual clinical trials, clinical workflow automation, remote patient monitoring, precision oncology AI, radiology AI assistance, and others, are seamlessly integrated into the healthcare ecosystem. These tools enable the generation of genomic data, treatment response prediction, medical image creation, and the optimization of clinical trials. The ongoing unfolding of market activities reveals the application of AI-powered diagnostics, telehealth platform development, drug discovery platforms, and medical device simulation, among others.

    Biomarker identification, prognostic model development, health record generation, and healthcare data anonymization are also crucial components of this dynamic landscape. The continuous integration of these technologies is transforming the healthcare industry, enabling more accurate patient outcome predictions, personalized medicine, and improved patient care.

    How is the Generator Market In The Healthcare Industry segmented?

    The generator market in the healthcare industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in USD million for the period 2024-2028, as well as historical data from 2018-2022, for the following segments.

    Application
      Hospitals
      Clinics

    Type
      Stationary
      Portable

    Geography
      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        India
      Rest of World (ROW)

    By Application Insights

    The hospitals segment is estimated to witness significant growth during the forecast period.

    In the healthcare industry, the demand for generators is escalating due to the increasing adoption of advanced technologies such as genomic sequencing, treatment response prediction, and AI-powered diagnostics. The generation of genomic data and medical images necessitates the use of sophisticated equipment, which requires a reliable power supply. Hospitals, in particular, are leading the market due to the high demand for uninterrupted power in diagnostic centers and operating rooms. Telehealth platforms, drug discovery platforms, and clinical trial optimization also contribute to the market's growth by requiring power-intensive infrastructure for remote patient monitoring, virtual clinical trials, and precision oncology AI.

    Furthermore, the development of medical chatbots, electronic health records, and surgical simulation software necessitates the use of generators for powering these applications. The integration of AI-driven drug design, medical device simulation, biomarker identification, prognostic model development, and disease modeling software also increases the demand for generators in the healthcare sector. The market is expected to continue growing due to the increasing focus on healthcare data anonymization, patient outcome prediction, drug efficacy modeling, radiation therapy planning, and clinical workflow automation. The integration of 3D organ printing, synthetic patient data, and drug interaction prediction further expands the market's scope.

  20. Cebulka (Polish dark web cryptomarket and image board) messages data

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Mar 18, 2024
    Cite
    Piotr Siuda; Haitao Shi; Patrycja Cheba; Leszek Świeca (2024). Cebulka (Polish dark web cryptomarket and image board) messages data [Dataset]. http://doi.org/10.5281/zenodo.10810939
    Explore at:
    zip, csv (available download formats)
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Piotr Siuda; Haitao Shi; Patrycja Cheba; Leszek Świeca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 2023
    Description

    General Information

    1. Title of Dataset

    Cebulka (Polish dark web cryptomarket and image board) messages data.

    2. Data Collectors

    Haitao Shi (The University of Edinburgh, UK); Patrycja Cheba (Jagiellonian University); Leszek Świeca (Kazimierz Wielki University in Bydgoszcz, Poland).

    3. Funding Information

    The dataset is part of the research supported by the Polish National Science Centre (Narodowe Centrum Nauki) grant 2021/43/B/HS6/00710.

    Project title: “Rhizomatic networks, circulation of meanings and contents, and offline contexts of online drug trade” (2022-2025; PLN 956 620; funding institution: Polish National Science Centre [NCN], call: OPUS 22; Principal Investigator: Piotr Siuda [Kazimierz Wielki University in Bydgoszcz, Poland]).

    Data Collection Context

    4. Data Source

    Polish dark web cryptomarket and image board called Cebulka (http://cebulka7uxchnbpvmqapg5pfos4ngaxglsktzvha7a5rigndghvadeyd.onion/index.php).

    5. Purpose

    This dataset was developed within the abovementioned project. The project focuses on studying internet behavior concerning disruptive actions, particularly emphasizing the online narcotics market in Poland. The research seeks to (1) investigate how the open internet, including social media, is used in the drug trade; (2) outline the significance of darknet platforms in the distribution of drugs; and (3) explore the complex exchange of content related to the drug trade between the surface web and the darknet, along with understanding meanings constructed within the drug subculture.

    Within this context, Cebulka is identified as a critical digital venue in Poland’s dark web illicit substances scene. Besides serving as a marketplace, it plays a crucial role in shaping the narratives and discussions prevalent in the drug subculture. The dataset has proved to be a valuable tool for performing the analyses needed to achieve the project’s objectives.

    Data Content

    6. Data Description

    The data was collected in three periods, i.e., in January 2023, June 2023, and January 2024.

    The dataset comprises a sample of messages posted on Cebulka from its inception until January 2024 (including all the messages with drug advertisements). These messages include the initial posts that start each thread and the subsequent posts (replies) within those threads. The dataset is organized into two directories. The “cebulka_adverts” directory contains posts related to drug advertisements (both advertisements and comments). In contrast, the “cebulka_community” directory holds a sample of posts from other parts of the cryptomarket, i.e., those not related directly to trading drugs but rather focusing on discussing illicit substances. The dataset consists of 16,842 posts.

    7. Data Cleaning, Processing, and Anonymization

    The data was cleaned and processed in Python using regular expressions, which were also used to remove all personal information. Identifiers related to instant messaging apps and email addresses were hashed, and all usernames appearing in messages were removed.
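
The authors' regular expressions and hashing setup are not published in this description. As a minimal sketch of the general approach, assuming simplified patterns for e-mail addresses and `@`-style messaging handles, a hypothetical secret salt, and SHA-256 as the hash, matched identifiers can be replaced with irreversible pseudonyms like this:

```python
import hashlib
import re

# Illustrative patterns only; real messages would need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HANDLE_RE = re.compile(r"(?<![\w@])@[A-Za-z0-9_]{3,32}")

# Hypothetical secret salt; without one, short identifiers could be guessed
# by hashing candidate values and comparing digests.
SALT = b"replace-with-a-secret-value"


def pseudonym(match):
    """Replace a matched identifier with a truncated salted SHA-256 digest."""
    identifier = match.group(0).lower().encode("utf-8")
    digest = hashlib.sha256(SALT + identifier).hexdigest()
    return f"[ID_{digest[:12]}]"


def anonymize(text):
    """Hash e-mail addresses first, then @handles; other text is unchanged."""
    text = EMAIL_RE.sub(pseudonym, text)
    return HANDLE_RE.sub(pseudonym, text)


if __name__ == "__main__":
    # Made-up message, used only to show the call.
    print(anonymize("Write to jan@example.com or @dealer_pl today"))
```

Because the same identifier always maps to the same pseudonym, repeated mentions stay linkable within the corpus while the original value is not recoverable from the text alone. Pattern-based matching can miss identifiers, which is why a manual review pass remains useful.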

    8. File Formats and Variables/Fields

    The dataset consists of the following files:

    • Zipped .txt files (“cebulka_adverts.zip” and “cebulka_community.zip”) containing all messages. These files are organized into individual directories that mirror the folder structure found on Cebulka.
    • Two .csv files that list all the messages, including file names and the content of each post. The first .csv lists messages from “cebulka_adverts.zip,” and the second .csv lists messages from “cebulka_community.zip.”

    Ethical Considerations

    9. Ethics Statement

    A set of data handling policies aimed at ensuring safety and ethics has been outlined in the following paper:

    Harviainen, J.T., Haasio, A., Ruokolainen, T., Hassan, L., Siuda, P., Hamari, J. (2021). Information Protection in Dark Web Drug Markets Research [in:] Proceedings of the 54th Hawaii International Conference on System Sciences, HICSS 2021, Grand Hyatt Kauai, Hawaii, USA, 4-8 January 2021, Maui, Hawaii, (ed.) Tung X. Bui, Honolulu, HI, pp. 4673-4680.

    The primary safeguard was the early-stage hashing of usernames and identifiers from the messages, utilizing automated systems for irreversible hashing. Recognizing that automatic name removal might not catch all identifiers, the data underwent manual review to ensure compliance with research ethics and thorough anonymization.

Cite
Growth Market Reports (2025). Healthcare Data Anonymization Services Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/healthcare-data-anonymization-services-market

Healthcare Data Anonymization Services Market Research Report 2033

Explore at:
pptx, csv, pdf (available download formats)
Dataset updated
Jun 27, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description

Healthcare Data Anonymization Services Market Outlook



According to our latest research, the global healthcare data anonymization services market size reached USD 1.42 billion in 2024, reflecting a robust expansion driven by increasing regulatory demands and heightened focus on patient privacy. The market is projected to grow at a CAGR of 15.8% from 2025 to 2033, with the total market value expected to reach USD 5.44 billion by 2033. This impressive growth trajectory is underpinned by the rising adoption of digital health solutions, stringent data protection laws, and the ongoing digitalization of healthcare records worldwide.
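
The market size, growth rate, and forecast value quoted above are related by the standard compound annual growth rate formula, end = start × (1 + r)^n. The sketch below shows that arithmetic. It assumes 2024 is the base year, giving nine compounding periods to 2033; the report does not state its convention, so the computed values may differ somewhat from the rounded published figures.

```python
def project(value, rate, years):
    """Compound `value` forward at annual `rate` for `years` periods."""
    return value * (1.0 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    base_2024 = 1.42     # USD billion, from the text
    target_2033 = 5.44   # USD billion, from the text
    years = 2033 - 2024  # assumption: 2024 is the base year

    projected = project(base_2024, 0.158, years)
    implied = implied_cagr(base_2024, target_2033, years)
    print(f"Projected 2033 value at 15.8%: USD {projected:.2f} billion")
    print(f"CAGR implied by the two endpoints: {implied:.1%}")
```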




The primary growth factor fueling the healthcare data anonymization services market is the proliferation of electronic health records (EHRs) and the expanding use of big data analytics in healthcare. As healthcare providers and organizations increasingly leverage advanced analytics for improving patient outcomes, there is a corresponding surge in data generation. However, these vast datasets often contain sensitive patient information, making data anonymization essential to ensure compliance with regulations such as HIPAA, GDPR, and other regional privacy laws. The increasing frequency of data breaches and cyberattacks has further highlighted the importance of robust anonymization services, prompting healthcare organizations to prioritize investments in data privacy and security solutions. As a result, demand for both software and service-based anonymization solutions continues to rise, contributing significantly to market growth.




Another key driver for the healthcare data anonymization services market is the growing emphasis on research and clinical trials, which require the sharing and analysis of large volumes of patient data. Pharmaceutical and biotechnology companies, as well as research organizations, are increasingly collaborating across borders, necessitating the anonymization of datasets to protect patient identities and comply with international data protection standards. The adoption of cloud-based healthcare solutions has also facilitated the secure and efficient sharing of anonymized data, supporting advancements in personalized medicine and population health management. As organizations seek to balance innovation with compliance, the demand for advanced anonymization technologies that offer high accuracy and scalability is expected to accelerate further.




Technological advancements in artificial intelligence (AI) and machine learning (ML) are also shaping the future of the healthcare data anonymization services market. These technologies are enabling more sophisticated and automated anonymization processes, reducing the risk of re-identification while maintaining data utility for research and analytics. The integration of AI-driven tools into anonymization workflows is helping organizations streamline operations, minimize human error, and achieve greater compliance with evolving regulatory requirements. Additionally, the increasing availability of customizable and interoperable anonymization solutions is making it easier for healthcare organizations of all sizes to adopt and scale these services, thereby broadening the market’s reach and impact.




From a regional perspective, North America continues to dominate the healthcare data anonymization services market, accounting for the largest share in 2024. This leadership position is attributed to the presence of advanced healthcare infrastructure, widespread adoption of EHRs, and strict regulatory frameworks governing patient data privacy. Europe follows closely, driven by the enforcement of the General Data Protection Regulation (GDPR) and a strong culture of data protection. The Asia Pacific region is witnessing the fastest growth, propelled by increasing healthcare digitalization, government initiatives to modernize healthcare systems, and rising awareness of data privacy among patients and providers. Latin America and the Middle East & Africa are also experiencing steady growth, albeit from a smaller base, as healthcare organizations in these regions begin to prioritize data security and compliance.


