36 datasets found

Data De-Identification or Pseudonymity Software Market Report | Global...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Data De-Identification or Pseudonymity Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-or-pseudonymity-software-market
Available download formats: pdf, pptx, csv
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data De-Identification or Pseudonymity Software Market Outlook

As of 2023, the global Data De-Identification or Pseudonymity Software market is valued at approximately USD 1.5 billion and is projected to grow at a robust CAGR of 18% from 2024 to 2032, driven by increasing data privacy concerns and stringent regulatory requirements.

The growth of the Data De-Identification or Pseudonymity Software market is primarily fueled by the exponential increase in data generation across industries. With the advent of IoT, AI, and digital transformation strategies, the volume of data generated has seen an unprecedented spike. Organizations are now more aware of the need to protect sensitive information to comply with global data privacy regulations such as GDPR in Europe and CCPA in California. The need to ensure that personal data is anonymized or de-identified before analysis or sharing has escalated, pushing the demand for these software solutions.

Another significant growth factor is the rising number of cyber-attacks and data breaches. As data becomes more valuable, it also becomes a prime target for cybercriminals. In response, companies are investing heavily in data privacy and security measures, including de-identification and pseudonymity solutions, to mitigate risks associated with data breaches. This trend is more prevalent in sectors dealing with highly sensitive information like healthcare, finance, and government. Ensuring that data remains secure and private while being useful for analytics is a key driver for the adoption of these technologies.

Moreover, the evolution of Big Data analytics and cloud computing is also spurring growth in this market. As organizations move their operations to the cloud and leverage big data for decision-making, the importance of maintaining data privacy while utilizing large datasets for analytics cannot be overstated. Cloud-based de-identification solutions offer scalability, flexibility, and cost-effectiveness, making them increasingly popular among enterprises of all sizes. This shift towards cloud deployments is expected to further boost market growth.

Regionally, North America holds the largest market share due to its advanced technological infrastructure and stringent data protection laws. The presence of major technology companies and a high rate of adoption of advanced solutions in the U.S. and Canada contribute significantly to regional market growth. Europe follows closely, driven by rigorous GDPR compliance requirements. The Asia Pacific region is anticipated to witness the fastest growth, attributed to the increasing digitization and growing awareness about data privacy in countries like India and China.

As organizations increasingly seek to protect their sensitive data, the concept of Data Protection on Demand is gaining traction. This model allows businesses to access data protection services as and when needed, providing flexibility and scalability. By leveraging cloud-based platforms, companies can implement robust data protection measures without the need for significant upfront investments in infrastructure. This approach not only ensures compliance with data privacy regulations but also offers a cost-effective solution for managing data security. As the demand for on-demand services continues to rise, Data Protection on Demand is poised to become a critical component of data management strategies across various industries.

Component Analysis

The Data De-Identification or Pseudonymity Software market by component is segmented into software and services. The software segment dominates the market, driven by the increasing need for automated solutions that ensure data privacy. These software solutions come with a variety of tools and features designed to anonymize or pseudonymize data efficiently, making them essential for organizations managing large volumes of sensitive information. The software market is expanding rapidly, with new innovations and improvements constantly being introduced to enhance functionality and user experience.

The services segment, though smaller compared to software, plays a crucial role in the market. Services include consulting, implementation, and maintenance, which are essential for the successful deployment and operation of de-identification software. These services help organizations tailor the software to their specific needs, ensuring compliance with regional and industry-specific data protection regulations.
Data De-identification & Pseudonymity Software Market Report | Global...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Data De-identification & Pseudonymity Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-de-identification-pseudonymity-software-market
Available download formats: pdf, csv, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data De-identification & Pseudonymity Software Market Outlook

The global Data De-identification & Pseudonymity Software Market is projected to reach USD 3.5 billion by 2032, growing at a CAGR of 15.2% from 2024 to 2032. The rise in data privacy regulations and the increasing need for securing sensitive information are key factors driving this growth.

The accelerating pace of digital transformation across various industries has led to an unprecedented surge in data generation. This voluminous data often contains sensitive information that needs robust protection. The growing awareness regarding data privacy and stringent regulations like GDPR in Europe, CCPA in California, and other data protection laws worldwide are compelling organizations to adopt advanced data de-identification and pseudonymity software. These solutions ensure that sensitive data is anonymized or pseudonymized, thus mitigating the risk of data breaches and ensuring compliance with regulations. Consequently, the adoption of data de-identification and pseudonymity software is rapidly increasing.

Another significant growth factor is the increased focus on data security by industries such as healthcare, finance, and government. In healthcare, the protection of patient data is paramount, making the industry a significant consumer of de-identification software. Similarly, in the finance sector, protecting customer information is crucial to maintain trust and comply with regulatory requirements. Government agencies dealing with citizen data are also increasingly investing in these technologies to prevent unauthorized access and misuse of sensitive information. The demand for data de-identification and pseudonymity software is thus witnessing a steady rise across these critical sectors.

Technological advancements and innovation in data security solutions are further propelling market growth. The integration of artificial intelligence and machine learning into de-identification and pseudonymity software has enhanced their effectiveness and efficiency. These advanced technologies enable more accurate and faster processing of large datasets, thereby offering robust data protection. Additionally, the rise of cloud computing and the increasing adoption of cloud-based solutions provide scalable and cost-effective options for organizations, further driving the market.

In this context, the role of Identity Information Protection Service becomes increasingly crucial. As organizations strive to safeguard sensitive data, these services provide an essential layer of security by ensuring that identity-related information is protected from unauthorized access and misuse. Identity Information Protection Service helps organizations comply with data privacy regulations by offering robust solutions that secure personal identifiers, thus reducing the risk of identity theft and data breaches. By integrating these services, companies can enhance their data protection strategies, ensuring that identity information remains confidential and secure across various platforms and applications.

Regionally, North America holds the largest market share, driven by stringent data protection regulations and high adoption rates of advanced technologies. Europe follows, with significant contributions from countries like Germany, the UK, and France, driven by GDPR compliance requirements. The Asia Pacific region is expected to witness the highest growth rate due to the rapid digitalization of economies like China and India, coupled with increasing awareness about data privacy. Latin America and the Middle East & Africa regions are also showing promising growth, albeit from a smaller base.

Component Analysis

The Data De-identification & Pseudonymity Software Market by component is segmented into software and services. The software segment includes standalone software solutions designed to de-identify or pseudonymize data. This segment is witnessing substantial growth due to the increasing demand for automated and scalable data protection solutions. The software solutions are enhanced with advanced algorithms and AI capabilities, providing accurate de-identification and pseudonymization of large datasets, which is crucial for organizations dealing with massive amounts of sensitive data.
Medical Imaging De-Identification Software Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jul 5, 2025
Cite
Growth Market Reports (2025). Medical Imaging De-Identification Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/medical-imaging-de-identification-software-market
Available download formats: pdf, csv, pptx
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Medical Imaging De-Identification Software Market Outlook

According to our latest research, the global medical imaging de-identification software market size reached USD 315 million in 2024, driven by the increasing adoption of digital healthcare solutions and stringent regulatory requirements for patient data privacy. The market is expected to grow at a robust CAGR of 13.2% during the forecast period, reaching approximately USD 858 million by 2033. The primary growth factor fueling this expansion is the rising volume of medical imaging data and the escalating need to ensure compliance with data protection laws such as HIPAA, GDPR, and other regional regulations.
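The figures quoted above can be cross-checked with the standard compound-growth formula. The following is a minimal sketch, not the report's own method: the function names are illustrative, and the number of compounding periods is an assumption, since the description does not say whether the forecast window starts in 2024 or 2025.

```python
def project(start_value: float, cagr: float, periods: int) -> float:
    """Compound start_value forward by `periods` years at annual rate `cagr`."""
    return start_value * (1.0 + cagr) ** periods


def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Annual growth rate that takes start_value to end_value in `periods` years."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures quoted in the description (USD million).
BASE_2024 = 315.0
TARGET_2033 = 858.0
STATED_CAGR = 0.132

# Assumption: the forecast window spans either 8 or 9 compounding years.
for periods in (8, 9):
    projected = project(BASE_2024, STATED_CAGR, periods)
    rate = implied_cagr(BASE_2024, TARGET_2033, periods)
    print(f"{periods} periods: projected {projected:.0f}, implied CAGR {rate:.1%}")
```

Under these assumptions, eight compounding periods give a projection close to the quoted USD 858 million, while nine periods give a noticeably higher value.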

The growth trajectory of the medical imaging de-identification software market is underpinned by the exponential increase in digital imaging procedures across healthcare facilities worldwide. As advanced imaging modalities like MRI, CT, and PET scans become standard in diagnostic workflows, the volume of data generated has surged. This data often contains sensitive patient information, making it imperative for healthcare organizations to adopt robust de-identification solutions. The proliferation of health information exchanges and the increasing emphasis on interoperability have further heightened the need for secure and compliant data sharing. These factors collectively foster a conducive environment for the adoption of de-identification software, as organizations seek to balance data utility with stringent privacy requirements.

Another major driver is the evolving regulatory landscape that mandates strict adherence to patient confidentiality and data protection standards. Regulatory frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, the General Data Protection Regulation (GDPR) in Europe, and similar regulations in Asia Pacific and other regions are compelling healthcare providers and research institutions to implement advanced de-identification solutions. These regulations impose hefty penalties for non-compliance, further incentivizing investments in software that can automate and streamline the de-identification process. Moreover, the growing trend of collaborative research and data sharing among healthcare entities necessitates reliable de-identification tools to facilitate secure and lawful data exchange.

Technological advancements in artificial intelligence and machine learning are also playing a pivotal role in shaping the medical imaging de-identification software market. Modern solutions leverage AI-driven algorithms to enhance the accuracy and efficiency of de-identification processes, reducing the risk of inadvertent data leaks. These innovations are particularly valuable in large-scale research projects, where massive datasets must be anonymized rapidly and without compromising data integrity. Furthermore, the integration of de-identification software with existing healthcare IT infrastructure, such as PACS and EHR systems, is becoming increasingly seamless, making adoption easier for end-users. This technological evolution is expected to drive further market growth over the next decade.

From a regional perspective, North America currently dominates the medical imaging de-identification software market, accounting for the largest share in 2024. The region’s leadership is attributed to the presence of advanced healthcare infrastructure, high adoption rates of digital health technologies, and stringent regulatory frameworks. Europe follows closely, propelled by GDPR compliance and increasing investments in healthcare IT. The Asia Pacific region is experiencing the fastest growth, fueled by expanding healthcare access, rapid digitalization, and rising awareness of data privacy. Latin America and the Middle East & Africa are also witnessing gradual adoption, supported by ongoing healthcare modernization initiatives and regulatory developments.

Component Analysis

The component segment of the medical imaging de-i
CARMEN-I: A resource of anonymized electronic health records in Spanish and...
physionet.org
Updated Apr 20, 2024
Cite
Eulalia Farre Maduell; Salvador Lima-Lopez; Santiago Andres Frid; Artur Conesa; Elisa Asensio; Antonio Lopez-Rueda; Helena Arino; Elena Calvo; Maria Jesús Bertran; Maria Angeles Marcos; Montserrat Nofre Maiz; Laura Tañá Velasco; Antonia Marti; Ricardo Farreres; Xavier Pastor; Xavier Borrat Frigola; Martin Krallinger (2024). CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools [Dataset]. http://doi.org/10.13026/x7ed-9r91
Unique identifier
https://doi.org/10.13026/x7ed-9r91
Dataset updated
Apr 20, 2024
Authors
Eulalia Farre Maduell; Salvador Lima-Lopez; Santiago Andres Frid; Artur Conesa; Elisa Asensio; Antonio Lopez-Rueda; Helena Arino; Elena Calvo; Maria Jesús Bertran; Maria Angeles Marcos; Montserrat Nofre Maiz; Laura Tañá Velasco; Antonia Marti; Ricardo Farreres; Xavier Pastor; Xavier Borrat Frigola; Martin Krallinger
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
The CARMEN-I corpus comprises 2,000 clinical records, encompassing discharge letters, referrals, and radiology reports from Hospital Clínic of Barcelona between March 2020 and March 2022. These reports, primarily in Spanish with some Catalan sections, cover COVID-19 patients with diverse comorbidities like kidney failure, cardiovascular diseases, malignancies, and immunosuppression. The corpus underwent thorough anonymization, validation, and expert annotation, replacing sensitive data with synthetic equivalents. A subset of the corpus features annotations of medical concepts by specialists, encompassing symptoms, diseases, procedures, medications, species, and humans (including family members). CARMEN-I serves as a valuable resource for training and assessing clinical NLP techniques and language models, aiding tasks like de-identification, concept detection, linguistic modifier extraction, document classification, and more. It also facilitates training researchers in clinical NLP and is a collaborative effort involving Barcelona Supercomputing Center's NLP4BIA team, Hospital Clínic, and Universitat de Barcelona's CLiC group.
Anonymized DICOM Dataset from 5T Cardiac T1 Mapping Study
zenodo.org
zip
Updated May 16, 2025
Cite
linqi Ge (2025). Anonymized DICOM Dataset from 5T Cardiac T1 Mapping Study [Dataset]. http://doi.org/10.5281/zenodo.15438025
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15438025
Dataset updated
May 16, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
linqi Ge
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains anonymized DICOM images acquired as part of a cardiac T1 mapping study using a 5T MRI system. All personal identifiers have been removed in compliance with DICOM de-identification standards and institutional ethics approval. The dataset includes pre- and post-contrast MOLLI sequences from healthy volunteers and patients. It is made publicly available for academic and non-commercial research purposes.
Optimum Patient Care Research Database (OPCRD)
healthdatagateway.org
unknown
Updated Sep 12, 2024
Cite
Optimum Patient Care (OPC) (2024). Optimum Patient Care Research Database (OPCRD) [Dataset]. http://doi.org/10.2147/POR.S395632
Available download formats: unknown
Unique identifier
https://doi.org/10.2147/POR.S395632
Dataset updated
Sep 12, 2024
Dataset provided by
Optimum Patient Care Limited
Authors
Optimum Patient Care (OPC)
License
https://opcrd.co.uk/our-database/data-requests/
Description
About OPCRD

Optimum Patient Care Research Database (OPCRD) is a real-world, longitudinal, research database that provides anonymised data to support scientific, medical, public health and exploratory research. OPCRD is established, funded and maintained by Optimum Patient Care Limited (OPC) – which is a not-for-profit social enterprise that has been providing quality improvement programmes and research support services to general practices across the UK since 2005.

Key Features of OPCRD

OPCRD has been purposefully designed to facilitate real-world data collection and address the growing demand for observational and pragmatic medical research, both in the UK and internationally. Data held in OPCRD is representative of routine clinical care and thus enables the study of ‘real-world’ effectiveness and health care utilisation patterns for chronic health conditions.

OPCRD's unique qualities, which set it apart from other research data resources:
• De-identified electronic medical records of more than 24.9 million patients
• OPCRD covers all major UK primary care clinical systems
• OPCRD covers approximately 35% of the UK population
• One of the biggest primary care research networks in the world, with over 1,175 practices
• Linked patient reported outcomes for over 68,000 patients including Covid-19 patient reported data
• Linkage to secondary care data sources including Hospital Episode Statistics (HES)

Data Available in OPCRD

OPCRD has received data contributions from over 1,175 practices and currently holds de-identified, research-ready data for over 24.9 million patients or data subjects. This includes longitudinal primary care patient data and any data relevant to the management of patients in primary care, and thus covers all conditions. The data is derived from both electronic health records (EHR) and patient-reported data from questionnaires delivered as part of quality improvement. OPCRD currently holds patient-reported questionnaire data for over 68,000 patients, covering Covid-19, asthma, COPD and rare diseases.

Approvals and Governance

OPCRD has had NHS research ethics committee (REC) approval to provide anonymised data for scientific and medical research since 2010, with its most recent approval in 2020 (NHS HRA REC ref: 20/EM/0148). OPCRD is governed by the Anonymised Data Ethics and Protocols Transparency committee (ADEPT). All research conducted using anonymised data from OPCRD must gain prior approval from ADEPT. Proceeds from OPCRD data access fees and detailed feasibility assessments are re-invested into OPC services for the continued free provision of patient quality improvement programmes for contributing practices and patients.

For more information on OPCRD please visit: https://opcrd.co.uk/
COVID-19 Case Surveillance Public Use Data
data.cdc.gov
healthdata.gov
+5 more
application/rdfxml +5
Updated Jul 9, 2024
Cite
CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
Available download formats: application/rdfxml, tsv, csv, json, xml, application/rssxml
Dataset updated
Jul 9, 2024
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC Data, Analytics and Visualization Task Force
License
https://www.usa.gov/government-works
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (33 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

For more information: NNDSS Supports the COVID-19 Response | CDC.

The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

COVID-19 Case Reports

COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

Data are Considered Provisional

The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

Data Limitations

To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

Data Quality Assurance Procedures

CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.
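The first two cleaning steps listed above can be sketched as follows. This is only an illustration under stated assumptions, not CDC's actual pipeline: the field names `hosp_yn` and `onset_dt` and both helper functions are hypothetical.

```python
from datetime import date
from typing import Optional


def recode_blank(answer: Optional[str]) -> str:
    """Recode an unanswered (blank) question to the value 'Missing'."""
    if answer is None or answer.strip() == "":
        return "Missing"
    return answer


def check_onset_date(onset: Optional[date], today: date) -> Optional[date]:
    """Set an illogical (future) symptom onset date to null (None)."""
    if onset is not None and onset > today:
        return None
    return onset


# Hypothetical record: hospitalization left blank, onset date in the future.
record = {"hosp_yn": "", "onset_dt": date(2030, 1, 1)}
cleaned = {
    "hosp_yn": recode_blank(record["hosp_yn"]),
    "onset_dt": check_onset_date(record["onset_dt"], today=date(2024, 7, 9)),
}
print(cleaned)
```

In the described process a nulled date stays null only until the reporting jurisdiction supplies a corrected value; the sketch does not model that follow-up.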

Data Suppression

To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
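A minimal sketch of small-cell suppression along these lines is shown below. The column names, the choice to recode all three demographic fields together, and the function itself are assumptions made for illustration; the description does not specify how CDC implements the rule.

```python
from collections import Counter

THRESHOLD = 5  # from the description: low frequency means fewer than 5 records
KEYS = ("sex", "age_group", "race_ethnicity")  # hypothetical column names


def suppress_rare(records):
    """Return copies of records with rare demographic combinations recoded to 'NA'."""
    counts = Counter(tuple(r[k] for k in KEYS) for r in records)
    out = []
    for r in records:
        row = dict(r)
        if counts[tuple(r[k] for k in KEYS)] < THRESHOLD:
            for k in KEYS:
                row[k] = "NA"
        out.append(row)  # records are recoded, never dropped
    return out


# Six records share a common combination; one record has a rare combination.
common = [{"sex": "F", "age_group": "30-39", "race_ethnicity": "A", "hosp_yn": "No"}] * 6
rare = [{"sex": "M", "age_group": "80+", "race_ethnicity": "B", "hosp_yn": "Yes"}]
result = suppress_rare(common + rare)
print(len(result), result[-1])
```

The record count is unchanged after suppression, which mirrors the statement that records with suppressed values are never removed.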

For questions, please contact Ask SRRG (eocevent394@cdc.gov).

Additional COVID-19 Data

COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
Updated PTSS dataset for the FORAS project
dataverse.nl
csv, docx, xlsx
Updated Feb 5, 2025
Cite
Bruno Coimbra; Rutger Neeleman; Elizabeth Grandfield; Mirjam van Zuiden; Rens van de Schoot (2025). Updated PTSS dataset for the FORAS project [Dataset]. http://doi.org/10.34894/CRE6ZC
Available download formats: docx (48426), xlsx (9398219), csv (21840732), xlsx (1199186)
Unique identifier
https://doi.org/10.34894/CRE6ZC
Dataset updated
Feb 5, 2025
Dataset provided by
DataverseNL
Authors
Bruno Coimbra; Rutger Neeleman; Elizabeth Grandfield; Mirjam van Zuiden; Rens van de Schoot
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
Dutch Research Council
Description
This updated labeled dataset builds upon the initial systematic review by van de Schoot et al. (2018; DOI: 10.1080/00273171.2017.1412293), which included studies on post-traumatic stress symptom (PTSS) trajectories up to 2016, sourced from the Open Science Framework (OSF). As part of the FORAS project - Framework for PTSS trajectORies: Analysis and Synthesis (funded by the Dutch Research Council, grant no. 406.22.GO.048 and pre-registered at PROSPERO under ID CRD42023494027), we extended this dataset to include publications between 2016 and 2023. In total, the search identified 10,594 de-duplicated records obtained via different search methods, each published with their own search query and result: Exact replication of the initial search: OSF.IO/QABW3 Comprehensive database search: OSF.IO/D3UV5 Snowballing: OSF.IO/M32TS Full-text search via Dimensions data: OSF.IO/7EXC5 Semantic search via OpenAlex: OSF.IO/M32TS Humans (BC, RN) and AI (Bron et al., 2024) have screened the records, and disagreements have been solved (MvZ, BG, RvdS). Each record was screened separately for Title, Abstract, and Full-text inclusion and per inclusion criteria. A detailed screening logbook is available at OSF.IO/B9GD3, and the entire process is described in https://doi.org/10.31234/osf.io/p4xm5. A description of all columns/variables and full methodological details is available in the accompanying codebook. Important Notes: Duplicates: To maintain consistency and transparency, duplicates are left in the dataset and are labeled with the same classification as the original records. A filter is provided to allow users to exclude these duplicates as needed. Anonymized Data: The dataset "...._anonymous" excludes DOIs, OpenAlex IDs, titles, and abstracts to ensure data anonymization during the review process. The complete dataset, including all identifiers, is uploaded under embargo and will be publicly available on 01-10-2025. 
This dataset is a valuable resource for researchers interested in systematic reviews of PTSS trajectories, where it facilitates reproducibility and transparency in the research process, and for data scientists who would like to mimic the screening process using different machine learning and AI models.
Data from: MIMIC-CXR Database
physionet.org
Updated Jul 23, 2024
Cite
Alistair Johnson; Tom Pollard; Roger Mark; Seth Berkowitz; Steven Horng (2024). MIMIC-CXR Database [Dataset]. http://doi.org/10.13026/4jqj-jw95
Explore at:
Unique identifier
https://doi.org/10.13026/4jqj-jw95
Dataset updated
Jul 23, 2024
Authors
Alistair Johnson; Tom Pollard; Roger Mark; Seth Berkowitz; Steven Horng
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
The MIMIC Chest X-ray (MIMIC-CXR) Database v2.0.0 is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. The dataset is de-identified to satisfy the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) Safe Harbor requirements. Protected health information (PHI) has been removed. The dataset is intended to support a wide body of research in medicine including image understanding, natural language processing, and decision support.
Data from: RadCases Dataset
paperswithcode.com
huggingface.co
Updated Sep 26, 2024
Cite
Michael S. Yao; Allison Chae; Charles E. Kahn Jr.; Walter R. Witschey; James C. Gee; Hersh Sagreiya; Osbert Bastani (2024). RadCases Dataset [Dataset]. https://paperswithcode.com/dataset/radcases
Explore at:
Dataset updated
Sep 26, 2024
Authors
Michael S. Yao; Allison Chae; Charles E. Kahn Jr.; Walter R. Witschey; James C. Gee; Hersh Sagreiya; Osbert Bastani
Description
RadCases Dataset
This HuggingFace (HF) dataset contains the raw case labels for input patient "one-liner" case summaries according to the ACR Appropriateness Criteria. Because many of the sources of data used to construct the RadCases dataset require credentialed access, we cannot publicly release the input patient case summaries. Instead, the "cases" included in this publicly available dataset are the cryptographically secure SHA-512 hashes of the original, "human-readable" cases. In this way, the hashes cannot be used to reconstruct the original RadCases dataset, but can instead be used as a lookup key to determine the ground-truth label for the dataset.

Setup
Prior to using this dataset, you need to download the raw source of patient one-liners first in compliance with each of the source-specific licenses and data usage agreements. The setup process is different for each of the different dataset sources:

Synthetic: The Synthetic dataset is composed of patient one-liners synthetically generated by OpenAI's ChatGPT. You can find the raw dataset at this GitHub link. No additional setup steps are required for the Synthetic RadCases dataset.
USMLE: The USMLE dataset is comprised of practice USMLE Step 2 and 3 cases from Medbullets that are made available by Chen et al. (2024). The dataset is made publicly available by the cited authors at this GitHub link; we extract the first sentence of each question stem to use as an input patient one-liner in the RadCases dataset.
JAMA: The JAMA dataset is comprised of challenging patient one-liners derived from the JAMA Clinical Challenges from the Journal of the American Medical Association (JAMA). Please follow the instructions from @HanjieChen here to first download the dataset. We extract the first sentence of each clinical challenge to use as the input patient one-liner in the RadCases dataset.
NEJM: The NEJM dataset is comprised of challenging patient one-liners derived from the NEJM Case Records of the Massachusetts General Hospital from the New England Journal of Medicine (NEJM). We provide a script build_nejm_dataset.py to scrape the case records from the DOIs listed here, which are the same as those used by Savage et al. (2024). The resulting nejm.jsonl file generated by the script should then be added to the radGPT home directory.
BIDMC: The Beth Israel Deaconess Medical Center (BIDMC) dataset is comprised of real anonymized, de-identified patient one-liners derived from the MIMIC-IV Dataset. Please request access to the MIMIC-IV dataset here. The discharge.csv.gz file should then be added to the radGPT/radgpt/data directory.

Dataset Structure
Each row of the dataset is a (SHA-512 hash of a) patient "one-liner" case mapping to an ACR Appropriateness Criteria topic, and also the parent panel of that topic.

case: the SHA-512 hash of the patient one-liner
panel: the ACR Appropriateness Criteria panel label of the patient one-liner
topic: the ACR Appropriateness Criteria topic label of the patient one-liner

Retrieving A Label
To retrieve a ground-truth ACR label from this dataset, you can use the following source code:

import hashlib
prompt = input("Patient One-Liner Case: ")
hash_gen = hashlib.sha512()
hash_gen.update(prompt.encode())
hash_val = str(hash_gen.hexdigest())

The corresponding hash_val variable can then be used to look up the corresponding panel or topic by matching hash_val with the case value in the RadCases dataset.
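The matching step described above can be sketched as a dictionary lookup. This is a minimal illustration rather than the dataset authors' code: the helper names (case_hash, build_index, lookup_label) and the example row are hypothetical, and the rows are assumed to be available as dictionaries with the case, panel, and topic fields listed under Dataset Structure.

```python
import hashlib


def case_hash(one_liner: str) -> str:
    """Return the SHA-512 hex digest of a one-liner, as in the snippet above."""
    return hashlib.sha512(one_liner.encode()).hexdigest()


def build_index(rows):
    """Map each `case` hash to its (panel, topic) pair for constant-time lookup."""
    return {row["case"]: (row["panel"], row["topic"]) for row in rows}


def lookup_label(one_liner, index):
    """Return (panel, topic) for a one-liner, or None if its hash is not present."""
    return index.get(case_hash(one_liner))


# Hypothetical row for illustration only; real rows come from the RadCases dataset.
example_text = "Hypothetical one-liner used only for illustration."
rows = [
    {"case": case_hash(example_text), "panel": "Example Panel", "topic": "Example Topic"},
]
index = build_index(rows)

print(lookup_label(example_text, index))
print(lookup_label("A one-liner that is not in the dataset.", index))
```

Because the key is a hash of the exact string, any difference in whitespace, punctuation, or capitalization between the input and the original one-liner produces a different digest and the lookup returns None.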

Direct Dataset Usage
You can download the contents of this dataset using the following terminal command:

git clone https://huggingface.co/datasets/michaelsyao/RadCases

Deidentified Horticulture Import Testing Results

zenodo.org

csv

Updated Jul 4, 2024

Cite

Robert Clark; Mahdi Parsa; Belinda Barnes; Sumonkanti Das (2024). Deidentified Horticulture Import Testing Results [Dataset]. http://doi.org/10.5281/zenodo.12615128

Explore at:

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.12615128

Dataset updated

Jul 4, 2024

Dataset provided by

Zenodo, http://zenodo.org/

Authors

Robert Clark; Mahdi Parsa; Belinda Barnes; Sumonkanti Das

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

These three datasets contain de-identified data on testing for pests in imports of horticultural products into Australia in a period within 2021-2023. The creator of this data page is distributing the data with the permission of the data owner (emails 14/6/2024, 25/6/2024, 1/7/2024).

Dataset anonymized_hort_aggdat_01-07-2024.csv

This dataset (anonymized_hort_aggdat_01-07-2024.csv) has one row for each line of fruit or vegetables tested. Consignments of fruits or vegetables are divided into lines (details may depend on the type of fruit or vegetable). 600 units are sampled from each line, where a unit is usually a single fruit or vegetable (rounding may occur, for example if fruit are grouped into punnets). A result is then obtained from each line ("inspection result"). If the result is not Pass, then fumigation or other actions may be taken. The columns of the data are:

Variable Name	Values	Definition
entry	ANONYMIZED_VALUE1, ANONYMIZED_VALUE2, etc	anonymised identifier of the consignment
volume	numeric	volume of the line
volume_unit	KG – kilograms 0 GI blank	units in which volume is measured (almost always kg)
arrival_date	date
importer_name	ANONYMIZED_VALUE_1, ANONYMIZED_VALUE2 etc	anonymised identifier of the importer
supplier_name	ANONYMIZED_VALUE_1, ANONYMIZED_VALUE2 etc	anonymised identifier of the supplier
cargo_type		the freight type of the consignment (e.g., FCL and FCX are container types via sea and AIR is air freight)
port	character valued code	destination port of the consignment/entry
country	ANONYMIZED_VALUE_1, ANONYMIZED_VALUE2 etc	anonymised country of origin
finalise_type		whether the line was released as normal, from biosecurity control, disposed of, destroyed or exported
document_failure	Pass, Fail	whether a failure was recorded against a line at onshore document verification. Note: A fail then followed by a pass and goods moving to inspection, will display fail.
inspection_result	Pass, Fail	whether a failure was recorded against a line at onshore verification inspection. Note: A fail then followed by a pass and goods being released, will display fail. Lines that qualified for the Compliance-Based Intervention Scheme (CBIS) may not have been inspected as a result. See here for more information about CBIS.
fumigated	Not fumigated, Fumigated	Whether line was fumigated
other_treatment	character	other remedial treatment applied to the line/entry (reconditioning for seeds)
cbis_commodity	Fresh CBIS, Other	"Fresh CBIS" means that the line qualified for the Compliance-Based Intervention Scheme (CBIS) and may not have been inspected as a result. "Other" means that the line did not qualify for CBIS. See here for more information about CBIS.
actionable		Where the department's Science Services Group have determined that detected biosecurity risk material requires remedial action to mitigate biosecurity risk. Note: Seeds are only actioned if a high risk weed seed is detected or where 3 or more species of biosecurity concern are identified.
commodity	character	Commodity description
rcd_nbr	1, 2, 3 etc	anonymised identifier of line

Dataset anonymized_hort_pests_01-07-2024.csv

This dataset contains one row for each pest detection. Note that not all pest detections require action. It may be linked to anonymized_hort_aggdat_01-07-2024.csv using rcd_nbr as a key. The columns of the data are:

Variable Name	Values	Definition
rcd_nbr	1, 2, 3 etc	anonymised identifier of line
bottle_number	numeric	identifier for a particular pest for a particular line
pest_type	Disease, Invertebrate, Plant, Seed, Vertebrate, Na, blank	type of potential pest

Dataset anonymized_hort_seeds_incidents_01-07-2024.csv

This dataset contains one row for each seed detection. Note that not all seed detections require action. It may be linked to anonymized_hort_aggdat_01-07-2024.csv using rcd_nbr as a key and to anonymized_hort_pests_01-07-2024.csv using bottle_number as a key. The columns of the data are:

Variable Name	Values	Definition
rcd_nbr	1, 2, 3 etc	anonymised identifier of line
bottle_number	numeric	identifier for a particular pest for a particular line
pest_type	Disease, Invertebrate, Plant, Seed, Vertebrate, Na, blank	type of potential pest (always equal to Seed in this spreadsheet)
comments	text field	comments
other_treatment	Reconditioned, or blank	other treatments applied
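
The linkage between the three files described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not code supplied with the dataset: the rows below are made up and use only a few of the documented columns, the function names are hypothetical, and seed incidents are matched on the pair (rcd_nbr, bottle_number) on the assumption that bottle_number is only unique within a line.

```python
import csv
import io
from collections import defaultdict

# Made-up extracts standing in for the three CSV files (illustration only).
AGGDAT_CSV = """rcd_nbr,commodity,inspection_result
1,Example commodity A,Pass
2,Example commodity B,Fail
"""
PESTS_CSV = """rcd_nbr,bottle_number,pest_type
2,101,Seed
2,102,Invertebrate
"""
SEEDS_CSV = """rcd_nbr,bottle_number,pest_type,comments,other_treatment
2,101,Seed,example comment,Reconditioned
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_tables(lines, pests, seeds):
    """Attach pest detections (and any seed incident) to each inspected line."""
    pests_by_line = defaultdict(list)
    for pest in pests:
        pests_by_line[pest["rcd_nbr"]].append(pest)
    # Assumption: bottle_number is unique only within a line, so key on both.
    seeds_by_key = {(s["rcd_nbr"], s["bottle_number"]): s for s in seeds}

    linked = []
    for line in lines:
        detections = []
        for pest in pests_by_line.get(line["rcd_nbr"], []):
            seed = seeds_by_key.get((pest["rcd_nbr"], pest["bottle_number"]))
            detections.append({**pest, "seed_incident": seed})
        linked.append({**line, "detections": detections})
    return linked


linked = link_tables(read_rows(AGGDAT_CSV), read_rows(PESTS_CSV), read_rows(SEEDS_CSV))
for line in linked:
    print(line["rcd_nbr"], line["inspection_result"], len(line["detections"]))
```

Lines with no pest detection keep an empty detections list, which preserves the one-row-per-line structure of the aggregate file.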

VICTORY study - dataset
uvaauas.figshare.com
Updated May 31, 2023
Cite
Ewoud Baarsma; Joppe Hovius (2023). VICTORY study - dataset [Dataset]. http://doi.org/10.21942/uva.17113355.v1
Explore at:
Unique identifier
https://doi.org/10.21942/uva.17113355.v1
Dataset updated
May 31, 2023
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
Ewoud Baarsma; Joppe Hovius
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains meta-data on the VICTORY study, a diagnostic test accuracy study on cellular tests for Lyme borreliosis. The full protocol can be accessed via DOI 10.1186/s12879-019-4323-6. The dataset for the VICTORY study can be obtained from the principal investigator for Amsterdam UMC (prof. Joppe Hovius) on behalf of the research partners at the Radboudumc and National Institute of Public Health and the Environment (RIVM). The dataset is available upon reasonable request and subject to certain limitations. Conditions include:
- re-use of data for a scientifically valid and methodologically sound research project
- data will only be shared for collaborative efforts, not for use by third parties only
- de-identified data may not leave the control of Amsterdam UMC, Radboudumc or RIVM; anonymized data may also be used by a third party
- contractual obligations regarding the rights of participating commercial partners are respected (e.g., prior review of intended publications)
- legal rights of study participants are respected
- re-use is always subject to applicable law, institutional regulations and review by the medical ethics committee, if applicable.
Contact with the principal investigator can be sought via lyme@amsterdamumc.nl or victory@amsterdamumc.nl.
Data from: MIMIC-CXR-JPG - chest radiographs with structured labels
physionet.org
Updated Mar 12, 2024
Cite
Alistair Johnson; Matthew Lungren; Yifan Peng; Zhiyong Lu; Roger Mark; Seth Berkowitz; Steven Horng (2024). MIMIC-CXR-JPG - chest radiographs with structured labels [Dataset]. http://doi.org/10.13026/jsn5-t979
Explore at:
Unique identifier
https://doi.org/10.13026/jsn5-t979
Dataset updated
Mar 12, 2024
Authors
Alistair Johnson; Matthew Lungren; Yifan Peng; Zhiyong Lu; Roger Mark; Seth Berkowitz; Steven Horng
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
The MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) Database v2.0.0 is a large publicly available dataset of chest radiographs in JPG format with structured labels derived from free-text radiology reports. The MIMIC-CXR-JPG dataset is wholly derived from MIMIC-CXR, providing JPG format files derived from the DICOM images and structured labels derived from the free-text reports. The aim of MIMIC-CXR-JPG is to provide a convenient processed version of MIMIC-CXR, as well as to provide a standard reference for data splits and image labels. The dataset contains 377,110 JPG format images and structured labels derived from the 227,827 free-text radiology reports associated with these images. The dataset is de-identified to satisfy the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) Safe Harbor requirements. Protected health information (PHI) has been removed. The dataset is intended to support a wide body of research in medicine including image understanding, natural language processing, and decision support.
AIP-OS Dataset: Cognitive Performance in AI-Mediated University Learning
data.mendeley.com
Updated Jun 5, 2025
Cite
Dougglas Hurtado-Carmona (2025). AIP-OS Dataset: Cognitive Performance in AI-Mediated University Learning [Dataset]. http://doi.org/10.17632/k3sdtd2g6z.1
Explore at:
Unique identifier
https://doi.org/10.17632/k3sdtd2g6z.1
Dataset updated
Jun 5, 2025
Authors
Dougglas Hurtado-Carmona
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides the complete anonymized academic performance records of 182 undergraduate students from a quasi-experimental study conducted in Colombia. It supports the findings of the article “Redefining the classroom: posthuman cognition and algorithmic agency in higher education.” Students were divided into a control group and an experimental group exposed to the AIP-OS intelligent agent, developed under the Belief–Desire–Intention (BDI) architecture. The dataset includes individual-level scores on interpretive, argumentative, and propositional competencies across five curricular units, along with aggregate performance metrics and cognitive profiles. All data are de-identified and fully compliant with ethical research standards.

Keywords Educational AI; BDI agent; Higher education; Cognitive performance; Posthuman pedagogy; Learning trajectories; Human–machine interaction; Distributed cognition
Dataset from Short Period Incidence Study of Severe Acute Respiratory...
data.niaid.nih.gov
Updated Nov 27, 2024
Cite
Nicole Ng, MBBS (2024). Dataset from Short Period Incidence Study of Severe Acute Respiratory Illness [Dataset]. http://doi.org/10.25934/PR00007465
Explore at:
Unique identifier
https://doi.org/10.25934/PR00007465
Dataset updated
Nov 27, 2024
Dataset provided by
IDDO
Authors
Nicole Ng, MBBS
Area covered
Australia
Variables measured
Hospitalization, Duration Of Hospital Stay
Description
This is a multi-centre, prospective, short period incidence observational study of patients in participating hospitals and intensive care units (ICUs) with SARI. The study period will occur in both Northern and Southern Hemisphere winters. The study period will comprise a 5 to 7-day cohort study in which patients meeting a SARI case-definition, who are newly admitted to the hospitals / ICUs at participating sites, will be included in the study. The study will be conducted in 20 to 40 hospital/ICU-based research networks globally. All clinical information and sample data will only be recorded if taken as part of the routine clinical practice at each site and only fully anonymised and de-identified data will be submitted centrally.

The primary aim of this study is to establish a research response capability for a future epidemic / pandemic through a global SARI observational study. The secondary aim of this study is to investigate the descriptive epidemiology and microbiology profiles of patients with SARI. The tertiary aim of this study is to assess the Ethics, Administrative, Regulatory and Logistic (EARL) barriers to conducting pandemic research on a global level.
Postnatal Affective MRI Dataset
openneuro.org
Updated Sep 12, 2020
Cite
PhD Heidemarie Laurent; Megan K. Finnegan; Katherine Haigler (2020). Postnatal Affective MRI Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds003136.v1.0.0
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds003136.v1.0.0
Dataset updated
Sep 12, 2020
Dataset provided by
OpenNeuro, https://openneuro.org/
Authors
PhD Heidemarie Laurent; Megan K. Finnegan; Katherine Haigler
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Postnatal Affective MRI Dataset

Authors Heidemarie Laurent, Megan K. Finnegan, and Katherine Haigler

The Postnatal Affective MRI Dataset (PAMD) includes MRI and psych data from 25 mothers at three months postnatal, with additional psych data collected at three additional timepoints (six, twelve, and eighteen months postnatal). Mother-infant dyad psychosocial tasks and cortisol samples were also collected at all four timepoints, but this data is not included in this dataset. In-scanner tasks involved viewing own- and other-infant affective videos and viewing and labeling adult affective faces. This repository includes de-identified MRI, in-scanner task, demographic, and psych data from this study.

Citation Laurent, H., Finnegan, M. K., & Haigler, K. (2020). Postnatal Affective MRI Dataset. OpenNeuro. Retrieved from OpenNeuro.org.

Acknowledgments Saumya Agrawal was instrumental in getting the PAMD dataset into a BIDS-compliant structure.

Funding This work was supported by the Society for Research in Child Development Victoria Levin Award "Early Calibration of Stress Systems: Defining Family Influences and Health Outcomes" to Heidemarie Laurent and by the University of Oregon College of Arts and Sciences

Contact For questions about this dataset or to request access to alcohol- and tobacco-related psych data, please contact Dr. Heidemarie Laurent, hlaurent@illinois.edu.

References Laurent, H. K., Wright, D., & Finnegan, M. K. (2018). Mindfulness-related differences in neural response to own-infant negative versus positive emotion contexts. Developmental Cognitive Neuroscience 30: 70-76. https://doi.org/10.1016/j.dcn.2018.01.002.

Finnegan, M. K., Kane, S., Heller, W., & Laurent, H. (2020). Mothers' neural response to valenced infant interactions predicts postnatal depression and anxiety. PLoS One (under review).

MRI Acquisition The PAMD dataset was acquired in 2015 at the University of Oregon Robert and Beverly Lewis Center for Neuroimaging with a 3T Siemens Allegra 3 magnet. A standard 32-channel phase array birdcage coil was used to acquire data from the whole brain. Sessions began with a shimming routine to optimize signal-to-noise ratio, followed by a fast localizer scan (FISP) and Siemens Autoalign routine, a field map, then the 4 functional runs and anatomical scan.

Anatomical: T1*-weighted 3D MPRAGE sequence, TI=1100 ms, TR=2500 ms, TE=3.41 ms, flip angle=7°, 176 sagittal slices, 1.0mm thick, 256×176 matrix, FOV=256mm.

Fieldmap: gradient echo sequence TR=.4ms, TE=.00738 ms, deltaTE=2.46 ms, 4mm thick, 64x64x32x2 matrix.

Task: T2-weighted gradient echo sequence, TR=2000 ms, TE=30 ms, flip angle=90°, 32 contiguous slices acquired ascending and interleaved, 4 mm thick, 64×64 voxel matrix, 226 vols per run.
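
As a quick arithmetic check on the task parameters above, the length of one functional run follows from the repetition time and the number of volumes. The sketch below is illustrative only; the function name is hypothetical, and it assumes that each volume takes exactly one TR and that all 226 volumes are counted in the run.

```python
def run_duration_seconds(tr_ms: float, n_volumes: int) -> float:
    """Run length in seconds, assuming one volume is acquired per TR."""
    return tr_ms / 1000.0 * n_volumes


# Values taken from the Task line above: TR = 2000 ms, 226 volumes per run.
task_run_s = run_duration_seconds(tr_ms=2000, n_volumes=226)
print(task_run_s)        # 452.0 seconds
print(task_run_s / 60)   # roughly 7.5 minutes
```

Under these assumptions a run lasts 452 seconds, or about 7.5 minutes.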

Participants Mothers (n=25) of 3-month-old infants were recruited from the Women, Infants, and Children program and other community agencies serving low-income women in a midsize Pacific Northwest city. Mothers' ages ranged from 19 to 33 (M=26.4, SD=3.8). Most mothers were Caucasian (72%, 12% Latina, 8% Asian American, 8% other) and married or living with a romantic partner (88%). Although most reported some education past high school (84%), only 24% had completed college or received a graduate degree, and their median household income was between $20,000 and $29,999. For more than half of the mothers (56%), this was their first child (36% second child, 8% third child). Most infants were born on time (4% before 37 weeks and 8% after 41 weeks of pregnancy), and none had serious health problems. A vaginal delivery was reported by 56% of mothers, with 88% breastfeeding and 67% bed-sharing with their infant at the time of assessment. Over half of the mothers (52%) reported having engaged in some form of contemplative practice (mostly yoga and only 8% indicated some form of meditation), and 31% reported currently engaging in that practice. All women gave informed consent prior to participation, and all study procedures were approved by the University of Oregon Institutional Review Board. Due to a task malfunction, participant 178's scanning session was split over two days, with the anatomical acquired in ses-01, and the field maps and tasks acquired in ses-02.

Study overview Mothers visited the lab to complete assessments at four timepoints postnatal: the first session occurred when mothers were approximately three months postnatal (T1), the second session at approximately six months postnatal (T2), the third session at approximately twelve months postnatal (T3), and the fourth and last session at approximately eighteen months postnatal (T4). MRI scans were acquired shortly after their first session (T1).

Assessment data Assessments collected during sessions include demographic, relationship, attachment, mental health, and infant-related questionnaires. For a full list of included measures and timepoints at which they were acquired, please refer to PAMD_codebook.tsv in the phenotype folder. Data has been made available and included in the phenotype folder as 'PAMD_T1_psychdata', 'PAMD_T2_psychdata', 'PAMD_T3_psychdata', 'PAMD_T4_psychdata'. To protect participants' privacy, all identifiers and questions relating to drugs or alcohol have been removed. If you would like access to drug- and alcohol-related questions, please contact the principal investigator, Dr. Heidemarie Laurent, to request access. Assessment data will be uploaded shortly.

Post-scan ratings After the scan session, mothers watched all of the infant videos and rated the infant's and their own emotional valence and intensity for each video. For valence, mothers were asked "In this video clip, how positive or negative is your baby's emotion?" and "While watching this video clip, how positive or negative is your emotion?" on a scale from -100 (negative) to +100 (positive). For emotional intensity, mothers were asked "In this video clip, how intense is your baby's emotion?" and "While watching this video clip, how intense is your emotion?" on a scale of 0 (no intensity) to 100 (maximum intensity). Post-scan ratings are available in the phenotype folder as "PAMD_Post-ScanRatings."

MRI Tasks

Neural Reactivity to Own- and Other-Infant Affect

File Name: task-infant

Approximately three months postnatal, a graduate research assistant visited mothers’ homes to conduct a structured clinical interview and video-record the mother interacting with her infant during a peekaboo and arm-restraint task, designed to elicit positive and negative emotions, respectively. The mother and infant were face-to-face for both tasks. For the peekaboo task, the mother covered her face with her hands and said "baby," then opened her hands and said "peekaboo" (Montague and Walker-Andrews, 2001). This continued for three minutes, or until the infant showed expressions of joy. For the arm-restraint task, the mother changed her baby's diaper and then held the infant's arms to their side for up to two minutes (Moscardino and Axia, 2006). The mother was told to keep her face neutral and not talk to her infant during this task. This procedure was repeated with a mother-infant dyad that was not included in the rest of the study to generate other-infant videos. Videos were edited to 15-second clips that showed maximum positive and negative affect. Presentation® software (Version 14.7, Neurobehavioral Systems, Inc. Berkeley, CA, www.neurobs.com) was used to present positive and negative own- and other-infant clips and rest blocks in counterbalanced order during two 7.5-minute runs. Participants were instructed to watch the videos and respond as they normally would without additional task demands. To protect participants' and their infants' privacy, infant videos will not be made publicly available. However, the mothers' post-scan rating of their infant's, the other infant's, and their own emotional valence and intensity can be found in the phenotype folder as "PAMD_Post-ScanRatings."

Observing and Labeling Affective Faces

File Name: task-affect

Face stimuli were selected from a standardized set of images (Tottenham, Borscheid, Ellersten, Markus, & Nelson, 2002). Presentation Software (version 14.7, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com) was used to show participants race-matched adult target faces displaying emotional expressions (positive: three happy faces; negative: one fear, one sad, one anger; two from each category were open-mouthed; one closed-mouthed). Participants were instructed to "observe" or to choose the correct affect label for the target image. In the observe task, subjects viewed an emotionally evocative face without making a response. During the affect-labeling task, subjects chose the correct affect label (e.g., "scared," "angry," "happy," "surprised") from a pair of words shown at the bottom of the screen (Lieberman et al., 2007). Each block was preceded by a 3-second instruction screen cueing participants for the current task ("observe" or "affect labeling") and consisted of five affective faces presented for 5 seconds each, with a 1- to 3-second jittered fixation cross between stimuli. Each run consisted of twelve blocks (six observe; six label) counterbalanced within the run and in a semi-random order of trials within blocks (no more than four in a row of positive or negative and, in the affect-labeling task, of the correct label on the right or left side).

.nii to BIDS

The raw DICOMs were anonymized and converted to BIDS format using the following procedure (for more details, see https://github.com/Haigler/PAMD_BIDS/).

Deidentifying DICOMs: Batch anonymization of the DICOMs using DicomBrowser (https://nrg.wustl.edu/software/dicom-browser/)

Conversion to .nii and BIDS structure: Anonymized DICOMs were converted to
University of Missouri Post-operative Glioma Dataset
cancerimagingarchive.net
n/a, nifti, xlsx
Updated Mar 21, 2025
Cite
The Cancer Imaging Archive (2025). University of Missouri Post-operative Glioma Dataset [Dataset]. http://doi.org/10.7937/7k9k-3c83
Explore at:
Available download formats: xlsx, n/a, nifti
Unique identifier
https://doi.org/10.7937/7k9k-3c83
Dataset updated
Mar 21, 2025
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Mar 21, 2025
Dataset funded by
National Cancer Institute, http://www.cancer.gov/
Description
Abstract
This dataset includes MR imaging from 203 glioma patients with 617 different post-treatment MR time points, and tumor segmentations. Clinical data includes patient demographics, genomics, and treatment details. Preprocessing of MR images followed a standardized pipeline with automatic tumor segmentation based on nnUNet deep learning approach. The automatic tumor segmentations were manually validated and refined by neuroradiologists.
The heterogeneity of glioma imaging characteristics and management strategies contributes to a lack of reliable findings when evaluating treatment outcomes with conventional MRI, and the overlapping imaging features of radiation necrosis and tumor progression post-treatment can be particularly challenging for radiologists. This robust dataset should contribute to the development of AI models to improve evaluation of treatment outcomes.
Introduction
The dataset consists of an institutional review board-approved retrospective analysis of pathologically proven glioma patients at University Hospital of The University of Missouri. The Anatomic Pathology CoPathPlus database was used to collect glioma cases over the last 10 years.
Sharing segmented postoperative glioma data with clinical information significantly accelerates research and improves clinical practice by providing a comprehensive, readily available dataset. This eliminates the time-consuming burden of manual segmentation, enhances the accuracy and consistency of tumor delineation, and allows researchers to focus on analysis and interpretation, ultimately driving the development of more accurate segmentation algorithms, predictive models for personalized treatment strategies, and improved patient outcome predictions. Standardized longitudinal follow-up and benchmarking capabilities further facilitate multi-center studies and objective evaluation of treatment efficacy, leading to advancements in glioma biology and personalized patient care.
Methods
The following subsections provide information about how the data were selected, acquired, and prepared for publication.
Subject Inclusion and Exclusion Criteria
The selection criteria for the CoPath Natural Language II Search included accession dates ranging from 01/01/2021 to 02/20/2024. To ensure all relevant diagnoses for this study were included, three separate keyword searches were performed using "glioma", "astrocytoma", and "glioblastoma". The search only included keyword results that were present in the Final Diagnoses. "Glioma" returned 85 cases; "Astrocytoma" returned 67 cases; and "Glioblastoma" returned 215 cases. Following the exclusion of duplicate cases, those missing any of the four requisite MR imaging sequences, and cases that failed processing through our pipeline, our final cohort comprised 203 patients.
Data Acquisition
Radiology: MRI studies on our McKesson Radiology 12.2 Picture archiving and communication system (PACS) (Change Healthcare Radiology Solutions, Nashville, Tennessee, U.S.) were exported. The image export process involved personnel of varying ranks, including medical graduates, radiology residents, neuroradiology fellows, and neuroradiologists. Our team exported the four basic conventional MR sequences, including T1, T1 with IV gadolinium-based contrast agent administration, T2, and Fluid Attenuated Inversion Recovery (FLAIR), to a HIPAA-compliant MU secured research server.
For each patient, the images were checked to include up to six post-treatment studies, as available. The post-treatment images were acquired on different dates, though not all patients had the maximum number of follow-up images; some had as few as one post-treatment follow-up MRI. For patients with more frequent follow-up MRIs, the immediate post-operative scan, at least one time point of progression, and another follow-up study were selected. The MR images were comprehensively reviewed to exclude significantly motion-degraded or suboptimal studies.
The majority of the studies were conducted using Siemens MRI machines (97.47%, n=579), with a smaller proportion performed on MRI machines from other vendors: GE (2.02%, n=12) and Philips (0.51%, n=3). Table 1 shows the distribution of studies across different Siemens MR machines. Regarding the magnetic field strength, 1.5T MRIs accounted for 48.14% (n=1,126), 3T MRIs accounted for 45.08% (n=318), and 3T MRIs accounted for 45.08% (n=261). Table 2 summarizes the MRI parameters of each MR sequence.
Our team made efforts to obtain 3D sequences whenever available. Scans were performed using 3D acquisition methods in 40.28% of cases (n=975) and 2D acquisition methods in 59.82% of cases (n=1,419). In cases where 3D images were not available, 2D images were utilized instead. Table 3 summarizes the counts and percentage of studies performed with 2D vs 3D acquisition across different MR sequences.
Clinical: Basic demographic data, clinical data points, and tumor pathology were obtained through review of the electronic medical record (EMR). Clinical data points included the date of diagnosis, date of first surgery or treatment, date and characterization of first and/or subsequent disease progression and/or recurrence, and date of any follow-up resections. Survival information included the date of death and, if that was unknown, the date of last known contact while alive. Disease progression and/or recurrence was characterized as imaging only, clinical only, or both based on information obtained through review of each patient’s clinical notes, brain imaging, and clinical impression as documented by the primary care team. Brief summaries of the reasoning behind each characterization were also included. Patients with no further clinical contact beyond their primary treatment were documented as “lost to follow-up.” Pathological information was obtained through review of the initial pathology note and any subsequent addenda for each tumor sample and included final tumor diagnosis, grade, and any identified genetic mutations. This information was then compiled into a spreadsheet for analysis.
Data Analysis
The image data underwent preprocessing using the Federated Tumor Segmentation (FeTS) tool. The pipeline began with converting DICOM files to the Neuroimaging Informatics Technology Initiative (NIfTI) format, ensuring the removal of any remaining PHI not eliminated by the anonymization/de-identification tool. The converted NIfTI images were then resampled to an isotropic 1 mm³ resolution and co-registered to the standard anatomical human brain atlas, SRI24. A deep learning brain extraction method was applied to strip the skull and extracranial tissues, thereby mitigating any potential facial reconstruction or recognition risks.
The preprocessed images were segmented using a deep network based on nnU-Net, resulting in four distinct labels that correspond to different components of each tumor:
Label 1: Non-enhancing Tumor Core (NETC). This label identifies non-enhancing components within the tumor, such as cystic, necrotic, or hemorrhagic portions.
Label 2: Surrounding Non-enhancing FLAIR Hyperintensity (SNFH). This label represents both non-enhancing infiltrative tumor components and peritumoral vasogenic edema.
Label 3: Enhancing Tissue (ET). This label highlights the viable nodular-enhancing components of the tumor.
Label 4: Resection Cavity (RC). This label covers post-surgical changes, including recent changes like blood products and air foci, as well as chronic changes with materials isointense to CSF signal.
A spreadsheet is also provided that includes tumor volumes and signal intensity of different tumor components across various MR sequences.
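As an illustration of how per-label tumor volumes of the kind reported in the spreadsheet could be derived from a segmentation mask, the following is a minimal sketch. It assumes the four label values listed above and the isotropic 1 mm³ voxel size stated for the preprocessed images; the function name `label_volumes_ml` and the toy voxel list are hypothetical and are not part of the dataset or the FeTS tool.

```python
from collections import Counter

# Label values and abbreviations as listed in the dataset description.
LABELS = {1: "NETC", 2: "SNFH", 3: "ET", 4: "RC"}


def label_volumes_ml(voxels, voxel_volume_mm3=1.0):
    """Return the volume of each label in millilitres (1 mL = 1000 mm^3).

    `voxels` is any flat iterable of integer label values, for example a
    flattened segmentation mask. The default voxel volume assumes the
    isotropic 1 mm^3 resampling described for this dataset.
    """
    counts = Counter(int(v) for v in voxels)
    return {
        name: counts.get(value, 0) * voxel_volume_mm3 / 1000.0
        for value, name in LABELS.items()
    }


# Toy mask standing in for a real segmentation: 1000 voxels in total.
toy_voxels = [0] * 700 + [1] * 50 + [2] * 200 + [3] * 3 + [4] * 47
volumes = label_volumes_ml(toy_voxels)
for name, volume in volumes.items():
    print(f"{name}: {volume:.3f} mL")
```

With a real mask loaded through a NIfTI reader, the flattened voxel array would be passed in place of `toy_voxels`.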
Usage Notes
Each scan was manually exported using the built-in McKesson DICOM export tool into separate folders labeled as post-treatment 1, post-treatment 2, etc. In a subsequent step, a subset of the data was selected to contribute to the development of the FeTS 2 toolbox. Consequently, the naming convention was updated to replace "post-treatment" with "timepoint" (e.g., post-treatment 1 became timepoint 1) to adhere to the instructions of the FeTS development team. Each sequence was saved in its own folder within these categories on a HIPAA-compliant, secured server within the University of Missouri network. Export was conducted in DICOM format, maintaining the original image compression settings to preserve quality. To ensure patient privacy and HIPAA compliance, all images were anonymized and all protected health information (PHI), e.g., patient name, MRN, and accession number, was deleted from the DICOM metadata headers.
The folders are labeled in the following structure:
Main folder: PatientID_XXXX
Subfolders: Timepoint_X, Timepoint_X
Each timepoint folder contains the NIfTI images associated with that timepoint.
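The folder layout above can be indexed with a few lines of Python. This is a minimal sketch that assumes folder names start with `PatientID_` and `Timepoint_` exactly as shown and that the image files use a `.nii` or `.nii.gz` extension; the helper name `index_dataset` and the example file name are hypothetical.

```python
from pathlib import Path
import tempfile


def index_dataset(root):
    """Map each patient folder to its timepoint folders and NIfTI file names."""
    index = {}
    for patient_dir in sorted(Path(root).glob("PatientID_*")):
        if not patient_dir.is_dir():
            continue
        timepoints = {}
        for tp_dir in sorted(patient_dir.glob("Timepoint_*")):
            if tp_dir.is_dir():
                # Assumption: images are stored as .nii or .nii.gz files.
                timepoints[tp_dir.name] = sorted(
                    p.name for p in tp_dir.glob("*.nii*")
                )
        index[patient_dir.name] = timepoints
    return index


# Build a tiny throwaway tree that mimics the documented structure.
with tempfile.TemporaryDirectory() as tmp:
    tp = Path(tmp) / "PatientID_0001" / "Timepoint_1"
    tp.mkdir(parents=True)
    (tp / "example_t1.nii.gz").touch()  # hypothetical file name
    demo_index = index_dataset(tmp)

print(demo_index)
```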
Pain Interventions in Dementia - Pain events dataset
data.niaid.nih.gov
Updated Apr 28, 2023
Cite
Koppitz Andrea (2023). Pain Interventions in Dementia - Pain events dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6359399
Dataset updated
Apr 28, 2023
Dataset provided by
Volken Thomas
Koppitz Andrea
Spichiger Frank
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is data collected for a quasi-experimental nurse-led intervention trial based on a convenience sample of three nursing homes. It was collected in the Swiss cantons of Zurich and Thurgau and serves to examine the effects on dementia patients, the healthcare institution, and the qualification level of the healthcare workers using an event analysis and a multilevel analysis. Healthcare workers have been individually trained on how to assess, intervene in, and evaluate acute and chronic pain with BESD and/or VAS. There are three data-monitoring cycles (T0, T1, T2) and two intervention cycles (I1, I2) with a total study duration of 425 days. The raw data has been cryptographically anonymized using an SSL stream and further de-identification techniques.

Also see: 10.1186/s12904-017-0200-5
Domestic Electrical Load Survey Secure Data 1994-2014 - South Africa
datafirst.uct.ac.za
Updated Jun 20, 2019
Cite
Eskom (2019). Domestic Electrical Load Survey Secure Data 1994-2014 - South Africa [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/757
Dataset updated
Jun 20, 2019
Dataset provided by
Eskom (http://www.eskom.co.za/)
Stellenbosch University
University of Cape Town
Time period covered
1995 - 2014
Area covered
South Africa
Description
Abstract

This dataset contains sensitive data that has not been disclosed in the online version of the Domestic Electrical Load Survey (DELS) 1994-2014 dataset. In contrast to the DELS dataset, the DELS Secure Data (DELSS) contains partially anonymised survey responses with only the names of respondents and home owners removed. The DELSS contains street and postal addresses, as well as GPS-level location data for households from 2000 onwards. The GPS data is obtained through an auxiliary dataset, the Site Reference database. Like the DELS, the DELSS dataset has been retrieved and anonymised from the original SQL database with the Python package delretrieve.

Geographic coverage

The study had national coverage.

Analysis unit

Households and individuals

Universe

The survey covers electrified households that received electricity either directly from Eskom or from their local municipality. Particular attention was devoted to rural and low income households, as well as surveying households electrified over a range of years, thus having had access to electricity from recent times to several decades.

Kind of data

Sample survey data

Sampling procedure

See sampling procedure for DELS 1994-2014

Mode of data collection

Face-to-face [f2f]

Cleaning operations

This dataset has been produced by extracting only the survey responses from the original NRS Load Research SQL database using the saveAnswers function from the delretrieve Python package (https://github.com/wiebket/delretrieve, release v1.0). Full instructions on how to use delretrieve to extract data are in the README file contained in the package.

PARTIAL DE-IDENTIFICATION: Partial de-identification was done in the process of extracting the data from the SQL database with the delretrieve package. Only the names of respondents and home owners have been removed from the survey responses by replacing responses with an 'a' in the dataset. Documents with full details of the variables that have been anonymised are included as external resources.

MISSING VALUES: Other than partial de-identification, no post-processing was done and all database records, including missing values, are stored exactly as retrieved.
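
The replacement rule described above can be illustrated with a minimal sketch. This is not the delretrieve implementation: the function name `partially_deidentify`, the field names, and the sample record are hypothetical, and leaving a missing name as `None` rather than replacing it is an assumption made here to mirror the statement that missing values are stored as retrieved.

```python
# Hypothetical names of the fields that hold respondent and home-owner names.
NAME_FIELDS = ("respondent_name", "owner_name")


def partially_deidentify(record, name_fields=NAME_FIELDS):
    """Replace name responses with 'a' and leave every other value untouched.

    Assumption: a missing name (None) stays missing instead of becoming 'a'.
    """
    return {
        key: ("a" if key in name_fields and value is not None else value)
        for key, value in record.items()
    }


# Hypothetical survey response, for illustration only.
sample = {
    "respondent_name": "Jane Doe",
    "owner_name": None,
    "street_address": "12 Example Road",
    "monthly_income": None,
}
deidentified = partially_deidentify(sample)
print(deidentified)
```

Note that the address field survives unchanged, which matches the description of the DELSS as only partially anonymised.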

Data appraisal

See notes on data quality for DELS 1994-2014
Community Nutrition Program in El Alto, Bolivia Data: 2014-2017
data.iadb.org
csv, dta, pdf
Updated Apr 10, 2025
Cite
IDB Datasets (2025). Community Nutrition Program in El Alto, Bolivia Data: 2014-2017 [Dataset]. http://doi.org/10.60966/7yxk-5t52
Explore at:
csv(566373), csv(136937), csv(2266024), csv(434197), csv(200), dta(743893), csv(287), csv(7429), csv(2285407), csv(1102826), csv(1188), csv(123675), dta(128889), csv(841129), dta(3291280), csv(842116), dta(2594697), dta(12817461), csv(484179), csv(3515), csv(30221), csv(2047298), csv(109571), csv(9371), csv(15581), dta(5589277), dta(162108), dta(2919157), csv(762472), csv(1479), csv(2487), csv(983), csv(1716466), dta(2457777), csv(349), pdf(641421), csv(959722), dta(462544), csv(8475), csv(2957), csv(5607), dta(1662192), csv(13099), dta(1263639), dta(3224578), dta(2409816), dta(5091329), csv(2922733), dta(9441688), csv(1816), csv(1552), csv(5838), dta(849656), csv(1087720), dta(3140665), csv(3481), dta(2634846), csv(1865168) Available download formats
Unique identifier
https://doi.org/10.60966/7yxk-5t52
Dataset updated
Apr 10, 2025
Dataset provided by
IDB Datasets
License
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
Time period covered
Jan 1, 2014 - Jan 1, 2017
Area covered
El Alto, Bolivia
Description
The Community Nutrition Program in El Alto, Bolivia is the second phase of a program originally implemented between 2008 and 2011. The newly structured program aimed to improve infant and young child feeding practices, hygiene, and child nutritional status through a behavioral-change strategy based on participatory play education. This dataset contains the data from an evaluation survey of this program, comprising three rounds of data (baseline in 2014, endline 1 in 2016, and endline 2 in 2017) on an (unbalanced) panel of 2015 households with children under the age of 12 months or pregnant women at baseline. The survey includes a rich set of socio-economic, demographic and nutrition variables. Datasets were anonymized to protect subject privacy. Variable names, their labels, and any actions taken for de-identification purposes are noted in the codebook below.

Data De-Identification or Pseudonymity Software Market Report | Global Forecast From 2025 To 2033

Data De-Identification or Pseudonymity Software Market Outlook

As of 2023, the global Data De-Identification or Pseudonymity Software market is valued at approximately USD 1.5 billion and is projected to grow at a robust CAGR of 18% from 2024 to 2032, driven by increasing data privacy concerns and stringent regulatory requirements.
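
To make the growth claim concrete, the compound annual growth rate can be applied to the stated base value. This is a minimal sketch under two assumptions that the report excerpt does not confirm: that 2023 is the base year for compounding and that growth compounds once per year through 2032. The function name `project_value` is hypothetical, and the result is only what the quoted figures imply, not a forecast taken from the report.

```python
def project_value(base_value, annual_rate, years):
    """Compound base_value forward at annual_rate for the given number of years."""
    return base_value * (1.0 + annual_rate) ** years


BASE_2023_USD_BN = 1.5   # stated market value for 2023, in USD billion
CAGR = 0.18              # stated compound annual growth rate
YEARS = 2032 - 2023      # assumption: nine compounding periods from the 2023 base

projected_2032 = project_value(BASE_2023_USD_BN, CAGR, YEARS)
print(f"Implied 2032 market size: USD {projected_2032:.2f} billion")
```

Choosing 2024 as the base year instead would remove one compounding period and lower the implied figure accordingly.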

The growth of the Data De-Identification or Pseudonymity Software market is primarily fueled by the exponential increase in data generation across industries. With the advent of IoT, AI, and digital transformation strategies, the volume of data generated has seen an unprecedented spike. Organizations are now more aware of the need to protect sensitive information to comply with global data privacy regulations such as GDPR in Europe and CCPA in California. The need to ensure that personal data is anonymized or de-identified before analysis or sharing has escalated, pushing the demand for these software solutions.

Another significant growth factor is the rising number of cyber-attacks and data breaches. As data becomes more valuable, it also becomes a prime target for cybercriminals. In response, companies are investing heavily in data privacy and security measures, including de-identification and pseudonymity solutions, to mitigate risks associated with data breaches. This trend is more prevalent in sectors dealing with highly sensitive information like healthcare, finance, and government. Ensuring that data remains secure and private while being useful for analytics is a key driver for the adoption of these technologies.

Moreover, the evolution of Big Data analytics and cloud computing is also spurring growth in this market. As organizations move their operations to the cloud and leverage big data for decision-making, the importance of maintaining data privacy while utilizing large datasets for analytics cannot be overstated. Cloud-based de-identification solutions offer scalability, flexibility, and cost-effectiveness, making them increasingly popular among enterprises of all sizes. This shift towards cloud deployments is expected to further boost market growth.

Regionally, North America holds the largest market share due to its advanced technological infrastructure and stringent data protection laws. The presence of major technology companies and a high rate of adoption of advanced solutions in the U.S. and Canada contribute significantly to regional market growth. Europe follows closely, driven by rigorous GDPR compliance requirements. The Asia Pacific region is anticipated to witness the fastest growth, attributed to the increasing digitization and growing awareness about data privacy in countries like India and China.

As organizations increasingly seek to protect their sensitive data, the concept of Data Protection on Demand is gaining traction. This model allows businesses to access data protection services as and when needed, providing flexibility and scalability. By leveraging cloud-based platforms, companies can implement robust data protection measures without the need for significant upfront investments in infrastructure. This approach not only ensures compliance with data privacy regulations but also offers a cost-effective solution for managing data security. As the demand for on-demand services continues to rise, Data Protection on Demand is poised to become a critical component of data management strategies across various industries.

Component Analysis

The Data De-Identification or Pseudonymity Software market by component is segmented into software and services. The software segment dominates the market, driven by the increasing need for automated solutions that ensure data privacy. These software solutions come with a variety of tools and features designed to anonymize or pseudonymize data efficiently, making them essential for organizations managing large volumes of sensitive information. The software market is expanding rapidly, with new innovations and improvements constantly being introduced to enhance functionality and user experience.

The services segment, though smaller compared to software, plays a crucial role in the market. Services include consulting, implementation, and maintenance, which are essential for the successful deployment and operation of de-identification software. These services help organizations tailor the software to their specific needs, ensuring compliance with regional and industry-specific data protection regulations.
