Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Obtaining real-world data from routine clinical care is of growing interest for scientific research and personalized medicine. Despite the abundance of medical data across various facilities, including hospitals, outpatient clinics, and physician practices, the intersectoral exchange of information remains largely hindered by differences in data structure, content, and adherence to data protection regulations. In response to this challenge, the Medical Informatics Initiative (MII) was launched in Germany, focusing initially on university hospitals to foster the exchange and utilization of real-world data through the development of standardized methods and tools, including the creation of a common core dataset. Our aim, as part of the Medical Informatics Research Hub in Saxony (MiHUBx), is to extend the MII concepts to non-university healthcare providers in a more seamless manner to enable the exchange of real-world data among intersectoral medical sites.

Methods: We investigated what services are needed to facilitate the provision of harmonized real-world data for cross-site research. On this basis, we designed a Service Platform Prototype that hosts services for data harmonization, adhering to the globally recognized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) international standard communication format and the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Leveraging these standards, we implemented additional services facilitating data utilization, exchange, and analysis. Throughout the development phase, we collaborated with an interdisciplinary team of experts from the fields of system administration, software engineering, and technology acceptance to ensure that the solution is sustainable and reusable in the long term.

Results: We have developed the pre-built packages "ResearchData-to-FHIR," "FHIR-to-OMOP," and "Addons," which provide services for harmonizing project-related real-world data and providing it in both the FHIR MII Core dataset (CDS) format and the OMOP CDM format, as well as for data utilization, together with a Service Platform Prototype that streamlines data management and use.

Conclusion: Our development shows a possible approach to extending the MII concepts to non-university healthcare providers to enable cross-site research on real-world data. Our Service Platform Prototype can thus pave the way for intersectoral data sharing, federated analysis, and the provision of SMART-on-FHIR applications to support clinical decision making.
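The "FHIR-to-OMOP" package itself is not reproduced here. As a rough, illustrative sketch of the kind of transformation such a service performs (not the MiHUBx implementation), the following Python snippet maps a simplified FHIR Condition resource to an OMOP condition_occurrence row; the code-system URL, concept IDs, and the in-memory lookup table are illustrative stand-ins for a proper lookup against the OMOP vocabulary tables.

```python
from datetime import date

# Toy ICD-10-GM -> OMOP standard concept lookup; a real pipeline would query the
# OMOP vocabulary tables (CONCEPT / CONCEPT_RELATIONSHIP) instead of a dict.
ICD10_TO_OMOP_CONCEPT = {"E11.9": 201826}  # illustrative: Type 2 diabetes mellitus

def fhir_condition_to_omop(condition: dict, person_id: int) -> dict:
    """Map a (simplified) FHIR Condition resource to an OMOP condition_occurrence row."""
    coding = condition["code"]["coding"][0]
    source_code = coding["code"]
    return {
        "person_id": person_id,
        "condition_concept_id": ICD10_TO_OMOP_CONCEPT.get(source_code, 0),  # 0 = no matching concept
        "condition_start_date": date.fromisoformat(condition["onsetDateTime"][:10]),
        "condition_type_concept_id": 32817,  # illustrative type concept ("EHR" provenance)
        "condition_source_value": source_code,
    }

if __name__ == "__main__":
    fhir_condition = {
        "resourceType": "Condition",
        "code": {"coding": [{"system": "http://fhir.de/CodeSystem/bfarm/icd-10-gm",
                             "code": "E11.9"}]},
        "onsetDateTime": "2023-04-12T08:30:00+02:00",
    }
    print(fhir_condition_to_omop(fhir_condition, person_id=42))
```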
https://www.marketreportanalytics.com/privacy-policy
The global market for Real World Data (RWD) solutions in medicine is experiencing robust growth, driven by the increasing adoption of digital health technologies, the expanding volume of healthcare data, and a rising demand for evidence-based medical decisions. The market's value is estimated to be $5 billion in 2025, with a Compound Annual Growth Rate (CAGR) of 15% projected from 2025 to 2033. This expansion is fueled by several key factors, including the increasing availability of electronic health records (EHRs), the growing use of wearables and mobile health applications generating patient-generated health data, and the pharmaceutical industry's increased reliance on RWD for drug development and post-market surveillance. Regulatory support for the use of RWD in clinical trials and healthcare decision-making is accelerating market growth further. Segmentation reveals strong growth in applications such as clinical trials, pharmacoepidemiology, and regulatory submissions, with particularly high demand for advanced analytics and data integration solutions.

Despite this positive trajectory, market growth faces certain restraints. These include concerns about data privacy and security, the heterogeneity and quality of RWD from diverse sources, and the lack of standardized data formats and interoperability across healthcare systems. Overcoming these challenges requires investment in robust data governance frameworks, the development of advanced data harmonization techniques, and a collaborative approach among stakeholders, including healthcare providers, technology vendors, and regulatory bodies.

The geographic distribution of the market reveals strong growth in North America and Europe, driven by advanced healthcare infrastructure and regulatory frameworks, with emerging markets in Asia Pacific and Latin America showing promising potential for future expansion as healthcare infrastructure improves and the understanding of RWD's utility increases. The market is highly competitive, with a number of established players and emerging technology companies vying for market share.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ARCH Ontology modifiers vs. those in OMOP.
https://www.marketreportanalytics.com/privacy-policy
The Real-World Evidence (RWE) Solutions market is experiencing robust growth, projected to reach $828.46 million in 2025 and expand at a compound annual growth rate (CAGR) of 13% from 2025 to 2033. This significant expansion is driven by several key factors. The increasing adoption of RWE in regulatory decision-making, fueled by the need for more efficient and cost-effective drug development, is a primary driver. Furthermore, the rising availability of large, diverse datasets from electronic health records (EHRs), claims databases, and wearable devices provides rich sources of real-world data for analysis. Pharmaceutical companies and healthcare providers are actively investing in RWE solutions to improve clinical trial design, enhance post-market surveillance, and optimize treatment strategies, further bolstering market growth.

The market is segmented by type (e.g., software, services) and application (e.g., drug development, post-market surveillance), each exhibiting unique growth trajectories influenced by specific technological advancements and regulatory landscapes. Competitive strategies among leading companies, such as Clinigen Group Plc, ICON Plc, and IQVIA Inc., focus on strategic partnerships, technological innovation, and expansion into new geographical markets. These companies are engaged in developing advanced analytical tools and data integration platforms to cater to growing demands for comprehensive RWE solutions. The North American market currently holds a substantial share, driven by robust regulatory frameworks and advanced healthcare infrastructure. However, other regions, particularly Asia Pacific, are expected to witness significant growth in the coming years due to increasing healthcare expenditure and technological advancements.

The restraints on market growth are primarily related to data privacy concerns, regulatory hurdles in accessing and utilizing real-world data, and the need for robust data standardization across different sources. However, proactive measures like developing better data security protocols, clarifying regulatory guidelines, and investing in data harmonization initiatives are mitigating these challenges. The future of the RWE Solutions market hinges on continuous technological innovation, particularly in areas like artificial intelligence (AI) and machine learning (ML), which can enhance data analysis and generate valuable insights from complex datasets. Further growth will depend on fostering collaboration among stakeholders, including regulatory bodies, healthcare providers, and technology companies, to create a more conducive environment for RWE adoption.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
scientific research for publishing the article
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Clinicians around the world perform clinical research in addition to their high workload. To meet the demands of high-quality Investigator Initiated Trials (IITs), Clinical Trial Units (CTUs), as part of Academic Research Institutions, are implemented worldwide. CTUs increasingly hold a key position in facilitating the international mutual acceptance of clinical research data by promoting clinical research practices and infrastructure according to international standards.

Aim: In this project, we aimed to identify the services that established and internationally operating CTUs, members of the International Clinical Trial Center Network (ICN), consider most important to ensure the smooth processing of a clinical trial while meeting international standards. We thereby aim to drive international harmonization by providing emerging and growing CTUs with a resource for informed service range set-up.

Methods: Following the AMEE Guide, we developed a questionnaire addressing the perceived importance of different CTU services. Survey participants were senior representatives of CTUs and part of the ICN, with long-term experience in their field and institution.

Results: Services concerning the quality and coordination of a research project were considered most essential, i.e., Quality management, Monitoring, and Project management, followed by Regulatory & Legal affairs, Education & Training, and Data management. Operative services for conducting a research project, i.e., Study Nurse with patient contact and Study Nurse without patient contact, were considered least important.

Conclusion: To balance the range of services offered while meeting high international standards of clinical research, emerging CTUs should focus on offering (quality) management services and expertise in regulatory and legal affairs. Additionally, education and training services are required to ensure clinicians are well trained on GCP and legislation. CTUs should evaluate whether the expertise and resources are available to offer operative services.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Codes that could not be found in the OMOP concept dictionary.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Harmonization of six quantitative SARS-CoV-2 serological assays using sera of vaccinated subjects. Clinica Chimica Acta. Volume 522, November 2021, Pages 144-151
The purpose of the NINDS Common Data Elements (CDEs) Project is to standardize the collection of investigational data in order to facilitate comparison of results across studies and more effectively aggregate information into significant metadata results. The goal of the National Institute of Neurological Disorders and Stroke (NINDS) CDE Project specifically is to develop data standards for clinical research within the neurological community. Central to this Project is the creation of common definitions and data sets so that information (data) is consistently captured and recorded across studies. To harmonize data collected from clinical studies, the NINDS Office of Clinical Research is spearheading the effort to develop CDEs in neuroscience. This Web site outlines these data standards and provides accompanying tools to help investigators and research teams collect and record standardized clinical data. The Institute still encourages creativity and uniqueness by allowing investigators to independently identify and add their own critical variables. The CDEs have been identified through review of the documentation of numerous studies funded by NINDS, review of the literature and regulatory requirements, and review of other Institutes' common data efforts. Other data standards, such as those of the Clinical Data Interchange Standards Consortium (CDISC), the Clinical Data Acquisition Standards Harmonization (CDASH) Initiative, ClinicalTrials.gov, the NINDS Genetics Repository, and the NIH Roadmap efforts, have also been followed to ensure that the NINDS CDEs are comprehensive and as compatible as possible with those standards.

CDEs now available:
* General (CDEs that cross diseases) - updated Feb. 2011
* Congenital Muscular Dystrophy
* Epilepsy (updated Sept. 2011)
* Friedreich's Ataxia
* Parkinson's Disease
* Spinal Cord Injury
* Stroke
* Traumatic Brain Injury

CDEs in development:
* Amyotrophic Lateral Sclerosis (public review Sept. 15 through Nov. 15)
* Frontotemporal Dementia
* Headache
* Huntington's Disease
* Multiple Sclerosis
* Neuromuscular Diseases - adult and pediatric working groups are being finalized and will focus on Duchenne Muscular Dystrophy, Facioscapulohumeral Muscular Dystrophy, Myasthenia Gravis, Myotonic Dystrophy, and Spinal Muscular Atrophy

The following tools are available through this portal:
* CDE Catalog - includes the universe of all CDEs. Users are able to search the full universe to isolate a subset of the CDEs (e.g., all stroke-specific CDEs, all pediatric epilepsy CDEs) and download details about those CDEs.
* CRF Library (Library of Case Report Form Modules and Guidelines) - contains all the CRF Modules that have been created through the NINDS CDE Project as well as various guideline documents. Users are able to search the library to find CRF Modules and Guidelines of interest.
* Form Builder - enables users to start the process of assembling a CRF or form by choosing the CDEs they would like to include on the form. This tool is intended to assist data managers and database developers in creating data dictionaries for their study forms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Neuroimaging studies often lack reproducibility, one of the cardinal features of the scientific method. Multisite collaboration initiatives increase sample size and limit methodological flexibility, therefore providing the foundation for increased statistical power and generalizable results. However, multisite collaborative initiatives are inherently limited by hardware, software, and pulse-sequence design heterogeneities of both clinical and preclinical MRI scanners, and by the lack of benchmarks for acquisition protocols, data analysis, and data sharing. We present the overarching vision that led to the constitution of the RIN-Neuroimaging Network, a national consortium dedicated to identifying disease- and subject-specific in-vivo neuroimaging biomarkers of diverse neurological and neuropsychiatric conditions. This ambitious goal requires efforts toward increasing the diagnostic and prognostic power of advanced MRI data. To this aim, 23 Italian Scientific Institutes of Hospitalization and Care (IRCCS), with technological and clinical specialization in the neurological and neuroimaging field, have gathered together. Each IRCCS is equipped with high- or ultra-high-field MRI scanners (i.e., ≥3T) for clinical or preclinical research, or has established expertise in MRI data analysis and infrastructure. The actions of this Network were defined across several work packages (WP). A clinical work package (WP1) defined the guidelines for a minimum-standard clinical qualitative MRI assessment for the main neurological diseases. Two neuroimaging technical work packages (WP2 and WP3, for clinical and preclinical scanners) established Standard Operating Procedures for quality controls on phantoms as well as advanced harmonized quantitative MRI protocols for studying the brain of healthy human participants and wild-type mice. Under FAIR principles, a web-based e-infrastructure to store and share data across sites was also implemented (WP4). Finally, the RIN translated all these efforts into a large-scale multimodal data collection in patients and animal models with dementia as a case study. The RIN-Neuroimaging Network can maximize the impact of public investments in research and clinical practice by acquiring data across institutes and pathologies with high-quality and highly consistent acquisition protocols, optimizing the analysis pipeline and data sharing procedures.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Fast Healthcare Interoperability Resources (FHIR) has emerged as a robust standard for healthcare data exchange. To explore the use of FHIR for the process of data harmonization, we converted the Medical Information Mart for Intensive Care IV (MIMIC-IV) and MIMIC-IV Emergency Department (MIMIC-IV-ED) databases into FHIR. We extended base FHIR to encode information in MIMIC-IV and aimed to retain the data in FHIR with minimal additional processing, aligning to US Core v4.0.0 where possible. A total of 24 profiles were created for MIMIC-IV data, and an additional 6 profiles were created for MIMIC-IV-ED data. Code systems and value sets were created from MIMIC terminology. We hope MIMIC-IV in FHIR provides a useful restructuring of the data to support applications around data harmonization, interoperability, and other areas of research.
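The published MIMIC-IV-on-FHIR conversion relies on dedicated profiles, terminology mappings, and identifier systems; purely as a minimal sketch of the general idea, the following snippet turns one row of the MIMIC-IV patients table into a bare-bones FHIR Patient resource. The field handling is a simplification and an assumption for illustration, not the project's actual mapping.

```python
def mimic_patient_to_fhir(row: dict) -> dict:
    """Convert one row of the MIMIC-IV `patients` table into a minimal FHIR Patient resource.

    Illustrative sketch only: the official conversion additionally attaches
    US Core / MIMIC-specific profiles, extensions, and identifiers.
    """
    gender_map = {"M": "male", "F": "female"}
    return {
        "resourceType": "Patient",
        "id": str(row["subject_id"]),
        "gender": gender_map.get(row["gender"], "unknown"),
        # MIMIC-IV date-shifts records; anchor_age is the patient's age in
        # anchor_year, so an approximate (shifted) birth year is:
        "birthDate": str(row["anchor_year"] - row["anchor_age"]),
    }

if __name__ == "__main__":
    example_row = {"subject_id": 10000032, "gender": "F", "anchor_age": 52, "anchor_year": 2180}
    print(mimic_patient_to_fhir(example_row))
```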
Background: A consensual definition of occupational burnout is currently lacking. We aimed to harmonize the definition of occupational burnout as a health outcome in medical research and to reach a consensus on this definition within the Network on the Coordination and Harmonisation of European Occupational Cohorts (OMEGA-NET). Methods: First, we performed a systematic review in MEDLINE, PsycINFO and EMBASE (January 1990 to August 2018) and a semantic analysis of the available definitions. We used the definitions of burnout and burnout-related concepts from the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) to formulate a consistent harmonized definition of the concept. Second, we sought to obtain consensus on the proposed definition using the Delphi technique. Results: We identified 88 unique definitions of burnout and assigned each of them to one of the 11 original definitions. The semantic analysis yielded a semantic proposal, formulated in accordance with SNOMED-CT as follows: “In a worker, occupational burnout or occupational physical AND emotional exhaustion state is an exhaustion due to prolonged exposure to work-related problems”. A panel of 50 experts (researchers and healthcare professionals with an interest in occupational burnout) reached consensus on this proposal in the second round of the Delphi, with 82% of experts agreeing on it. Conclusion: This study resulted in a harmonized definition of occupational burnout approved by experts from 29 countries within the OMEGA-NET. Future research should address the reproducibility of the Delphi consensus in a larger panel of experts, representing more countries, and examine the practicability of the definition.
Number of citations per original and secondary definition of occupational burnout among studies included in the systematic review
Three CSV files. The first one (ResearchStrings.csv) presents the literature search strings applied to MEDLINE, EMBASE, and PsycINFO, respectively. The second file (DefinitionsIndexation&Citation_OriginaVsUniqueDef.csv) presents the statements of the different definitions of occupational burnout identified within the systematic review, their references, and the references of studies citing them. Finally, the third file (DefinitionsIndexation&Citation_UniqueDefinitionSummary.csv) presents the correspondence between these “unique” definitions and their “original” definitions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The AISSLab Breast Cancer Dataset is a collection of mammogram images curated by experts from Ma'amon's Diagnostic Centre Mammogram Images for Breast Cancer (MDCMI-BC) in Yemen. It is designed to support advancements in breast cancer research and computer-aided diagnosis (CAD) systems, and to facilitate research in breast cancer detection with a focus on harmonizing AI with diverse imaging data. The dataset emphasizes improving diagnostic accuracy and is available for academic and clinical research applications.
If you are using this dataset for research purposes, kindly cite the following papers:
[1] A. M. Al-Hejri, R. M. Al-Tam, M. Fazea, A. H. Sable, S. Lee, and M. A. Al-antari, “ETECADx: Ensemble Self-Attention Transformer Encoder for Breast Cancer Diagnosis Using Full-Field Digital X-ray Breast Images,” Diagnostics, vol. 13, no. 1, p. 89, Dec. 2022, doi: 10.3390/diagnostics13010089.
[2] R. M. Al-Tam, A. M. Al-Hejri, S. S. Alshamrani, M. A. Al-antari, and S. M. Narangale, “Multimodal breast cancer hybrid explainable computer-aided diagnosis using medical mammograms and ultrasound Images,” Biocybern. Biomed. Eng., vol. 44, no. 3, pp. 731–758, Jul. 2024, doi: 10.1016/j.bbe.2024.08.007.
https://dataintelo.com/privacy-and-policy
The global offshoring clinical trials market size is projected to witness significant growth over the forecast period, with an estimated value of USD 35 billion in 2023 and anticipated to reach approximately USD 65 billion by 2032, propelled by a CAGR of 7%. A key growth factor driving this market is the cost-effectiveness and efficiency offered by conducting clinical trials in developing countries. The availability of a large patient pool, coupled with lower operational costs in regions like Asia Pacific and Latin America, is anticipated to contribute substantially to market growth. Moreover, advancements in technology and increased regulatory harmonization are facilitating smoother operations of clinical trials offshore, further enhancing market expansion.
A major growth factor contributing to the expansion of the offshoring clinical trials market is the increasing globalization of pharmaceutical and biotechnology research. Companies are increasingly looking beyond their borders to tap into diverse patient populations and access new markets. This globalization trend is driven by the need for more robust data that can be generated through diverse demographics, potentially expediting drug approval processes. Furthermore, the rapid advancements in digital health technologies and telemedicine are enabling smoother offshoring processes by facilitating remote monitoring and data collection, thereby enhancing efficiency and accuracy of clinical trials.
The rising demand for cost efficiency in drug development is also a pivotal factor in the growth of the offshoring clinical trials market. Clinical trials are notoriously expensive, often comprising a significant portion of a drug's development costs. By offshoring trials to countries where operational costs are lower due to reduced labor and infrastructure costs, pharmaceutical and biotechnology companies can significantly reduce their overall expenditure. This economic incentive is particularly attractive to small and medium-sized enterprises (SMEs) that often operate under tight budget constraints. Moreover, these cost savings can be redirected towards additional research and development efforts, potentially accelerating the drug development cycle.
Moreover, the increasing complexity and stringency of regulatory requirements in developed nations are prompting companies to seek more favorable regulatory environments offshore. Many developing countries are actively working towards improving their regulatory frameworks in line with international standards, making them attractive destinations for clinical trials. The harmonization of regulations across regions offers a dual advantage: easing the administrative burden on companies while ensuring ethical and scientific standards are upheld. This trend is expected to fuel market growth, as more companies embrace the streamlined processes and expedited timelines available in these regions.
The regional outlook of the offshoring clinical trials market suggests that Asia Pacific will continue to be a leading destination for these trials, driven by its substantial patient pool and cost benefits. Latin America is also emerging as a significant player, with countries like Brazil and Mexico offering favorable regulatory environments and a diverse patient demographic. Europe and North America still play a crucial role, particularly in early-phase trials and regulatory oversight. Meanwhile, the Middle East & Africa region is gradually gaining attention due to improving healthcare infrastructure and increasing participation in global research initiatives. This diversification across regions not only spreads risk for companies but also enhances the robustness and relevance of clinical trial data.
Phase I trials, the initial stage of clinical testing, focus on evaluating the safety and dosage of new drugs. Offshoring Phase I trials is primarily driven by the need for rapid recruitment and cost efficiency. Countries in Asia Pacific and Eastern Europe are popular destinations due to their ability to recruit patients swiftly, which is critical in early-phase trials where time is of the essence. The availability of specialized facilities and skilled professionals in these regions further enhances their attractiveness. Additionally, regulatory environments in these areas are becoming increasingly supportive of early-phase trials, aligning with international standards to ensure safety and compliance.
Phase II trials, which assess the efficacy and side effects of a drug, benefit from offshoring due to the diversity of patient populations available in these regions.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This collection introduces an open-source, anthropomorphic phantom-based dataset of CT scans for developing harmonization methods for deep learning-based models. The phantom mimics human anatomy, allowing repeated scans without delivering radiation to real patients and isolating scanner effects by removing inter- and intra-patient variations. The dataset includes 268 image series from 13 scanners, 4 manufacturers, and 8 institutions, repeated 18-30 times at a 10 mGy dose using a harmonized protocol. An additional 1,378 image series were acquired with the same 13 scanners and harmonized protocol but including additional acquisition doses. The phantom scans consist of three compartments: thorax, liver, and test patterns. The 3D-printed liver includes three types of abnormal regions of interest, including two cysts, a metastasis, and a hemangioma, with ground-truth segmentation masks that can be used for classification and segmentation.
Recent breakthroughs in data-driven algorithms and artificial intelligence (AI) applications in medical information processing have introduced tremendous potential for AI-assisted image-based personalized medicine that addresses tasks such as segmentation, diagnosis, and prognosis. However, these opportunities come with two challenges: large data requirements and consistency in data distribution. Machine and deep learning algorithms have extreme data demands, coupled with the high costs of data acquisition and annotation for a single observation (e.g., one event corresponds to one patient in a survival study). These challenges encourage pooling of data collected from multiple centers and scanners to achieve a critical mass of data for training models. However, pooling data from multiple centers introduces significant variability in the acquisition parameters and specifics of image reconstruction algorithms, leading to data domain shifts and inconsistencies in the collected data. The domain shift introduced by this variability in scanners reduces the value of merging data from multiple centers, reducing the performance of predictive tasks such as segmentation, diagnosis, and prognosis, in both pooled and federated scenarios. Furthermore, domain shifts between training and test or inference data entail high risks of incorrect and uncontrolled predictions for treatment planning and personalized medicine when the inference is based on a scanner (and/or acquisition setting) that was not represented in the training data. Although this challenge applies to all medical imaging modalities, it is particularly important for computed tomography (CT) images due to the wide range of variability in manufacturers, acquisition parameters and dose, reconstruction algorithms, and customized parameter tunings in different centers.
This dataset provides the material to reproduce several different research works conducted in conjunction with it. Researchers can use this dataset for developing their own harmonization methods at both the image and feature levels to tackle the data drift problem from one scanner to another and across different manufacturers. We also release baseline performance metrics for the similarity of scans in the image domain and feature space without harmonization. This will set a baseline to evaluate the effectiveness of various harmonization techniques in the image and feature domains.
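The released baseline metrics are part of the collection itself; purely as a sketch of what a simple image-domain similarity baseline could look like (assuming scans have already been resampled and registered, and using NumPy only), one could compare Hounsfield-unit histograms between scanners:

```python
import numpy as np

def hu_histogram(volume: np.ndarray, bins: np.ndarray) -> np.ndarray:
    """Normalized histogram of Hounsfield units for one CT volume."""
    hist, _ = np.histogram(volume, bins=bins)
    return hist / hist.sum()

def histogram_l1_distance(vol_a: np.ndarray, vol_b: np.ndarray) -> float:
    """L1 distance between HU histograms: 0 = identical distributions, 2 = disjoint."""
    bins = np.arange(-1024, 3072, 8)  # typical CT HU range, 8-HU-wide bins
    return float(np.abs(hu_histogram(vol_a, bins) - hu_histogram(vol_b, bins)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two registered phantom scans from different scanners.
    scan_a = rng.normal(40, 15, size=(64, 64, 64))   # soft-tissue-like HU values
    scan_b = rng.normal(45, 20, size=(64, 64, 64))   # slightly shifted, noisier scanner
    print(f"L1 histogram distance: {histogram_l1_distance(scan_a, scan_b):.3f}")
```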
The following subsections provide information about how the data were selected, acquired, and prepared for publication, as well as the approximate date range of the imaging studies.
Before the CT scans of the phantom were acquired, a survey was carried out to collect realistic acquisition and reconstruction parameter settings used in clinical thoracoabdominal CT scans for oncological staging, tumor search, and infectious focus detection in the portal venous contrast phase. The survey included 21 CT scanners from 9 centers across Switzerland. The resulting reference protocol corresponds to a tube voltage of 120 kV, a tube current-time product of 148 mAs, a pitch of 1.000, and a rotation time of 0.5 seconds on the Siemens SOMATOM Definition Edge scanner. The collimation was set to 38.4 mm, with a slice thickness/increment of 2.0 mm and a pixel spacing of 1.367 mm. Due to vendor-specific limitations, the parameters mentioned above were slightly adapted to the closest possible parameters for each given scanner. The scans were repeated for 13 scanners from 4 manufacturers, namely Siemens, Philips, General Electric (GE), and Toshiba, at five dose levels (1 mGy, 3 mGy, 6 mGy, 10 mGy, 14 mGy). Only the tube current-time product (in mAs) was adjusted to set the various dose levels; all other parameters were kept the same. For each CT scanner and each dose level, 10 repeated scans (identified in the image series as #1 to #10) with identical settings were performed, except for the Toshiba Aquilion Prime SP scanner at 10 mGy, for which only 9 repeated scans were inadvertently acquired. Thus, a total of 649 CT scans were performed.
Images were reconstructed using two or three different reconstruction algorithms per CT scan, resulting in two or three CT image series per CT scan. For all CT scans, a vendor-specific iterative reconstruction (IR) algorithm with a standard soft tissue kernel was used, resulting in 649 IR CT series. In addition, filtered backprojection (FBP) reconstruction with a standard soft tissue kernel was used for all CT scans, resulting in another 649 FBP CT series. For 2 of the 13 CT scanners, a DL based reconstruction algorithm was available. For one of these scanners, it was used for three dose levels (1 mGy, 3 mGy, 6 mGy), resulting in 30 additional CT series. For the second scanner, DL reconstruction was used for all five dose levels, resulting in 50 additional CT series. In summary, the dataset presented in this work consists of 1378 series reconstructed from 649 CT scans.
The DICOM data files presented in conjunction with this repository did not undergo any preprocessing steps, in order to preserve all sources of variation—such as spatial shifts and voxel spacing differences introduced by various scanners. However, this repository is linked to a data descriptor paper where we thoroughly analyzed the data, as well as a Git repository that provides the code for resampling the scans to a uniform voxel spacing and performing registration.
The dataset includes original DICOM files with all acquisition parameters stored in the DICOM tags, without any special pre-processing. For each DICOM study, the Study Description tags contain scanner IDs (e.g. "A1" or "H2") which represent 8 institutions. Each DICOM study contains multiple image series reconstructed with different reconstruction methods, plus a series containing the mask related to the various regions of interest in the liver tissue. When downloading these data, the directory and file names will follow the format described in this FAQ entry.
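The linked Git repository contains the authors' resampling and registration code; the snippet below is only a rough sketch, assuming SimpleITK, of how a downloaded image series could be read, its scanner ID taken from the Study Description tag, and the volume resampled to a uniform voxel spacing. The placeholder path and the assumption that the tag holds only the scanner ID are illustrative.

```python
import SimpleITK as sitk

def read_series_with_scanner_id(series_dir: str):
    """Read one reconstructed CT series and return (volume, scanner ID)."""
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(reader.GetGDCMSeriesFileNames(series_dir))
    reader.MetaDataDictionaryArrayUpdateOn()
    volume = reader.Execute()
    # Study Description tag (0008,1030) of the first slice; per the collection
    # description it carries the scanner ID (e.g. "A1" or "H2").
    scanner_id = reader.GetMetaData(0, "0008|1030")
    return volume, scanner_id

def resample_to_spacing(image: sitk.Image, spacing=(1.0, 1.0, 1.0)) -> sitk.Image:
    """Resample a CT volume to a uniform voxel spacing using linear interpolation."""
    new_size = [int(round(sz * old / new))
                for sz, old, new in zip(image.GetSize(), image.GetSpacing(), spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), spacing, image.GetDirection(),
                         -1024.0, image.GetPixelID())  # pad with air (-1024 HU)

if __name__ == "__main__":
    # Placeholder: directory of one downloaded image series from the collection.
    volume, scanner_id = read_series_with_scanner_id("path/to/one/series")
    print(scanner_id, volume.GetSpacing(), "->", resample_to_spacing(volume).GetSpacing())
```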
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024 |
HISTORICAL DATA | 2019 - 2024 |
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
MARKET SIZE 2023 | 53.71 (USD Billion) |
MARKET SIZE 2024 | 58.44 (USD Billion) |
MARKET SIZE 2032 | 114.9 (USD Billion) |
SEGMENTS COVERED | Type of Clinical Trial, Phase of Clinical Development, Therapeutic Area, Service Type, End User, Regional |
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
KEY MARKET DYNAMICS | Growing demand for clinical research; Increasing outsourcing of clinical trials; Technological advancements; Stringent regulatory requirements; Rising healthcare costs |
MARKET FORECAST UNITS | USD Billion |
KEY COMPANIES PROFILED | PRA Health Sciences, IQVIA, Pfizer, Parexel International, AstraZeneca, Johnson & Johnson, Covance, PPD, Oracle Health Sciences, Syneos Health, Medidata Solutions, GSK, ICON plc, LabCorp, Merck KGaA |
MARKET FORECAST PERIOD | 2024 - 2032 |
KEY MARKET OPPORTUNITIES | Remote monitoring and data collection; Artificial intelligence and machine learning; Personalized medicine and precision medicine; Regulatory changes and harmonization; Data privacy and security |
COMPOUND ANNUAL GROWTH RATE (CAGR) | 8.82% (2024 - 2032) |
https://dataintelo.com/privacy-and-policy
According to the latest research, the global Multi-Omics Data Integration SaaS market size reached USD 1.42 billion in 2024, reflecting a robust momentum driven by technological advancements and increasing adoption across life sciences. The market is expected to expand at a CAGR of 17.6% during the forecast period, with projections indicating a value of USD 6.13 billion by 2033. This remarkable growth is primarily fueled by the rising demand for integrated omics solutions in drug discovery, precision medicine, and clinical diagnostics, as organizations seek to leverage data-driven insights for improved outcomes and operational efficiencies.
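As a quick arithmetic consistency check on the figures quoted above (a sketch only; the report's own model is more involved), compounding the stated 2024 base at the stated CAGR over the nine years to 2033 roughly reproduces the projected value:

```python
base_2024 = 1.42          # USD billion (stated 2024 market size)
cagr = 0.176              # stated 17.6% compound annual growth rate
projection_2033 = base_2024 * (1 + cagr) ** (2033 - 2024)
# ~6.11, close to the quoted USD 6.13 billion (difference due to CAGR rounding)
print(f"Implied 2033 market size: USD {projection_2033:.2f} billion")
```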
A key driver behind the expansion of the Multi-Omics Data Integration SaaS market is the surging volume and complexity of biological data generated through next-generation sequencing (NGS) and high-throughput omics technologies. Researchers and clinical practitioners are increasingly reliant on advanced SaaS platforms to unify genomics, proteomics, transcriptomics, and metabolomics data for comprehensive analysis. The integration of these diverse datasets enables a holistic understanding of biological systems, facilitating breakthroughs in disease characterization, biomarker discovery, and therapeutic target identification. As the need for cross-omics data analysis intensifies, SaaS-based solutions offer scalable, flexible, and cost-effective approaches, eliminating the constraints of traditional on-premises infrastructures.
Another significant growth factor is the ongoing digital transformation in healthcare and life sciences, which has accelerated the adoption of cloud-based platforms for data management and analytics. SaaS solutions for multi-omics data integration provide seamless collaboration, secure data sharing, and real-time analytics, empowering interdisciplinary teams to drive innovation at scale. The COVID-19 pandemic further underscored the importance of rapid data integration and remote accessibility, catalyzing investments in digital infrastructure and cloud-native applications. As regulatory frameworks evolve to support data privacy and interoperability, organizations are increasingly confident in leveraging SaaS platforms for sensitive multi-omics research and clinical workflows.
The emergence of artificial intelligence (AI) and machine learning (ML) technologies is also transforming the Multi-Omics Data Integration SaaS market. By harnessing advanced algorithms, SaaS platforms can automate complex data integration, normalization, and interpretation tasks, uncovering hidden patterns and actionable insights from vast multi-omics datasets. This capability is particularly valuable in precision medicine, where individualized patient profiles require sophisticated analytics to inform diagnosis, prognosis, and treatment selection. As AI-powered multi-omics platforms become more accessible and user-friendly, their adoption is expected to proliferate across academic, clinical, and commercial settings, further propelling market growth.
From a regional perspective, North America currently dominates the global Multi-Omics Data Integration SaaS market, accounting for the largest revenue share in 2024. This leadership is attributed to the region’s advanced healthcare infrastructure, significant R&D investments, and a strong presence of leading SaaS providers. Europe and Asia Pacific are also experiencing rapid growth, driven by expanding genomics research initiatives, government funding, and increasing collaborations between academic institutions and industry stakeholders. As emerging markets in Latin America and the Middle East & Africa invest in digital health infrastructure, the global footprint of multi-omics SaaS solutions is expected to broaden, fostering greater accessibility and innovation worldwide.
The Component segment of the Multi-Omics Data Integration SaaS market is bifurcated into software and services, each playing a pivotal role in enabling seamless integration and analysis of multi-omics datasets. Software solutions form the backbone of this segment, offering robust platforms for data ingestion, harmonization, visualization, and advanced analytics. These solutions are designed to handle the complexity and heterogeneity of omics data, providing researchers with intuitive interfaces and customizable workflows. The increasing sophistication of analytical tools, including AI-powered analytics, continues to strengthen the software side of this segment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data and scripts for our research paper:
"Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare"
The paper presents a generic LLM-based pipeline to enable data harmonization across distributed data sources. Collaboration with the MPRINT project contributed to the motivation articulated in the manuscript's Section III (Aligning Biomedical Data Via Ontologies), as well as the data for the experiments and evaluation presented in Section IV (Data Alignment For Drug Reporting Use Case) of the paper.
On a broader footing, the overall pipeline becomes a function within the Federated Learning (FL) Brane/EPI framework (discussed in Section II). Such FL frameworks are deployed within the firewall of a health organization to convert data from their source format to a target format expected by researchers designing federated studies, so that data privacy and integrity are not affected because the data themselves are never copied.
In the MPRINT scenario presented in the paper, the goal was to map to the target Mondo Disease Ontology (MONDO) or Human Phenotype Ontology (HPO) format from source data that were either (a) not annotated, i.e., outcomes were given in plain English, or (b) annotated with an unrelated ontology, the International Classification of Diseases, Tenth Revision (ICD-10). Here, we evaluate the performance of an LLM-based mapping pipeline for bridging source and target formats against that of a human operator.
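The repository's gen_llm_rag_mondo_hpo.py script (listed below) generates candidate pairs by vector similarity search. Purely as a sketch of that idea, and assuming the sentence-transformers package, the snippet below embeds plain-English outcomes and a few ontology term labels and proposes the nearest term as a candidate; the model name, example terms, and the absence of the LLM acceptance step are illustrative choices, not the paper's configuration.

```python
from sentence_transformers import SentenceTransformer, util

# Model choice is an illustrative assumption, not the pipeline's configuration.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Plain-English outcome strings (source) and a few ontology term labels (target).
source_outcomes = ["preterm birth", "neonatal jaundice"]
target_terms = {
    "HP:0001622": "Premature birth",
    "HP:0000952": "Jaundice",
    "MONDO:0005071": "Nervous system disorder",
}

source_emb = model.encode(source_outcomes, convert_to_tensor=True)
target_emb = model.encode(list(target_terms.values()), convert_to_tensor=True)
similarity = util.cos_sim(source_emb, target_emb)

# For each source outcome, propose the best-matching ontology term as a candidate
# pair; in the full pipeline an LLM then accepts or rejects each candidate.
for i, outcome in enumerate(source_outcomes):
    best = int(similarity[i].argmax())
    term_id = list(target_terms)[best]
    print(f"{outcome!r} -> {term_id} ({target_terms[term_id]}), score={float(similarity[i][best]):.2f}")
```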
- scripts
- data
- input
- hp.json - Human Phenotype Ontology (HPO) (https://obofoundry.org/ontology/hp.html)
- mondo.json - Mondo Disease Ontology (https://mondo.monarchinitiative.org/pages/download/)
- snomed.txt - SNOMED CT content from snomed/Full/Refset/Map/der2_iisssccRefset_ExtendedMapFull_US1000124_20240901.txt
(https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html)
- MPRINT_MarketScan_Phenotypes.csv - Input ICD10 codes from the MPRINT dataset
- output (files produced by methods in gen_snomed_mondo_hpo.py)
- full_icd10_to_snomed.csv - Extracted ICD-10 to SNOMED code mapping
- filtered_icd10_to_snomed.tsv - Subset of ICD-10 to SNOMED mapping relevant for our project
- snomed_to_mondo.csv - SNOMED to MONDO code mapping from mondo.json
- snomed_to_hpo.csv - SNOMED to HPO code mapping from hp.json
- icd10_to_mondo_hpo_snomed.tsv - ICD-10 to MONDO/HPO mapping for input ICD-10 subset
- icd_to_mondo_hpo_rag_llm.tsv - LLM acceptance of ICD-10 to MONDO/HPO candidate pairs, full set
- eval
- subset-eval_icd10_to_mondo_hpo_rag_llm-vs-human.tsv - LLM vs human acceptance evaluation for a subset of candidate pairs produced by RAG
- subset-eval_icd10_to_mondo_hpo_snomed_llm-vs-human.tsv - LLM vs human acceptance evaluation for a subset of candidate pairs produced via SNOMED mapping
- gen_llm_rag_mondo_hpo.py - script to generate candidate matching pairs by vector similarity search (RAG)
- gen_snomed_mondo_hpo.py - script to generate candidate matching pairs via SNOMED mapping
- eval_llm.py - script to evaluate candidate matching pairs
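As an informal illustration of how the SNOMED-based route chains the intermediate files listed above (in the spirit of gen_snomed_mondo_hpo.py, but not its actual code), a pandas join could look as follows; the column names are assumptions and may differ from the real files.

```python
import pandas as pd

# Column names below are assumptions for illustration; the real files in
# data/output may use different headers and separators.
icd_to_snomed = pd.read_csv("data/output/filtered_icd10_to_snomed.tsv", sep="\t")   # icd10_code, snomed_id
snomed_to_mondo = pd.read_csv("data/output/snomed_to_mondo.csv")                    # snomed_id, mondo_id
snomed_to_hpo = pd.read_csv("data/output/snomed_to_hpo.csv")                        # snomed_id, hpo_id

# Chain ICD-10 -> SNOMED -> MONDO/HPO to produce candidate target codes per ICD-10 code.
candidates = (icd_to_snomed
              .merge(snomed_to_mondo, on="snomed_id", how="left")
              .merge(snomed_to_hpo, on="snomed_id", how="left"))

candidates.to_csv("icd10_to_mondo_hpo_candidates.tsv", sep="\t", index=False)
print(candidates.head())
```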
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Several large-scale, pragmatic clinical trials on opioid use disorder (OUD) have been completed in the National Drug Abuse Treatment Clinical Trials Network (CTN). However, the resulting data have not been harmonized between the studies to compare patient characteristics. This paper provides lessons learned from a large-scale harmonization process that are critical for all biomedical researchers collecting new data and those tasked with combining datasets. We harmonized data from multiple domains from CTN-0027 (N = 1269), which compared methadone and buprenorphine at federally licensed methadone treatment programs; CTN-0030 (N = 653), which recruited patients who used predominantly prescription opioids and were treated with buprenorphine; and CTN-0051 (N = 570), which compared buprenorphine and extended-release naltrexone (XR-NTX) and recruited from inpatient treatment facilities. Patient-level data were harmonized into a total of 23 database tables, with meticulous documentation, covering more than 110 variables, along with three tables of “meta-data” about the study design and treatment arms. Domains included social and demographic characteristics, medical and psychiatric history, self-reported drug use details and urine drug screening results, withdrawal, and treatment drug details. Here, we summarize the numerous issues with the organization and fidelity of the publicly available data that were noted and resolved, and present results on patient characteristics across the three trials and the harmonized domains. A systematic harmonization of OUD clinical trial data can be accomplished, despite heterogeneous data coding and classification procedures, by standardizing commonly assessed characteristics. Similar methods, embracing database normalization and/or “tidy” data, should be used for future datasets in other substance use disorder clinical trials.
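As a toy illustration of the kind of standardization described above (not the CTN harmonization code; variable names, category codes, and values are invented for the example), demographics coded differently across trials can be recoded into one tidy table:

```python
import pandas as pd

# Invented mini-extracts standing in for per-trial demographics tables;
# the real CTN datasets use their own variable names and codebooks.
ctn_0027 = pd.DataFrame({"patient_id": [1, 2], "sex": ["M", "F"], "age_yrs": [34, 41]})
ctn_0051 = pd.DataFrame({"subj": [7, 8], "gender_code": [1, 2], "age": [29, 52]})

def harmonize_0027(df: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"study": "CTN-0027", "patient_id": df["patient_id"],
                         "sex": df["sex"].map({"M": "male", "F": "female"}),
                         "age": df["age_yrs"]})

def harmonize_0051(df: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"study": "CTN-0051", "patient_id": df["subj"],
                         "sex": df["gender_code"].map({1: "male", 2: "female"}),
                         "age": df["age"]})

# One tidy table: one row per patient, identical columns and value codes across trials.
harmonized = pd.concat([harmonize_0027(ctn_0027), harmonize_0051(ctn_0051)], ignore_index=True)
print(harmonized)
```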
Portal to make cancer-related proteomic datasets easily accessible to the public. Facilitates multiomic integration in support of precision medicine through interoperability with other resources. Developed to advance our understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer. One of several repositories within the NCI Cancer Research Data Commons, which enables researchers to link proteomic data with other data sets (e.g., genomic and imaging data) and to submit, collect, analyze, store, and share data throughout the cancer data ecosystem. PDC provides access to highly curated and standardized biospecimen, clinical, and proteomic data, and an intuitive interface to filter, query, search, visualize, and download data and metadata. Provides a common data harmonization pipeline to uniformly analyze all PDC data and advanced visualization of quantitative information. Cloud-based (Amazon Web Services) infrastructure natively facilitates interoperability with AWS-based data analysis tools and platforms. An application programming interface (API) provides cloud-agnostic data access and allows third parties to extend functionality beyond PDC. A structured workspace serves as a private user data store and as a data submission portal. Distributes controlled-access data, such as patient-specific protein FASTA sequence databases, with dbGaP authorization and eRA Commons authentication.
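The PDC API mentioned above is exposed as a GraphQL-style interface; the endpoint URL and query fields in the sketch below are assumptions recalled from the public documentation and should be verified against the current PDC API reference before use.

```python
import json
import urllib.request

# Endpoint URL and query/field names are assumptions -- verify against the
# current PDC public API documentation before relying on them.
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"
query = "{ allPrograms { program_id program_submitter_id name } }"

request = urllib.request.Request(
    PDC_GRAPHQL_URL,
    data=json.dumps({"query": query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    payload = json.loads(response.read())
    print(payload["data"]["allPrograms"][:3])  # first few programs, if the query succeeds
```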