The purest type of electronic clinical data is obtained at the point of care at a medical facility, hospital, clinic, or practice. Often referred to as the electronic medical record (EMR), it is generally not available to outside researchers. The data collected include administrative and demographic information, diagnoses, treatments, prescription drugs, laboratory tests, physiologic monitoring data, hospitalization, patient insurance, etc.
Individual organizations such as hospitals or health systems may provide access to internal staff. Larger collaborations, such as the NIH Collaboratory Distributed Research Network, provide mediated or collaborative access to clinical data repositories for eligible researchers. Additionally, the UW De-identified Clinical Data Repository (DCDR) and the Stanford Center for Clinical Informatics allow for initial cohort identification.
About Dataset:
333 scholarly articles cite this dataset.
Unique identifier: DOI
Dataset updated: 2023
Authors: Haoyang Mi
This dataset contains two datasets:
1- Clinical Data_Discovery_Cohort: Name of columns: Patient ID Specimen date Dead or Alive Date of Death Date of last Follow Sex Race Stage Event Time
2- Clinical_Data_Validation_Cohort Name of columns: Patient ID Survival time (days) Event Tumor size Grade Stage Age Sex Cigarette Pack per year Type Adjuvant Batch EGFR KRAS
Feel free to share your thoughts and analysis in a notebook for these datasets. You can also create some interesting and valuable ML projects with them. Thanks for your attention.
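As one possible starting point for such a notebook, the sketch below estimates a Kaplan-Meier survival curve from the kind of values found in the validation cohort's "Survival time (days)" and "Event" columns. It assumes Event is coded 1 for an observed death and 0 for censoring, which the dataset description does not state, and the toy rows are invented for illustration rather than taken from the data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up durations, e.g. "Survival time (days)"
    events -- 1 if the event was observed, 0 if censored (assumed coding)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Invented toy rows, not taken from the dataset.
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 0, 1, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored subjects still count toward the risk set up to their last follow-up time, which is why they are removed only after the event at that time has been processed.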
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🧪 Covid-19 Clinical Trials Dataset (Raw + Cleaned)
This dataset offers a deep look into the global clinical research landscape during the Covid-19 pandemic. Sourced directly from ClinicalTrials.gov, it provides structured and semi-structured information on registered Covid-19-related clinical trials across countries, sponsors, and phases.
📁 What’s Included
• COVID_clinical_trials.csv: Raw dataset as obtained from ClinicalTrials.gov
• Covid-19_cleaned_dataset.csv: Preprocessed version for direct use in data analysis and visualization tasks
🎯 Use Case & Learning Goals
This dataset is ideal for:
• Practicing data cleaning, preprocessing, and wrangling
• Performing exploratory data analysis (EDA)
• Building interactive dashboards (e.g., with Tableau or Plotly)
• Training ML models for classification or forecasting (e.g., predicting trial outcomes)
• Exploring trends in clinical trial research during global health emergencies
🔍 Key Features
Each row represents a registered clinical trial and includes fields such as:
• NCT Number (unique ID)
• Study Title
• Start Date and Completion Date
• Phase
• Study Type (Interventional/Observational)
• Enrollment Size
• Country, Sponsor, and Intervention Type
• Study Status (Recruiting, Completed, Withdrawn, etc.)
✅ Cleaned Dataset
The cleaned version includes:
• Standardized column naming
• Filled missing values where possible
• Removed duplicates and a few columns
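The cleaning steps listed above could look roughly like the following sketch. It is not the script used to produce Covid-19_cleaned_dataset.csv; the snake_case naming rule, the "Unknown" fill value, and the sample rows (including the placeholder NCT numbers) are assumptions made for illustration.

```python
import csv
import io


def clean_rows(rows, fill_value="Unknown"):
    """Standardize column names, fill blank cells, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # "Study Title" -> "study_title"; empty cells get the fill value.
        normalized = {
            key.strip().lower().replace(" ", "_"): (value or "").strip() or fill_value
            for key, value in row.items()
        }
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(normalized)
    return cleaned


# Invented sample standing in for COVID_clinical_trials.csv.
raw_csv = io.StringIO(
    "NCT Number,Study Title,Phase\n"
    "NCT00000001,Example trial A,Phase 2\n"
    "NCT00000001,Example trial A,Phase 2\n"
    "NCT00000002,Example trial B,\n"
)
cleaned = clean_rows(csv.DictReader(raw_csv))
```

Filling every blank with one placeholder is the simplest choice; for numeric fields such as enrollment size, leaving the value missing may be more appropriate.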
📊 Example Applications
• Country-wise contribution analysis
• Sponsor landscape visualization
• Trial timeline and phase progression charts
• Predictive modeling of trial duration or status
🙏 Acknowledgments
Thanks to ClinicalTrials.gov for providing public access to this critical data.
clinicaltrials.gov_search: This is the complete original dataset.
identify completed trials: This is the R script which, when run on "clinicaltrials.gov_search.txt", will produce a .csv file listing all the completed trials.
FDA_table_with_sens: This is the final dataset after cross-referencing the trials. An explanation of the variables is included in the supplementary file "2011-10-31 Prayle Hurley Smyth Supplementary file 3 variables in the dataset".
analysis_after_FDA_categorization_and_sens: This R script reproduces the analysis from the paper, including the tables and statistical tests. The comments should make it self-explanatory.
2011-11-02 prayle hurley smyth supplementary file 1 STROBE checklist: This is a STROBE checklist for the study.
2011-10-31 Prayle Hurley Smyth Supplementary file 2 examples of categorization: This is a supplementary file which illustrates some of the decisions which had to be made when categorizing trials.
2011-10-31 Prayle Hurley Smyth Supplementary file 3 variables in th...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The free database mapping COVID-19 treatment and vaccine development based on the global scientific research is available at https://covid19-help.org/.
Files provided here are curated partial data exports in the form of .csv files, or a full data export as a .sql script generated with pg_dump from our PostgreSQL 12 database. This repository also contains a .png file with the ER diagram of the tables in the .sql file.
Structure of CSV files
*On our site, compounds are named as substances
compounds.csv
Id - Unique identifier in our database (unsigned integer)
Name - Name of the Substance/Compound (string)
Marketed name - The marketed name of the Substance/Compound (string)
Synonyms - Known synonyms (string)
Description - Description (HTML code)
Dietary sources - Dietary sources where the Substance/Compound can be found (string)
Dietary sources URL - Dietary sources URL (string)
Formula - Compound formula (HTML code)
Structure image URL - Url to our website with the structure image (string)
Status - Status of approval (string)
Therapeutic approach - Approach in which Substance/Compound works (string)
Drug status - Availability of Substance/Compound (string)
Additional data - Additional data in stringified JSON format with data as prescribing information and note (string)
General information - General information about Substance/Compound (HTML code)
references.csv
Id - Unique identifier in our database (unsigned integer)
Impact factor - Impact factor of the scientific article (string)
Source title - Title of the scientific article (string)
Source URL - URL link of the scientific article (string)
Tested on species - What testing model was used for the study (string)
Published at - Date of publication of the scientific article (Date in ISO 8601 format)
clinical-trials.csv
Id - Unique identifier in our database (unsigned integer)
Title - Title of the clinical trial study (string)
Acronym title - Acronym of title of the clinical trial study (string)
Source id - Unique identifier in the source database
Source id optional - Optional identifier in other databases (string)
Interventions - Description of interventions (string)
Study type - Type of the conducted study (string)
Study results - Has results? (string)
Phase - Current phase of the clinical trial (string)
Url - URL to clinical trial study page on clinicaltrials.gov (string)
Status - Status in which study currently is (string)
Start date - Date at which study was started (Date in ISO 8601 format)
Completion date - Date at which study was completed (Date in ISO 8601 format)
Additional data - Additional data in the form of stringified JSON with data as locations of study, study design, enrollment, age, outcome measures (string)
compound-reference-relations.csv
Reference id - Id of a reference in our DB (unsigned integer)
Compound id - Id of a substance in our DB (unsigned integer)
Note - Id of a substance in our DB (unsigned integer)
Is supporting - Is evidence supporting or contradictory (Boolean, true if supporting)
compound-clinical-trial.csv
Clinical trial id - Id of a clinical trial in our DB (unsigned integer)
Compound id - Id of a Substance/Compound in our DB (unsigned integer)
tags.csv
Id - Unique identifier in our database (unsigned integer)
Name - Name of the tag (string)
tags-entities.csv
Tag id - Id of a tag in our DB (unsigned integer)
Reference id - Id of a reference in our DB (unsigned integer)
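To illustrate how the relation files tie the entity files together, here is a minimal sketch that counts supporting and contradictory references per compound. It assumes the CSV header row uses exactly the field names listed above ("Id", "Name", "Compound id", "Is supporting") and that booleans are serialized as "true"/"false"; neither is confirmed by the description, and the sample rows are invented.

```python
import csv
import io


def references_per_compound(compounds_file, relations_file):
    """Count supporting vs. contradictory references for each compound name."""
    names = {row["Id"]: row["Name"] for row in csv.DictReader(compounds_file)}
    counts = {}
    for rel in csv.DictReader(relations_file):
        name = names.get(rel["Compound id"])
        if name is None:
            continue  # relation points at a compound missing from the export
        tally = counts.setdefault(name, {"supporting": 0, "contradictory": 0})
        is_supporting = rel["Is supporting"].strip().lower() == "true"
        tally["supporting" if is_supporting else "contradictory"] += 1
    return counts


# Invented rows standing in for compounds.csv and compound-reference-relations.csv.
compounds = io.StringIO("Id,Name\n1,Compound A\n2,Compound B\n")
relations = io.StringIO(
    "Reference id,Compound id,Is supporting\n"
    "10,1,true\n"
    "11,1,false\n"
    "12,2,true\n"
)
summary = references_per_compound(compounds, relations)
```

The same pattern applies to compound-clinical-trial.csv, with "Clinical trial id" in place of "Reference id".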
API Specification
Our project also has an Open API that gives you access to our data in a format suitable for processing, particularly in JSON format.
https://covid19-help.org/api-specification
Services are split into five endpoints:
Substances - /api/substances
References - /api/references
Substance-reference relations - /api/substance-reference-relations
Clinical trials - /api/clinical-trials
Clinical trials-substances relations - /api/clinical-trials-substances
Method of providing data
All dates are text strings formatted in compliance with ISO 8601 as YYYY-MM-DD
If the request syntax is incorrect (missing or incorrectly formatted parameters), an HTTP 400 Bad Request response will be returned. The body of the response may include an explanation.
Data updated_at (used for querying changed-from) refers only to a particular entity and not its logical relations. Example: If a new substance reference relation is added, but the substance detail has not changed, this is reflected in the substance reference relation endpoint where a new entity with id and current dates in created_at and updated_at fields will be added, but in substances or references endpoint nothing has changed.
The recommended way of sequential download
During the first download, it is possible to obtain all data by entering an old enough date in the changed-from parameter, for example: changed-from=2020-01-01. It is important to write down the date on which the download was initiated, let’s say 2020-10-20.
For repeated data downloads, it is sufficient to receive only the records in which something has changed. These can therefore be requested with the parameter changed-from=2020-10-20 (the date noted in the previous step). Again, it is important to write down the date when the updates were downloaded (e.g. 2020-10-20). This date will be used in the next update (refresh) of the data.
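The bookkeeping described in the two steps above can be sketched as follows. The state dictionary, its "last_started" key, and the second refresh date are hypothetical; only the changed-from parameter and the two example dates come from the text.

```python
from datetime import date

FIRST_DOWNLOAD_FROM = "2020-01-01"  # the "old enough" date from the text


def next_request(state, today):
    """Return the changed-from value to send and the state to store afterwards."""
    # No stored date means this is the first download: ask for everything.
    changed_from = state.get("last_started", FIRST_DOWNLOAD_FROM)
    # Record the day this download was initiated, for use in the next refresh.
    new_state = {"last_started": today.isoformat()}
    return changed_from, new_state


first_from, state = next_request({}, date(2020, 10, 20))
second_from, state = next_request(state, date(2020, 11, 5))
```

Storing the date the download started, rather than the date it finished, means records changed while the download was running are picked up again on the next refresh.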
Services for entities
List of endpoint URLs:
Format of the request
All endpoints have these parameters in common:
changed-from - a parameter to return only the entities that have been modified on a given date or later.
continue-after-id - a parameter to return only the entities that have a larger ID than specified in the parameter.
limit - a parameter to return only the number of records specified (up to 1000). The preset number is 100.
Request example:
/api/references?changed-from=2020-01-01&continue-after-id=1&limit=100
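A request URL of this shape can be assembled with the standard library, as in the sketch below. It assumes the API is served from the same host as the site (https://covid19-help.org) and treats changed-from as always present; the function name and the limit check are illustrative, not part of the API specification.

```python
from urllib.parse import urlencode

BASE_URL = "https://covid19-help.org"  # assumed host, taken from the site URL


def build_request_url(endpoint, changed_from, continue_after_id=None, limit=100):
    """Build a URL for one of the endpoints using the three common parameters."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = {"changed-from": changed_from}
    if continue_after_id is not None:
        params["continue-after-id"] = continue_after_id
    params["limit"] = limit
    return f"{BASE_URL}/api/{endpoint}?{urlencode(params)}"


url = build_request_url("references", "2020-01-01", continue_after_id=1, limit=100)
```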
Format of the response
The response format is the same for all endpoints.
number_of_remaining_ids - the number of remaining entities that meet the specified criteria but are not displayed on the page. An integer of virtually unlimited size.
entities - an array of entity details in JSON format.
Response example:
{
"number_of_remaining_ids" : 100,
"entities" : [
{
"id": 3,
"url": "https://www.ncbi.nlm.nih.gov/pubmed/32147628",
"title": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
"impact_factor": "Discovering drugs to treat coronavirus disease 2019 (COVID-19).",
"tested_on_species": "in silico",
"publication_date": "2020-02-22",
"created_at": "2020-03-30",
"updated_at": "2020-03-31",
"deleted_at": null
},
{
"id": 4,
"url": "https://www.ncbi.nlm.nih.gov/pubmed/32157862",
"title": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
"impact_factor": "CT Manifestations of Novel Coronavirus Pneumonia: A Case Report",
"tested_on_species": "Patient",
"publication_date": "2020-06-03",
"created_at": "2020-03-30",
"updated_at": "2020-03-30",
"deleted_at": null
}
]
}
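Given this response shape, all matching entities can be collected by repeating the request with continue-after-id until number_of_remaining_ids reaches zero. The sketch below takes the HTTP call as a caller-supplied function so it can be run offline; fetch_all, fetch_page, and the fake backend are hypothetical names, and the loop assumes each page contains the entities with the smallest remaining IDs.

```python
def fetch_all(fetch_page, changed_from, limit=100):
    """Collect every entity by following continue-after-id until none remain.

    fetch_page(changed_from, continue_after_id, limit) must return the decoded
    JSON response as a dict with "number_of_remaining_ids" and "entities".
    """
    entities = []
    last_id = None
    while True:
        page = fetch_page(changed_from, last_id, limit)
        batch = page["entities"]
        entities.extend(batch)
        if not batch or page["number_of_remaining_ids"] == 0:
            return entities
        # Continue after the largest ID seen so far.
        last_id = max(entity["id"] for entity in batch)


def fake_fetch_page(changed_from, continue_after_id, limit):
    """Stand-in for the real endpoint: serves IDs 1..7 in pages."""
    ids = [i for i in range(1, 8) if continue_after_id is None or i > continue_after_id]
    return {
        "number_of_remaining_ids": max(len(ids) - limit, 0),
        "entities": [{"id": i} for i in ids[:limit]],
    }


all_entities = fetch_all(fake_fetch_page, "2020-01-01", limit=3)
```

In real use, fetch_page would issue the HTTP GET and decode the JSON body; an HTTP 400 response should be surfaced to the caller rather than retried.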
Endpoint details
Substances
URL: /api/substances
Substances
According to our latest research, the global Clinical Trial Sample Management Software market size reached USD 1.32 billion in 2024, reflecting a robust growth trajectory driven by the rising complexity and scale of clinical trials worldwide. The market is expected to advance at a CAGR of 11.6% during the forecast period, with projections indicating the market will reach USD 3.24 billion by 2033. This sustained growth is fueled by increasing regulatory requirements, the need for efficient sample tracking, and the digital transformation of clinical trial operations across pharmaceutical, biotechnology, and healthcare organizations.
One of the primary growth factors in the Clinical Trial Sample Management Software market is the surging volume and complexity of clinical trials, particularly in the wake of the global pandemic and the rapid evolution of personalized medicine. Pharmaceutical and biotechnology companies are conducting more multi-site and multi-phase clinical trials than ever before, necessitating advanced software solutions to manage the lifecycle of biological samples efficiently. These platforms not only streamline sample tracking and storage but also ensure compliance with stringent regulatory standards such as GxP, HIPAA, and GDPR. As the demand for precision medicine and biologics increases, the requirement for robust sample management solutions becomes indispensable, further driving market growth.
Another significant driver is the ongoing digitalization of clinical trial processes. The integration of advanced technologies such as artificial intelligence, blockchain, and cloud computing into sample management software has revolutionized the way clinical samples are handled. These innovations enable real-time data access, enhanced security, and seamless collaboration among stakeholders, from research organizations to contract research organizations (CROs). The adoption of cloud-based solutions, in particular, is accelerating due to their scalability, cost-effectiveness, and ability to support remote operations. This shift is especially crucial as clinical trials become more globalized, involving multiple sites and cross-border collaborations.
Furthermore, the growing emphasis on data integrity and traceability is propelling the adoption of Clinical Trial Sample Management Software. Regulatory authorities are mandating rigorous documentation and audit trails for every sample used in clinical research. Modern sample management platforms offer comprehensive tracking capabilities, automated documentation, and integration with laboratory information management systems (LIMS), ensuring end-to-end traceability. This not only mitigates the risk of sample mismanagement but also enhances the reliability and reproducibility of clinical trial outcomes, which are critical for regulatory submissions and market approvals.
From a regional perspective, North America continues to dominate the Clinical Trial Sample Management Software market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of major pharmaceutical companies, a highly developed healthcare infrastructure, and a favorable regulatory environment. Europe follows closely, driven by increased R&D investments and stringent data protection regulations. Meanwhile, the Asia Pacific region is emerging as the fastest-growing market, fueled by expanding clinical research activities, government initiatives to promote clinical trials, and the rapid adoption of digital health technologies. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as awareness and investment in clinical research infrastructure increase.
The Component segment in the Clinical Trial Sample Management Software market is bifurcated into software and services, each playing a critical role in the ecosystem. The software segment holds the largest share, underpinned by the increasing demand for comprehensive platfor
Background: Acceptability curves have been proposed for quantifying the probability that a treatment under investigation in a clinical trial is cost-effective. Various definitions and estimation methods have been proposed. Loosely speaking, all the definitions, Bayesian or otherwise, relate to the probability that the treatment under consideration is cost-effective as a function of the value placed on a unit of effectiveness. These definitions are, in fact, expressions of the certainty with which the current evidence would lead us to believe that the treatment under consideration is cost-effective, and are dependent on the amount of evidence (i.e. sample size).
Methods: An alternative for quantifying the probability that the treatment under consideration is cost-effective, which is independent of sample size, is proposed.
Results: Non-parametric methods are given for point and interval estimation. In addition, these methods provide a non-parametric estimator and confidence interval for the incremental cost-effectiveness ratio. An example is provided.
Conclusions: The proposed parameter for quantifying the probability that a new therapy is cost-effective is superior to the acceptability curve because it is not sample size dependent and because it can be interpreted as the proportion of patients who would benefit if given the new therapy. Non-parametric methods are used to estimate the parameter and its variance, providing the appropriate confidence intervals and test of hypothesis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this project, we work on repairing three datasets:
country_protocol_code, conduct the same clinical trials, which are identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.
eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants such as inclusion.
code. Samples with the same code represent the same product but are extracted from a different source. The allergens are indicated by ‘2’ if present, ‘1’ if there are traces, and ‘0’ if absent in a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.
N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:
https://dataintelo.com/privacy-and-policy
According to our latest research, the global clinical trial sample management software market size reached USD 1.12 billion in 2024, reflecting a robust expansion in the adoption of digital solutions for clinical research. The market is projected to grow at a CAGR of 9.4% from 2025 to 2033, reaching an estimated USD 2.56 billion by 2033. This growth is primarily driven by the increasing complexity of clinical trials, stringent regulatory requirements, and the demand for enhanced data integrity and traceability in sample management across the pharmaceutical, biotechnology, and healthcare sectors.
A significant growth factor for the clinical trial sample management software market is the rising number of clinical trials globally, spurred by the rapid development of novel therapeutics and vaccines. Pharmaceutical and biotechnology companies are increasingly investing in research and development, leading to a surge in the volume and diversity of biological samples that require meticulous management. Efficient sample tracking, storage, and retrieval have become critical for ensuring the reliability and reproducibility of clinical trial results. This has prompted organizations to shift from manual or semi-automated processes to sophisticated software platforms that offer real-time monitoring, audit trails, and regulatory compliance, thus fueling market growth.
Another pivotal driver is the stringent regulatory landscape governing clinical trials, especially in regions like North America and Europe. Regulatory bodies such as the FDA, EMA, and others have established rigorous guidelines for sample handling, data integrity, and patient safety. Clinical trial sample management software solutions are designed to ensure adherence to these regulations by providing automated documentation, secure data storage, and comprehensive reporting features. The increasing emphasis on Good Clinical Practice (GCP) and the need to minimize errors or discrepancies in sample management further accelerate the adoption of these digital solutions, as organizations seek to avoid costly delays, penalties, or trial failures.
Technological advancements are also propelling the market forward. The integration of artificial intelligence, machine learning, and cloud computing into clinical trial sample management software has revolutionized the way samples are tracked, analyzed, and reported. Features such as automated labeling, temperature monitoring, and predictive analytics enhance operational efficiency, reduce human error, and optimize resource utilization. Additionally, the growing trend of decentralized and virtual clinical trials, especially post-pandemic, necessitates robust digital infrastructure for remote sample management, further boosting the demand for advanced software solutions in this domain.
From a regional perspective, North America continues to dominate the clinical trial sample management software market, owing to its well-established pharmaceutical and biotechnology industries, significant R&D investments, and favorable regulatory environment. However, the Asia Pacific region is emerging as a lucrative market, driven by increasing clinical trial activities, expanding healthcare infrastructure, and government initiatives to promote research. Europe also maintains a strong presence, supported by its focus on innovation and compliance. Latin America and the Middle East & Africa are gradually catching up, albeit at a slower pace, as awareness and adoption of digital solutions in clinical research gain momentum.
The clinical trial sample management software market by component is segmented into software and services. The software segment holds the largest share, attributed to the growing need for robust, scalable, and user-friendly platforms that can manage complex workflows, large datasets, and multi-site trial operations. Modern software solutions offer end-to-end sample lifecycle management, from collection and labeling to storage, shipping, and disposal, ensuring compliance with regulatory standards. The integration of advanced functionalities such as real-time tracking, automated audit trails, and customizable dashboards further enhances the value proposition for end-users, driving the widespread adoption of software-based solutions across the clinical research landscape.
The
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In clinical trials, the choice of an adequate primary endpoint is often difficult. Besides its clinical relevance, the endpoint must be measurable within reasonable time and must allow differentiating between the treatments. Often, the most relevant endpoint is “time-to-death,” but if the overall survival prognosis is good, only a few deaths are observed during the study duration. A possible solution is to use surrogate endpoints instead. However, various examples from the literature demonstrate that surrogates do not always perform as intended. Sometimes, the surrogate effect is smaller than for the original endpoint, or the latter shows a higher effect than anticipated, so using the surrogate is not reasonable. In this work, different adaptive design strategies for two candidate endpoints are proposed to solve these problems. The idea is to base the efficacy proof on the significance of at least one endpoint. At an interim analysis, both candidates are evaluated. If it is not possible to stop the study early, the sample size is recalculated based on the more promising endpoint. The new methods are illustrated by a clinical study example and compared in terms of power and sample size using Monte Carlo simulations. The software code is provided as supplementary material.
Collection of samples and data across the following diseases: Multiple myeloma (disorder).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
SAE sample data (CSV)
According to our latest research, the global clinical trial management systems (CTMS) market size stood at USD 1.82 billion in 2024, reflecting the rapidly increasing adoption of digital solutions in clinical research. The market is poised for robust expansion, projected to reach USD 4.59 billion by 2033, growing at a healthy CAGR of 10.8% from 2025 to 2033. This impressive growth is primarily driven by the rising complexity of clinical trials, stringent regulatory requirements, and the need for efficient management of clinical trial data and operations.
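The relationship between the base-year size, the CAGR, and the 2033 projection can be checked with the standard compound-growth formula. The sketch below assumes nine annual compounding periods (2024 base year to 2033), which the report does not state explicitly; under that assumption the figures quoted above agree to within rounding.

```python
def project(value, rate, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes start to end over the given number of years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the paragraph above (USD billion); 9 periods is an assumption.
projected_2033 = project(1.82, 0.108, 9)
rate = implied_cagr(1.82, 4.59, 9)
```

Here projected_2033 comes out at roughly 4.58 and rate at roughly 10.8%, consistent with the stated USD 4.59 billion and 10.8% CAGR.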
One of the most significant growth factors propelling the clinical trial management systems market is the escalating volume and complexity of clinical trials worldwide. Pharmaceutical, biotechnology, and medical device companies are increasingly investing in research and development to bring innovative therapies and products to market. This surge in clinical activity necessitates robust CTMS solutions to streamline planning, tracking, and management of trials, ensuring compliance with regulatory standards and optimizing resource allocation. Additionally, the growing prevalence of chronic diseases and the demand for personalized medicine are leading to more complex trial designs, multi-site studies, and larger participant pools, all of which require advanced CTMS functionalities to maintain data integrity, patient safety, and trial efficiency.
Another key driver is the technological advancement in software solutions, particularly the integration of artificial intelligence (AI), machine learning, and advanced analytics into CTMS platforms. These technologies enable real-time monitoring, predictive analytics, and automated workflows, significantly reducing manual errors and administrative burdens. The shift towards cloud-based and web-based deployment models has further enhanced accessibility, scalability, and cost-effectiveness, making CTMS solutions more attractive to small and mid-sized organizations. Moreover, the increasing use of electronic data capture (EDC) and remote monitoring tools, especially in the wake of the COVID-19 pandemic, has accelerated digital transformation across the clinical research landscape, fueling further market growth.
Regulatory pressures and the need for stringent data security and compliance are also shaping the clinical trial management systems market. Global regulatory agencies, such as the FDA, EMA, and ICH, require comprehensive documentation, real-time reporting, and audit trails for all clinical trials. CTMS solutions play a pivotal role in ensuring that organizations can meet these regulatory demands efficiently and cost-effectively. The growing trend of outsourcing clinical trials to contract research organizations (CROs) and the increasing collaboration between sponsors and research sites have further intensified the demand for centralized, interoperable CTMS platforms that can facilitate seamless data exchange, communication, and oversight across stakeholders.
In this evolving landscape, Clinical Trial Sample Management Software has emerged as a crucial component in enhancing the efficiency and accuracy of clinical trials. This software facilitates the seamless management of biological samples, ensuring that they are correctly labeled, stored, and tracked throughout the trial process. By automating sample handling and integrating with existing CTMS platforms, this software reduces the risk of human error and enhances data integrity. As trials become more complex and geographically dispersed, the ability to manage samples efficiently and in compliance with regulatory standards becomes increasingly vital. The integration of such software not only optimizes resource allocation but also accelerates the overall trial timeline, contributing to faster delivery of new therapies to the market.
From a regional perspective, North America continues to dominate the CTMS market, driven by a highly developed healthcare infrastructure, significant R&D investments, and the presence of leading pharmaceutical and biotechnology companies. However, Asia Pacific is emerging as the fastest-growing region, fueled by increasing clinical trial activity, expanding healthcare expenditure, and favorable regulatory reforms. Europe also holds a substantial market share, supported by strong government initiatives, robust clinical research
Objective: Evidence synthesis teams, physicians, policy makers, and patients and their families all have an interest in following the outcomes of clinical trials and would benefit from being able to evaluate both the results posted in trial registries and in the publications that arise from them. Manual searching for publications arising from a given trial is a laborious and uncertain process. We sought to create a statistical model to automatically identify PubMed articles likely to report clinical outcome results from each registered trial in ClinicalTrials.gov.
Materials and Methods: A machine learning-based model was trained on pairs (publications linked to specific registered trials). Multiple features were constructed based on the degree of matching between the PubMed article metadata and specific fields of the trial registry, as well as matching with the set of publications already known to be linked to that trial.
Results: Evaluation of the model using NCT-linked articles as g...
https://spdx.org/licenses/CC0-1.0.html
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.
Methods
eLAB Development and Source Code (R statistical software):
eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).
eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.
Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.
The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
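As a rough illustration of the reshaping described above, the following Python sketch (not the eLAB R code) expands an 'untidy' four-column input into one row per lab result. The cell layout ("Name: value" pairs separated by semicolons), the function name tidy_labs, and the mock patient are assumptions made for this example; real EHR exports will differ.

```python
from typing import NamedTuple


class LabRow(NamedTuple):
    patient: str
    date: str
    time: str
    lab: str
    value: str


def tidy_labs(rows, sep=";"):
    """Expand (patient, date, time, packed_cell) rows into one row per lab.

    Assumes (hypothetically) that a packed cell looks like
    "Potassium: 4.1; Sodium: 138".
    """
    tidy = []
    for patient, date, time, packed in rows:
        for item in packed.split(sep):
            item = item.strip()
            if not item:
                continue  # skip empty fragments such as a trailing separator
            name, _, value = item.partition(":")
            tidy.append(LabRow(patient, date, time, name.strip(), value.strip()))
    return tidy


# Mock data only; no real patient information.
untidy = [
    ("Doe, Jane (0001)", "2020-01-15", "08:30", "Potassium: 4.1; Sodium: 138;"),
    ("Doe, Jane (0001)", "2020-02-01", "09:10", "Potassium(POC): 3.9"),
]
tidy = tidy_labs(untidy)
for row in tidy:
    print(row)
```

Each output row carries the patient, date, and time of its source row, so the result can be filtered or remapped one lab at a time.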
Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
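The key-value remapping can be pictured with a small Python sketch. This is not the eLAB lookup table itself: the DD code "potassium", the unit "mmol/L", and the function name remap_lab are hypothetical placeholders, and only the lab subtype names are taken from the text above.

```python
# Hypothetical lookup: EHR lab subtype -> data dictionary (DD) code.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}

# Hypothetical unit accepted by the DD for each code.
ALLOWED_UNITS = {"potassium": "mmol/L"}


def remap_lab(name, unit):
    """Return the DD code for a lab subtype, or None if it should be dropped.

    A lab is dropped when its name is not in the lookup table or when its
    unit is not the one pre-defined for that DD code.
    """
    code = LAB_LOOKUP.get(name.strip())
    if code is None:
        return None
    if ALLOWED_UNITS.get(code) != unit:
        return None
    return code


print(remap_lab("Potassium(POC)", "mmol/L"))
print(remap_lab("Potassium,venous", "mg/dL"))
```

Keeping the mapping in a plain table rather than in branching logic is what lets end users re-configure it without touching the transformation code.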
Data Dictionary (DD)
EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as string or numeric. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
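Because every site exports the same columns, combining site files reduces to concatenation with a header check. The Python sketch below illustrates that idea only; the column names lab_date and potassium and the function name combine_site_csvs are assumptions, and the inline strings stand in for real site csv files.

```python
import csv
import io


def combine_site_csvs(csv_texts):
    """Concatenate site exports that share one header (one data dictionary).

    Raises ValueError if any export has a different header, since that
    would mean the site is not using the shared data dictionary.
    """
    header = None
    combined = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("site export does not match the shared data dictionary")
        combined.extend(reader)
    return header, combined


# Mock exports from two sites; values are invented for illustration.
site_a = "record_id,lab_date,potassium\n1,2020-01-15,4.1\n2,2020-01-16,3.8\n"
site_b = "record_id,lab_date,potassium\n3,2020-02-01,4.4\n"

header, rows = combine_site_csvs([site_a, site_b])
print(header, len(rows))
```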
Study Cohort
This study was approved by the MGB IRB. A search of the EHR was performed to identify patients diagnosed with MCC between 1975 and 2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016 and 2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.
Statistical Analysis
OS is defined as the time from the date of MCC diagnosis to the date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazards modeling was performed for each lab predictor. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
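The OS definition and censoring rule above translate directly into a time-and-event pair per patient, which is the input a Cox model expects. A minimal Python sketch, with a hypothetical function name and invented dates, might look like this (the Cox fitting itself is not shown):

```python
from datetime import date


def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (time_in_days, event) for one patient.

    event is 1 when a death date is known, otherwise 0 (censored at the
    last follow-up visit).
    """
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("need a death date or a last follow-up date")
    return (last_follow_up - diagnosis).days, 0


# Invented example dates.
died = overall_survival(date(2016, 1, 1), death=date(2017, 1, 1))
censored = overall_survival(date(2018, 3, 1), last_follow_up=date(2018, 3, 31))
print(died, censored)
```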
Background: Patient reported outcomes (PROs) are increasingly assessed in clinical trials, and guidelines are available to inform the design and reporting of such trials. However, researchers involved in PRO data collection report that specific guidance on ‘in-trial’ activity (recruitment, data collection and data inputting) and the management of ‘concerning’ PRO data (i.e., data which raises concern for the well-being of the trial participant) appears to be lacking. The purpose of this review was to determine the extent and nature of published guidelines addressing these areas.
Methods and Findings: Systematic review of 1,362 articles identified 18 eligible papers containing ‘in-trial’ guidelines. Two independent authors undertook a qualitative content analysis of the selected papers. Guidelines presented in each of the articles were coded according to an a priori defined coding frame, which demonstrated reliability (pooled Kappa 0.86–0.97), and validity (<2% residual category coding). The majority of guidelines present were concerned with ‘pre-trial’ activities (72%), for example, outcome measure selection and study design issues, or ‘post-trial’ activities (16%) such as data analysis, reporting and interpretation. ‘In-trial’ guidelines represented 9.2% of all guidance across the papers reviewed, with content primarily focused on compliance, quality control, proxy assessment and reporting of data collection. There were no guidelines surrounding the management of concerning PRO data.
Conclusions: The findings highlight there are minimal in-trial guidelines in publication regarding PRO data collection and management in clinical trials. No guidance appears to exist for researchers involved with the handling of concerning PRO data. Guidelines are needed, which support researchers to manage all PRO data appropriately and which facilitate unbiased data collection.
According to Cognitive Market Research, the global Virtual Clinical Trials market size will be USD 9860 million in 2025. It will expand at a compound annual growth rate (CAGR) of 6.20% from 2025 to 2033.
North America held the largest market share, more than 40% of the global revenue, with a market size of USD 3648.20 million in 2025, and will grow at a compound annual growth rate (CAGR) of 4.7% from 2025 to 2033.
Europe accounted for a market share of over 30% of the global revenue, with a market size of USD 2859.40 million.
APAC held a market share of around 23% of the global revenue, with a market size of USD 2366.40 million in 2025, and will grow at a CAGR of 8.9% from 2025 to 2033.
South America held a market share of more than 5% of the global revenue, with a market size of USD 374.68 million in 2025, and will grow at a CAGR of 6.9% from 2025 to 2033.
The Middle East held a market share of around 2% of the global revenue, with an estimated market size of USD 394.40 million in 2025, and will grow at a CAGR of 7.5% from 2025 to 2033.
Africa held a market share of around 1% of the global revenue, with an estimated market size of USD 216.92 million in 2025, and will grow at a CAGR of 6.5% from 2025 to 2033.
The interventional category is the fastest-growing segment of the Virtual Clinical Trials industry.
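The growth figures above follow the standard compound-growth formula, value = base × (1 + CAGR)^years. The Python sketch below applies it to the reported global figure; it assumes 2025 is the base year and that growth compounds annually over the eight years to 2033, which the report does not state explicitly. The function name is hypothetical.

```python
def project_market_size(base_value, cagr_percent, years):
    """Project a value forward with annual compounding at a fixed CAGR."""
    return base_value * (1 + cagr_percent / 100) ** years


BASE_YEAR, END_YEAR = 2025, 2033

# Global figures as reported above: USD 9860 million in 2025, CAGR 6.20%.
global_2033 = project_market_size(9860.0, 6.20, END_YEAR - BASE_YEAR)
print(f"Projected global market size in {END_YEAR}: USD {global_2033:,.0f} million")
```

The same function can be applied to any of the regional figures by swapping in that region's 2025 size and CAGR.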
Market Dynamics of Virtual Clinical Trials Market
Key Drivers for Virtual Clinical Trials Market
Technology's Expanding Application in Healthcare to Boost Market Growth
The current approach to conducting a traditional clinical trial is coming to an end because of the costly combination of growing trial expenses and the potential for commercial failure. As the industry moves toward a more digitalised environment, advanced technology is increasingly driving clinical investigations and accelerating the digital transformation of their operations. Because clinical studies remain necessary, one solution to the problems with traditional clinical trials is to move to cutting-edge technology and digital innovation. Once only concepts, digital clinical trials are now on the verge of becoming a reality. Digital technologies are transforming the entire process of developing new drugs. Advances in artificial intelligence, cloud computing, wearable and mobile technology, and associated platforms have made it feasible to gather frequent, accurate, and multidimensional data throughout trials. For instance, in August 2022, Medable Inc. introduced a new software program to streamline virtual/decentralized clinical trials for vaccinations. This program was expected to reduce deployment time by 50% and provide access to clinical trials worldwide.
Supportive Government Initiatives To Boost Market Growth
The FDA has taken an open approach to new ideas and developments in technology. According to the FDA, there are benefits to appropriately utilising technology in clinical research. One of the earliest papers on electronic informed consent for medical research was this one. A number of non-regulatory initiatives, with the assistance of regulators, are developing guidelines for the use of virtual tools in trial design. For example, standards for virtual clinical trials using telemedicine and mobile healthcare providers have been established by the FDA-sponsored Clinical Trials Transformation Initiative. By understanding the confluence of people, information, technology, and connections to improve healthcare and health outcomes, the FDA is more progressive than sponsors and drug development organisations with conservative views.
Restraint Factor for the Virtual Clinical Trials Market
Regulatory Hurdles Will Limit Market Growth
Stringent laws and regulations are among the main factors expected to impede the expansion of the worldwide market for virtual clinical trials. The Food and Drug Administration (FDA), for example, regulates clinical trials in the United States to make sure they are planned, carried out, evaluated, and reported in compliance with federal law and good clinical practice guidelines. However, maintaining IT infrastructure for clinical trial coordination and advanci...
Additional file 1.
According to our latest research, the global ambient monitoring for clinical trial kits market size reached USD 1.24 billion in 2024, with robust growth driven by the increasing demand for precise environmental monitoring solutions in clinical research. The market is expanding at a CAGR of 8.7% and is forecasted to reach USD 2.62 billion by 2033. This remarkable growth is primarily fueled by stringent regulatory requirements for clinical trials, the rising complexity of clinical protocols, and the growing prevalence of decentralized and remote clinical trial models.
One of the primary growth factors for the ambient monitoring for clinical trial kits market is the escalating emphasis on regulatory compliance and data integrity in clinical research. Regulatory bodies such as the FDA, EMA, and other international agencies have heightened their scrutiny on the storage and transport conditions of clinical trial materials. This has necessitated the adoption of advanced ambient monitoring solutions, including sensors, data loggers, and specialized software, to ensure that temperature, humidity, and other environmental parameters are consistently maintained within prescribed limits. Pharmaceutical and biotechnology companies are increasingly investing in these technologies to minimize the risk of compromised samples, which could jeopardize trial outcomes and lead to costly delays or failures. As a result, the market is witnessing a surge in demand for reliable and real-time monitoring solutions that can seamlessly integrate with clinical trial logistics and data management systems.
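To make the monitoring idea concrete, the Python sketch below scans data-logger readings and reports the time spans in which the temperature left a prescribed range. It is an illustration only, not any vendor's product logic: the 2 to 8 °C limits, the reading format, and the function name find_excursions are assumptions chosen for the example.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    minute: int      # minutes since the kit was dispatched
    temp_c: float    # logged temperature in degrees Celsius


def find_excursions(readings, low=2.0, high=8.0):
    """Return (start_minute, end_minute) spans with temp outside [low, high]."""
    spans = []
    start = None   # minute at which the current excursion began, if any
    last = None    # minute of the previous reading
    for r in readings:
        out_of_range = not (low <= r.temp_c <= high)
        if out_of_range and start is None:
            start = r.minute
        if not out_of_range and start is not None:
            spans.append((start, last))
            start = None
        last = r.minute
    if start is not None:
        spans.append((start, last))  # excursion still open at the final reading
    return spans


# Invented readings for illustration.
log = [Reading(0, 5.0), Reading(10, 8.5), Reading(20, 9.1),
       Reading(30, 6.0), Reading(40, 1.5)]
excursions = find_excursions(log)
print(excursions)
```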
Another significant growth driver is the ongoing shift towards decentralized and virtual clinical trials, which has been accelerated by the COVID-19 pandemic and the subsequent need for remote trial management. Decentralized trials often involve the distribution of clinical trial kits to multiple locations, including patients’ homes, making ambient monitoring even more critical to ensure the integrity of biological samples, investigational products, and diagnostic materials. The adoption of Internet of Things (IoT)-enabled sensors, wireless data loggers, and cloud-based monitoring platforms has enabled sponsors and contract research organizations (CROs) to remotely track environmental conditions in real time, thereby reducing the risk of data loss or sample degradation. This technological evolution is expected to further propel market growth as stakeholders seek to enhance operational efficiency and patient-centricity in clinical research.
Additionally, increasing investment in research and development activities by pharmaceutical and biotechnology companies and CROs is fostering the growth of the ambient monitoring for clinical trial kits market. As clinical trials become more complex and global in scope, the need for standardized, scalable, and interoperable monitoring solutions has become paramount. Companies are not only focusing on improving the accuracy and reliability of their monitoring devices but are also integrating advanced analytics and artificial intelligence (AI) capabilities to provide predictive insights and automate compliance reporting. This trend is expected to create lucrative opportunities for market players, particularly those offering comprehensive, end-to-end monitoring solutions tailored to the unique needs of clinical research.
From a regional perspective, North America currently dominates the ambient monitoring for clinical trial kits market, accounting for the largest share in 2024. This leadership position can be attributed to the presence of a robust pharmaceutical industry, a high volume of clinical trials, and stringent regulatory standards in the United States and Canada. Europe follows closely, driven by increasing R&D investments and harmonized regulatory frameworks across the region. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by expanding clinical research activities, rising healthcare investments, and the adoption of advanced monitoring technologies across emerging markets such as China, India, and South Korea. This regional dynamic underscores the global nature of clinical research and the critical role of ambient monitoring solutions in supporting high-quality, compliant clinical trials worldwide.
1 The number of registries that provide data to the ICTRP increased from nine to fifteen between 2008/2009 and 2012. Registry acronyms stand for: ClinicalTrials.gov (CT.gov), Japan Primary Registries Network (JPRN), Iranian Registry of Clinical Trials (IRCT), Australian New Zealand Clinical Trials Registry (ANZCTR), EU Clinical Trials Register (EU-CTR), International Standard Randomized Controlled Trial Number Register (ISRCTN), Chinese Clinical Trial Register (ChiCTR), Clinical Trials Registry - India (CTRI), German Clinical Trials Register (DRKS), The Netherlands National Trial Register (NTR), Clinical Research Information Service (CRiS), Republic of Korea, Pan African Clinical Trial Registry (PACTR), Cuban Public Registry of Clinical Trials (RPCEC), Sri Lanka Clinical Trials Registry (SLCTR) and Brazilian Clinical Trials Registry (ReBec).
2 Other sponsors consisted of persons that were registered as primary sponsor, non-governmental organizations, collaborative research institutions and clinical research organizations.
3 Overlap was possible; the total in this category was greater than 731 in 2008/2009 and greater than 386 in 2012.
4 Genetic interventions consisted of gene transfer therapy and somatic cell transplants.
5 The presence of study phase in records was analysed separately for trials in drugs, biologicals or vaccines. 2008/2009: Of 439 trials researching these types of interventions, study phase was reported in 370 records (84.3%). 2012: Of 221 trials researching these types of interventions, study phase was reported in 172 records (77.8%).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Today, many initiatives and papers are devoted to clinical trial data (and to a lesser extent sample) sharing. Journal editors, pharmaceutical companies, funding agencies, governmental organizations, regulators, and clinical investigators have been debating the legal, ethical, and social implications of clinical data and sample sharing for several years. However, little research has been conducted to unveil the patient perspective.
Aim: To substantiate the current debate, we aimed to explore the attitudes of patients toward the re-use of clinical trial samples and data and to determine how they would prefer to be involved in this process.
Materials and Methods: Sixteen in-depth interviews were conducted with cancer patients currently participating in a clinical trial.
Results: This study indicates a general willingness of cancer patients participating in a clinical trial to allow re-use of their clinical trial data and/or samples by the original research team, and a generally open approach to sharing data and/or samples with other research teams, although some would like to be informed in this case. Despite divergent opinions about how patients prefer to be engaged, ranging from passive donors up to those explicitly wanting more control, participants expressed positive opinions toward technical solutions that allow indicating their preferences.
Conclusion: Patients were open to sharing and re-use of data and samples to advance medical research but opinions varied on the level of patient involvement and the need for re-consent. A stratified approach for consent that allows individualization of data and sample sharing preferences may be useful, yet the implementation of such an approach warrants further research.