CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Therapeutics Data Commons (TDC) is an open-science initiative started at Harvard with AI/ML-ready datasets and ML tasks for therapeutics. It provides an ecosystem of tools, leaderboards, and community resources, including data functions, model benchmarking and comparison strategies, meaningful data splits, data processors, public leaderboards, and molecule generation oracles. All resources are integrated and accessible via an open Python library. TDC is available at https://tdcommons.ai.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
At its core, TDC collects ML tasks and associated datasets across therapeutic modalities and stages of discovery. These tasks and datasets have the following properties: Instrumenting disease treatment from bench to bedside with AI/ML: TDC covers a variety of learning tasks going from wet-lab target identification to biomedical product manufacturing. Building off the latest biotechnological platforms: TDC is regularly updated with novel datasets and tasks, such as antibody therapeutics and gene editing. Providing AI/ML-ready datasets: TDC datasets provide rich information on biomedical entities. This information is carefully curated, processed, and readily available in TDC.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data and models used in the following paper.
Swanson, K., Walther, P., Leitz, J., Mukherjee, S., Wu, J. C., Shivnaraine, R. V., & Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. In review.
The data and models are meant to be used with the ADMET-AI code, which runs the ADMET-AI web server at admet.ai.greenstonebio.com.
The data.zip file has the following structure.
data
drugbank: Contains files with drugs from the DrugBank that have received regulatory approval. drugbank_approved.csv contains the full set of approved drugs along with ADMET-AI predictions, while the other files contain subsets of these molecules used for testing the speed of ADMET prediction tools.
tdc_admet_all: Contains the data (.csv files) and RDKit features (.npz files) for all 41 single-task ADMET datasets from the Therapeutics Data Commons (TDC).
tdc_admet_multitask: Contains the data (.csv files) and RDKit features (.npz files) for the two multi-task datasets (one regression and one classification) constructed by combining the tdc_admet_all datasets.
tdc_admet_all.csv: A CSV file containing all 41 ADMET datasets from tdc_admet_all. This can be used to easily look up all ADMET properties for a given molecule in the TDC.
tdc_admet_group: Contains the data (.csv files) and RDKit features (.npz files) for the 22 TDC ADMET Benchmark Group datasets with five splits per dataset.
tdc_admet_group_raw: Contains the raw data (.csv files) used to construct the five splits per dataset in tdc_admet_group.
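As a rough illustration of the per-molecule lookup that tdc_admet_all.csv is meant to support, the sketch below scans a CSV for one molecule and returns its non-empty property values. The column name `smiles`, the property column names, and the sample rows are assumptions made up for this example; check the header of the real file before relying on them.

```python
import csv
import io


def lookup_properties(csv_file, smiles, smiles_column="smiles"):
    """Return {property: value} for the first row whose SMILES matches.

    Empty cells are skipped, since a molecule usually has measurements
    for only some of the datasets. Returns None if no row matches.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        if row[smiles_column] == smiles:
            return {
                name: float(value)
                for name, value in row.items()
                if name != smiles_column and value not in ("", None)
            }
    return None


# Made-up sample standing in for tdc_admet_all.csv (hypothetical columns).
sample = io.StringIO(
    "smiles,BBB_Martins,Caco2_Wang\n"
    "CCO,1,\n"
    "c1ccccc1,,-4.5\n"
)
ethanol = lookup_properties(sample, "CCO")
print(ethanol)
```

The match is an exact string comparison, so a query SMILES written differently from the one stored in the file will not be found; canonicalizing both sides first (for example with RDKit) would be needed for a robust lookup.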
The models.zip file has the following structure. Note that the ADMET-AI website and Python package use the multi-task Chemprop-RDKit models below.
models
tdc_admet_all: Contains Chemprop and Chemprop-RDKit models trained on all 41 single-task TDC ADMET datasets.
tdc_admet_all_multitask: Contains Chemprop and Chemprop-RDKit models trained on the two multi-task TDC ADMET datasets (one regression and one classification).
tdc_admet_group: Contains Chemprop and Chemprop-RDKit models trained on the 22 TDC ADMET Benchmark Group datasets.
Portal to make cancer related proteomic datasets easily accessible to public. Facilitates multiomic integration in support of precision medicine through interoperability with other resources. Developed to advance our understanding of how proteins help to shape risk, diagnosis, development, progression, and treatment of cancer. One of several repositories within NCI Cancer Research Data Commons which enables researchers to link proteomic data with other data sets (e.g., genomic and imaging data) and to submit, collect, analyze, store, and share data throughout cancer data ecosystem. PDC provides access to highly curated and standardized biospecimen, clinical, and proteomic data, intuitive interface to filter, query, search, visualize and download data and metadata. Provides common data harmonization pipeline to uniformly analyze all PDC data and provides advanced visualization of quantitative information. Cloud based (Amazon Web Services) infrastructure facilitates interoperability with AWS based data analysis tools and platforms natively. Application programming interface (API) provides cloud-agnostic data access and allows third parties to extend functionality beyond PDC. Structured workspace that serves as private user data store and also data submission portal. Distributes controlled access data, such as patient-specific protein fasta sequence databases, with dbGaP authorization and eRA Commons authentication.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here, the authors performed an in-silico analysis on a meta-dataset including gene-expression data from 5,342 clinically defined estrogen receptor-positive/human epidermal growth factor receptor 2-negative (ER+/HER2-) breast cancers (BC), and DNA copy number/mutational and proteomic data, to determine whether the therapeutic response of ER+/HER2- breast cancers differs according to the molecular basal or luminal subtype.
Data access: The dataset Breast_cancer_classifications.csv supporting figure 1, table 1, and supplementary tables 1-3 is publicly available in the figshare repository as part of this data record. This study used and analysed 36 publicly available datasets that are all listed in Supplementary table 8 and are cited from the data availability statement of the published article.
Study aims and methodology: To evaluate the response and/or potential vulnerability to hormone treatment (HT) and other systemic therapies of BC, and to assess the degree of difference between basal and luminal breast cancer subtypes, the authors performed an in-silico analysis of a meta-dataset including gene expression data from 8,982 non-redundant BCs and DNA copy number/mutational and proteomic data from TCGA. The aim was to compare the Basal versus Luminal samples. Out of the 8,982 samples of the database, 6,563 were defined as ER+ (5,342 according to immunohistochemistry (IHC) and 1,221 according to inferred status).
The authors analysed breast cancer gene expression data pooled from 36 public datasets (the publicly available datasets are listed in supplementary table 8), comprising 8,982 invasive primary BCs. The pre-analytic data processing was done as described previously in https://doi.org/10.1038/s41416-018-0309-1. Please refer to the published article for more details on the methodology and statistical analysis.
Data supporting the figures, tables and supplementary tables in the published article:
Data supporting figure 1, table 1, and supplementary tables 1-3: Dataset Breast_cancer_classifications.csv is in .csv file format. The dataset includes histo-clinical and molecular data of the tumors analysed in the study, and is part of this data record.
Data supporting supplementary table 4: Dataset genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.tar.gz.1 is a gzip-compressed tar archive of MAF-format files. This dataset was accessed through the Genomic Data Commons (GDC) Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/afaf2790-04d4-453a-8c1b-75cf42ffd35f.
Data supporting supplementary table 5: Dataset gdc_manifest.txt consists of gz archives of txt format files. The file was accessed through the GDC Data Portal here: https://portal.gdc.cancer.gov/repository?facetTab=files&filters={"op":"and","content":[{"op":"in","content":{"field":"cases.project.project_id","value":["TCGA-BRCA"]}},{"op":"in","content":{"field":"files.access","value":["open"]}},{"op":"in","content":{"field":"files.analysis.workflow_type","value":["HTSeq - Counts"]}},{"op":"in","content":{"field":"files.experimental_strategy","value":["RNA-Seq"]}}]}&searchTableTab=files
Data supporting supplementary table 6: Dataset Table S5_Revised.xlsx is in .xlsx file format and is part of the supplementary information files of the published article.
Data supporting supplementary table 7: Dataset BRCA.RPPA.Level_3.tar is a tar archive of txt format files. The file was accessed through the GDC Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/85988e1b-4f7d-493e-96ae-9eee61ac2833.
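The GDC Data Portal link for supplementary table 5 carries its file selection as a JSON filter in the query string. The sketch below rebuilds that filter from its four conditions and percent-encodes it, which may be easier to read and modify than the raw URL. The helper name `in_clause` is hypothetical, and whether the portal still accepts this exact URL form or workflow type has not been checked here.

```python
import json
from urllib.parse import quote, unquote


def in_clause(field, values):
    """One GDC-style 'in' condition on a single field."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


# The four conditions that appear in the repository URL quoted above.
filters = {
    "op": "and",
    "content": [
        in_clause("cases.project.project_id", ["TCGA-BRCA"]),
        in_clause("files.access", ["open"]),
        in_clause("files.analysis.workflow_type", ["HTSeq - Counts"]),
        in_clause("files.experimental_strategy", ["RNA-Seq"]),
    ],
}

# Compact JSON, then percent-encode it so it is safe inside a query string.
encoded = quote(json.dumps(filters, separators=(",", ":")), safe="")
url = (
    "https://portal.gdc.cancer.gov/repository"
    f"?facetTab=files&filters={encoded}&searchTableTab=files"
)
print(url)
```

Decoding the `filters` parameter with `unquote` and `json.loads` recovers the original dictionary, so individual conditions can be swapped (for example a different project ID) without hand-editing the encoded string.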
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Experiment 13: Uninfected Macaca mulatta exposed to pyrimethamine to produce clinical, hematological, and omics control measures. This clinical results dataset includes infection control measures. Please see the file 'E13M99MEMmXXDpWB_09212018-Readme_MULTIPL.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This list was generated 4/9/21. For an up-to-date representation of the schema, query the Biomedical Data Commons graph or browser, or check the GitHub repository. (XLSX)
https://dataintelo.com/privacy-and-policy
The global Clinical Trial Recruitment and Management Services market size is expected to grow from $3.5 billion in 2023 to an impressive $6.8 billion by 2032, reflecting a robust CAGR of 7.5%. This growth is primarily driven by the increasing complexity and number of clinical trials, the rising importance of precision medicine, and the growing demand for advanced data analytics in healthcare. As pharmaceutical and biotechnology companies continue to innovate and push the boundaries of medical research, the need for efficient and effective clinical trial recruitment and management services becomes ever more crucial.
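The growth rate implied by the two endpoint figures can be checked with the standard compound-growth formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding years (2023 to 2032); the report may count its forecast period differently, so the result need not equal the quoted 7.5% exactly.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoints quoted above: $3.5 billion in 2023, $6.8 billion in 2032.
rate = cagr(3.5, 6.8, 2032 - 2023)
print(f"Implied CAGR: {rate:.2%}")
```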
The growth of the Clinical Trial Recruitment and Management Services market can be attributed to several key factors. One primary growth driver is the increasing prevalence of chronic diseases, which necessitates extensive research and development of new therapies. Conditions such as cancer, cardiovascular diseases, and neurological disorders are becoming more common, leading to a higher volume of clinical trials aimed at finding innovative treatments. Additionally, the aging global population further escalates the demand for new therapeutic solutions, thereby boosting the clinical trial market.
Another significant growth factor is the advancement in technology and data analytics. The integration of artificial intelligence, machine learning, and big data analytics into clinical trial processes has revolutionized patient recruitment, site identification, and data management. These technologies enable more precise identification of eligible patients, efficient trial site selection, and streamlined data management, resulting in faster and more effective clinical trials. This technological evolution not only enhances the efficiency of clinical trials but also reduces costs, making it a pivotal driver for market growth.
The regulatory environment also plays a crucial role in the expansion of the Clinical Trial Recruitment and Management Services market. Stricter regulations and guidelines imposed by regulatory authorities ensure higher standards of safety and efficacy in clinical trials. This necessitates the involvement of specialized recruitment and management services to navigate the complex regulatory landscape efficiently. Furthermore, regulatory incentives for orphan drug development and fast-track approvals for critical therapies provide additional impetus for market growth.
The regional outlook for the Clinical Trial Recruitment and Management Services market highlights significant growth in North America, driven by the presence of major pharmaceutical and biotechnology companies, advanced healthcare infrastructure, and robust regulatory frameworks. Europe follows closely, with increasing investments in healthcare research and a strong emphasis on compliance with regulatory standards. The Asia Pacific region is expected to witness the highest growth rate due to the expanding healthcare sector, growing patient pool, and increasing number of clinical trials in emerging economies like China and India.
The Clinical Trial Recruitment and Management Services market is segmented by service type into Patient Recruitment, Site Identification, Data Management, Regulatory Services, and Others. Patient Recruitment services are critical as they ensure the timely enrollment of eligible participants, a process that directly impacts the success and timeline of clinical trials. The increasing complexity of eligibility criteria and the growing emphasis on precision medicine are driving demand for specialized patient recruitment services. This segment is expected to witness significant growth, as personalized recruitment strategies become more prevalent.
Site Identification services are another crucial segment, as the selection of appropriate trial sites is fundamental to the success of clinical trials. Effective site identification can reduce trial timelines and costs, making it an essential service for sponsors. The advent of advanced data analytics and geospatial technologies has enhanced the precision and efficiency of site selection processes, contributing to the growth of this segment. Additionally, the globalization of clinical trials necessitates the identification of diverse and geographically distributed sites, further driving demand for site identification services.
Data Management services are becoming increasingly vital as the volume and complexity of clinical trial data continue to grow. Efficient data management ensures the in
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title of data: CTE & SAE Data Inventory. Description of data: List of common data elements in clinical trials with domain, availability/completeness, occurrence in trials, semantic codes and definition. (XLSX 34 kb)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Experiment 15: Aotus nancymaae infected with P. vivax Brazil VII to produce clinical and omics measures of primary infections and relapses. This clinical results dataset contains measures of disease infection including counts and calculations for: CBCs, reticulocytes, parasitemias, and downstream functional genomic, metabolomic, and immunological analyses. Results also include data and metadata from veterinarians on all facets of animal access, including, but not limited to: treatments, hematology, biochemical analyses, parasitology, bacteriology, and surgery statistics, etc. Data were collected from animal arrival at the research facility to the end of the experiment, including post-experiment curative treatments. Results also include documentation of data collection and analysis methods and supporting clinical materials that differentiate between 'Idealized' and 'Actual' disease progression. Please see the file 'E15M99MCAnVpDaWB_12142018-Readme_MULTIPL_plasmodb.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The research hypothesised that miR-206 and miR-383 act as tumour suppressors in medulloblastoma (MB) and that their downregulation contributes to the aggressiveness of MB and glioblastoma (GB). By identifying and targeting the genes regulated by these microRNAs (CORO1C and SV2B), new therapeutic approaches could be developed for treating these aggressive brain tumours.
The study employed high-throughput small-RNA sequencing to analyse the expression profiles of microRNAs in MB samples. Bioinformatics tools were used to predict the target genes of the significantly downregulated miRNAs. The expression levels of the identified targets, CORO1C and SV2B, were validated through various molecular biology techniques, including Reverse Transcription-quantitative Polymerase Chain Reaction (RT-qPCR), western blotting, and immunohistochemistry. Functional assays were also performed to validate the regulatory effect of miR-206 and miR-383 on their target genes.
Both miR-206 and miR-383 were found to be significantly downregulated in MB samples, suggesting their potential role as tumour suppressors. Bioinformatics analysis identified CORO1C and SV2B as the target genes of miR-206 and miR-383, respectively. RT-qPCR, western blotting, and immunohistochemistry confirmed the overexpression of CORO1C/CORO1C and SV2B/SV2B in MB and GB cells and tissue samples. Functional assays validated that miR-206 and miR-383 directly regulate the expression of CORO1C and SV2B, respectively. The data suggested that the miR-206/CORO1C and miR-383/SV2B axes play a crucial role in the pathogenesis of MB and GB. The downregulation of these miRNAs leads to the overexpression of their target genes, contributing to the aggressiveness of these tumours. These findings indicate that restoring the levels of miR-206 and miR-383, or directly targeting CORO1C and SV2B, could be a promising therapeutic strategy for treating aggressive brain malignancies in both paediatric and adult patients.
The identification of miR-206 and miR-383 as tumour suppressors and their target genes as therapeutic targets provides a foundation for the development of novel treatments for MB and GB. Researchers and clinicians can use this data to:
o Develop miRNA mimics or gene therapy approaches to restore the levels of miR-206 and miR-383 in tumour cells.
o Design small molecule inhibitors or antibodies to specifically target CORO1C and SV2B proteins.
o Explore combination therapies that incorporate these new targets to improve treatment efficacy and reduce side effects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of a collection of models that predict a drug's ability to penetrate the blood-brain barrier, using the Therapeutics Data Commons BBB-Martins dataset.
http://www.apache.org/licenses/LICENSE-2.0
A dataset of ~500 antibodies with binding affinity: antibody sequence, antigen sequence, Kd. Obtained from SAbDab via Therapeutics Data Commons.
Python code (get_antibody_affinity_data.py) and dataset (antibody_affinity_protein_sabdab.csv)
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-LUAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.
The Cancer Imaging Program (CIP) is working directly with primary investigators from institutes participating in TCGA to obtain and load images relating to the genomic, clinical, and pathological data being stored within the TCGA Data Portal. Currently, this large CT multi-sequence image collection of lung adenocarcinoma (LUAD) patients can be matched by each unique case identifier with the extensive gene and expression data of the same case from The Cancer Genome Atlas Data Portal to research the link between clinical phenome and tissue genome.
Please see the TCGA-LUAD page to learn more about the images and to obtain any supporting metadata for this collection.
A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.
tcga_luad-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets
tcga_luad-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets
tcga_luad-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
Each of the manifests includes instructions in the header on how to download the included files.
To download the files using .s5cmd manifests:
Install the idc-index package: pip install --upgrade idc-index
Download the files by passing the .s5cmd manifest file: idc download manifest.s5cmd
To download the files using the .dcf manifest, see the manifest header.
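The manifest naming convention described above can be read mechanically. The following is a minimal sketch, assuming names always follow the collection_id-idc_vN-source.extension shape seen in this record; the function name `parse_manifest_name` and the returned keys are hypothetical and not part of any IDC tool.

```python
import re

# Matches names such as "tcga_luad-idc_v8-aws.s5cmd" or "tcga_luad-idc_v8-dcf.dcf".
MANIFEST_NAME = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<source>aws|gcs|dcf)"
    r"\.(?:s5cmd|dcf)$"
)


def parse_manifest_name(filename):
    """Split a manifest file name into collection ID, IDC release, and source."""
    match = MANIFEST_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized manifest name: {filename!r}")
    return {
        "collection_id": match.group("collection"),
        "release": int(match.group("release")),
        "source": match.group("source"),
    }


for name in (
    "tcga_luad-idc_v8-aws.s5cmd",
    "tcga_luad-idc_v8-gcs.s5cmd",
    "tcga_luad-idc_v8-dcf.dcf",
):
    print(name, "->", parse_manifest_name(name))
```

A helper like this could be used to pick the AWS or GCS manifest automatically before handing the chosen file to the download command given in the instructions.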
Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.
[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Experiment 23: M. mulatta infected with P. cynomolgi B strain to produce and integrate clinical, hematological, parasitological, and omics measures of acute primary infection and relapses. This clinical results dataset contains measures of disease infection including counts and calculations for: CBCs, reticulocytes, parasitemias, and downstream immunological, functional genomic, lipidomic, proteomic, and metabolomic measurements. Results also include data and metadata from veterinarians on all facets of animal access, including, but not limited to: treatments, hematology, biochemical analyses, parasitology, bacteriology, and surgery statistics, etc. Data were collected from animal arrival at the research facility to the end of the experiment, including post-experiment curative treatments. Results also include documentation of data collection and analysis methods and supporting clinical materials that differentiate between 'Idealized' and 'Actual' disease progression. This is an iteration of MaHPIC Experiment 04 with the same parasite-host combination and with sampling and treatment adjustments made, and this is the first in a series of experiments that includes subsequent homologous (Experiment 24, P. cynomolgi B strain) and heterologous (Experiment 25, P. cynomolgi strain ceylonensis) challenges of individuals from the Experiment 23 cohort. Please see the file 'E23M99MEMmCyDaWB_README-PlasmoDB-09242018-V1_MULTIPL.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Random Forest predictions for a drug's ability to penetrate the blood-brain barrier, using the Therapeutics Data Commons BBB-Martins dataset.
Includes .tsv files with the actual and predicted classifications of BBB permeability for each drug in the TDC test set using a Random Forest model. Note that a classification label of 0 means the drug is not BBB permeable and a label of 1 means the drug is BBB permeable.
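Files of actual and predicted 0/1 labels like these can be summarized with a confusion count and an accuracy score. The sketch below is only an illustration: the column names `actual` and `predicted` and the five sample rows are made up, and the real .tsv headers may differ.

```python
import csv
import io


def confusion_counts(tsv_file, actual_col="actual", predicted_col="predicted"):
    """Count true/false positives and negatives, treating label 1 as permeable."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        actual = int(row[actual_col])
        predicted = int(row[predicted_col])
        if actual == 1:
            counts["tp" if predicted == 1 else "fn"] += 1
        else:
            counts["fp" if predicted == 1 else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of rows where the predicted label equals the actual label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Made-up rows standing in for one of the prediction files (hypothetical columns).
sample = io.StringIO(
    "drug\tactual\tpredicted\n"
    "drug_a\t1\t1\n"
    "drug_b\t0\t1\n"
    "drug_c\t0\t0\n"
    "drug_d\t1\t0\n"
    "drug_e\t1\t1\n"
)
counts = confusion_counts(sample)
print(counts, accuracy(counts))
```

Keeping the four counts separate, rather than computing accuracy directly, makes it straightforward to derive sensitivity or specificity from the same pass over the file.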
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Experiment 02: Uninfected Macaca mulatta that serve as a control for in vivo biotinylation studies. The macaques received an intravenous infusion of a water-soluble biotin derivative to determine the erythrocyte lifespan via daily quantification of the biotinylated cells using flow cytometry. Clinical, hematological, and metabolomics measures were collected in the course of the follow-up. Please see the file 'E02TXXMEMmBiXXWB_02122018-Readme_MULTIPL_plasmodb.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Experiment 24: Macaca mulatta infected with Plasmodium cynomolgi B strain, in a homologous challenge, to produce and integrate clinical, hematological, parasitological, and omics measures of acute primary infection and relapses. This clinical results dataset contains measures of disease infection including counts and calculations for: CBCs, reticulocytes, parasitemias, and downstream immunological, functional genomic, lipidomic, and metabolomic measurements. Results also include data and metadata from veterinarians on all facets of animal access, including, but not limited to: treatments, hematology, biochemical analyses, parasitology, bacteriology, and surgery statistics, etc. Results also include documentation of data collection and analysis methods and supporting clinical materials that differentiate between 'Idealized' and 'Actual' disease progression. This is the second in a series of experiments that includes infection of malaria-naïve subjects (Experiment 23, P. cynomolgi B strain) and heterologous challenge (Experiment 25, P. cynomolgi strain ceylonensis) for the individuals from the same cohort. Subjects were cleared of previous infection with P. cynomolgi B strain via treatment with the anti-malarial drugs artemether, chloroquine, and primaquine. Please see the file 'E24M99MEMmCyDaWB_README-PlasmoDB-09242018-V1_MULTIPL.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
Data repository of de-identified patient data, aggregated in a standardized manner, to enable analyses across many rare diseases and to facilitate various research projects, clinical studies, and clinical trials. The aim is to facilitate drug and therapeutics development, and to improve the quality of life for the many millions of people who are suffering from rare diseases. The goal of GRDR is to enable analyses of data across many rare diseases and to facilitate clinical trials and other studies. During the two-year pilot program, a web-based template will be developed to allow any patient organization to establish a rare disease patient registry. At the conclusion of the program, guidance will be available to patient groups to establish a registry and to contribute de-identified patient data to the GRDR repository. A Request for Information (RFI) was released on February 10, 2012 requesting information from patient groups about their interest in participating in a GRDR pilot project. ORDR selected 30 patient organizations to participate in this pilot program to test the different functionalities of the GRDR: fifteen (15) organizations with established registries and 15 organizations that do not have a patient registry. The 15 patient groups, each without a registry, were selected to assist in testing the implementation of the ORDR Common Data Elements (CDEs) in the newly developed registry infrastructure. These organizations will participate in the development and promotion of a new patient registry for their rare disease. The GRDR program will fund the development and hosting of the registry during the pilot program. Thereafter, the patient registry is expected to be self-sustaining. The 15 established patient registries were selected to integrate their de-identified data into the GRDR to evaluate the data mapping and data import/export processes. The GRDR team will assist these organizations in mapping their existing registry data to the CDEs.
Participating registries must have a means to export their de-identified registry data into a specified data format that will facilitate loading the data into the GRDR repository on a regular basis. The GRDR will also develop the capability to link patients' data and medical information to donated biospecimens by using a Voluntary Global Unique Patient Identifier (GUID). The identifier will enable the creation of an interface between the patient registries that are linked to biorepositories and the Rare Disease Human Biospecimens/Biorepositories (RD-HUB) http://biospecimens.ordr.info.nih.gov/.