63 datasets found

Therapeutics Data Commons (https://tdcommons.ai)
dataverse.harvard.edu
search.dataone.org
Updated May 7, 2025
Cite
Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik (2025). Therapeutics Data Commons (https://tdcommons.ai) [Dataset]. http://doi.org/10.7910/DVN/21LKWG
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/21LKWG
Dataset updated
May 7, 2025
Dataset provided by
Harvard Dataverse
Authors
Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Therapeutics Data Commons (TDC) is an open-science initiative started at Harvard with AI/ML-ready datasets and ML tasks for therapeutics. It provides an ecosystem of tools, leaderboards, and community resources, including data functions, model benchmarking and comparison strategies, meaningful data splits, data processors, public leaderboards, and molecule generation oracles. All resources are integrated and accessible via an open Python library. TDC is available at https://tdcommons.ai.
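As a rough illustration of the "open Python library" mentioned in the description, the sketch below loads one ADME dataset and its default split. It assumes the PyTDC package is installed (`pip install PyTDC`) and that network access is available for the first download; the choice of `BBB_Martins` and the helper name `load_bbb_martins_split` are illustrative only and are not taken from this record.

```python
def load_bbb_martins_split():
    """Download the BBB_Martins ADME dataset and return its default split."""
    # Imported lazily so that this file can be loaded without PyTDC installed.
    from tdc.single_pred import ADME

    data = ADME(name="BBB_Martins")  # fetches the dataset on first use
    # get_split() returns a dict of DataFrames keyed 'train', 'valid', 'test'.
    return data.get_split()


if __name__ == "__main__":
    split = load_bbb_martins_split()
    for part, frame in split.items():
        print(part, len(frame))
```

Other TDC tasks follow the same pattern with a different loader class and dataset name; the TDC website lists the available names.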
tdcommons (Therapeutics Data Commons)
opendatalab.com
zip
Updated May 1, 2023
Cite
Massachusetts Institute of Technology (2023). tdcommons (Therapeutics Data Commons) [Dataset]. https://opendatalab.com/OpenDataLab/tdcommons
Explore at:
Available download formats: zip (9558313 bytes)
Dataset updated
May 1, 2023
Dataset provided by
IQVIA
Carnegie Mellon University
Harvard University
University of Illinois Urbana-Champaign
Massachusetts Institute of Technology
Georgia Institute of Technology
Stanford University
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
At its core, TDC collects ML tasks and associated datasets across therapeutic modalities and stages of discovery. These tasks and datasets have the following properties:

Instrumenting disease treatment from bench to bedside with AI/ML: TDC covers a variety of learning tasks going from wet-lab target identification to biomedical product manufacturing.

Building off the latest biotechnological platforms: TDC is regularly updated with novel datasets and tasks, such as antibody therapeutics and gene editing.

Providing AI/ML-ready datasets: TDC datasets provide rich information on biomedical entities. This information is carefully curated, processed, and readily available in TDC.
ADMET-AI: A machine learning ADMET platform for evaluation of large-scale...
zenodo.org
explore.openaire.eu
zip
Updated Jul 13, 2024
Cite
Kyle Swanson; Parker Walther; Jeremy Leitz; Souhrid Mukherjee; Joseph C. Wu; Rabindra V. Shivnaraine; James Zou (2024). ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries – Data and Models [Dataset]. http://doi.org/10.5281/zenodo.10372419
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10372419
Dataset updated
Jul 13, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kyle Swanson; Parker Walther; Jeremy Leitz; Souhrid Mukherjee; Joseph C. Wu; Rabindra V. Shivnaraine; James Zou
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This repository contains data and models used in the following paper.

Swanson, K., Walther, P., Leitz, J., Mukherjee, S., Wu, J. C., Shivnaraine, R. V., & Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. In review.

The data and models are meant to be used with the ADMET-AI code, which runs the ADMET-AI web server at admet.ai.greenstonebio.com.

The data.zip file has the following structure.

data

drugbank: Contains files with drugs from the DrugBank that have received regulatory approval. drugbank_approved.csv contains the full set of approved drugs along with ADMET-AI predictions, while the other files contain subsets of these molecules used for testing the speed of ADMET prediction tools.

tdc_admet_all: Contains the data (.csv files) and RDKit features (.npz files) for all 41 single-task ADMET datasets from the Therapeutics Data Commons (TDC).

tdc_admet_multitask: Contains the data (.csv files) and RDKit features (.npz files) for the two multi-task datasets (one regression and one classification) constructed by combining the tdc_admet_all datasets.

tdc_admet_all.csv: A CSV file containing all 41 ADMET datasets from tdc_admet_all. This can be used to easily look up all ADMET properties for a given molecule in the TDC.

tdc_admet_group: Contains the data (.csv files) and RDKit features (.npz files) for the 22 TDC ADMET Benchmark Group datasets with five splits per dataset.

tdc_admet_group_raw: Contains the raw data (.csv files) used to construct the five splits per dataset in tdc_admet_group.
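The description says tdc_admet_all.csv can be used to look up all ADMET properties for a given molecule. A minimal sketch of such a lookup is given here, using only the standard library. The column name `smiles`, the helper name `lookup_admet`, and the sample property columns are assumptions made for illustration; check the header of the real file before relying on them.

```python
import csv
import io


def lookup_admet(csv_file, smiles, smiles_column="smiles"):
    """Return the non-empty property values for the first row matching `smiles`.

    `smiles_column` is an assumed column name, not one confirmed by the record.
    Matching is an exact string comparison; no SMILES canonicalization is done.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        if row.get(smiles_column) == smiles:
            return {
                key: value
                for key, value in row.items()
                if key != smiles_column and value not in ("", None)
            }
    return {}


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; columns are hypothetical.
    sample = io.StringIO("smiles,prop_a,prop_b\nCCO,0.5,\nc1ccccc1,,1\n")
    print(lookup_admet(sample, "CCO"))
```

With the real archive, the same function could be called on `open("data/tdc_admet_all.csv", newline="")`, assuming that path matches the unpacked layout.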

The models.zip file has the following structure. Note that the ADMET-AI website and Python package use the multi-task Chemprop-RDKit models below.

models

tdc_admet_all: Contains Chemprop and Chemprop-RDKit models trained on all 41 single-task TDC ADMET datasets.

tdc_admet_all_multitask: Contains Chemprop and Chemprop-RDKit models trained on the two multi-task TDC ADMET datasets (one regression and one classification).

tdc_admet_group: Contains Chemprop and Chemprop-RDKit models trained on the 22 TDC ADMET Benchmark Group datasets.
Proteomic Data Commons
dknet.org
Updated Jan 29, 2022
Cite
(2022). Proteomic Data Commons [Dataset]. http://identifiers.org/RRID:SCR_018273
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_018273 https://identifiers.org/RRID:SCR_018273/resolver?q=&i=rrid
Dataset updated
Jan 29, 2022
Description
Portal that makes cancer-related proteomic datasets easily accessible to the public. Facilitates multiomic integration in support of precision medicine through interoperability with other resources. Developed to advance our understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer. One of several repositories within the NCI Cancer Research Data Commons, which enables researchers to link proteomic data with other data sets (e.g., genomic and imaging data) and to submit, collect, analyze, store, and share data throughout the cancer data ecosystem. PDC provides access to highly curated and standardized biospecimen, clinical, and proteomic data, along with an intuitive interface to filter, query, search, visualize, and download data and metadata. Provides a common data harmonization pipeline to uniformly analyze all PDC data, as well as advanced visualization of quantitative information. Cloud-based (Amazon Web Services) infrastructure facilitates native interoperability with AWS-based data analysis tools and platforms. The application programming interface (API) provides cloud-agnostic data access and allows third parties to extend functionality beyond PDC. Includes a structured workspace that serves as a private user data store and as a data submission portal. Distributes controlled-access data, such as patient-specific protein FASTA sequence databases, with dbGaP authorization and eRA Commons authentication.
Metadata and data files supporting the published article: The therapeutic...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum (2023). Metadata and data files supporting the published article: The therapeutic response of ER+/HER2- breast cancers differs according to the molecular Basal or Luminal subtype [Dataset]. http://doi.org/10.6084/m9.figshare.11558676.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11558676.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
François BERTUCCI; Pascal Finetti; Anthony Goncalves; Daniel Birnbaum
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Here, the authors performed an in-silico analysis on a meta-dataset including gene-expression data from 5,342 clinically defined estrogen receptor-positive/human epidermal growth factor receptor 2-negative (ER+/HER2-) breast cancers (BC), and DNA copy number/mutational and proteomic data, to determine whether the therapeutic response of ER+/HER2- breast cancers differs according to the molecular basal or luminal subtype.

Data access: The dataset Breast_cancer_classifications.csv supporting figure 1, table 1, and supplementary tables 1-3 is publicly available in the figshare repository as part of this data record. This study used and analysed 36 publicly available datasets that are all listed in Supplementary table 8 and are cited from the data availability statement of the published article.

Study aims and methodology: To evaluate the response and/or potential vulnerability to hormone treatment (HT) and other systemic therapies of BC, and to assess the degree of difference between basal and luminal breast cancer subtypes, the authors performed an in-silico analysis of a meta-dataset including gene expression data from 8,982 non-redundant BCs and DNA copy number/mutational and proteomic data from TCGA. The aim was to compare the Basal versus Luminal samples. Out of the 8,982 samples of the database, 6,563 were defined as ER+ (5,342 according to immunohistochemistry (IHC) and 1,221 according to inferred status).

The authors analysed breast cancer gene expression data pooled from 36 public datasets (the publicly available datasets are listed in supplementary table 8), comprising 8,982 invasive primary BCs. The pre-analytic data processing was done as described previously in https://doi.org/10.1038/s41416-018-0309-1. Please refer to the published article for more details on the methodology and statistical analysis.

Data supporting the figures, tables and supplementary tables in the published article:

Data supporting figure 1, table 1, and supplementary tables 1-3: Dataset Breast_cancer_classifications.csv is in .csv file format. The dataset includes histo-clinical and molecular data of the tumors analysed in the study, and is part of this data record.

Data supporting supplementary table 4: Dataset genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.tar.gz.1 is a gz-compressed tar archive of maf format files. This dataset was accessed through the Genomic Data Commons (GDC) Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/afaf2790-04d4-453a-8c1b-75cf42ffd35f

Data supporting supplementary table 5: Dataset gdc_manifest.txt consists of gz archives of txt format files. The file was accessed through the GDC Data Portal here: https://portal.gdc.cancer.gov/repository?facetTab=files&filters={"op":"and","content":[{"op":"in","content":{"field":"cases.project.project_id","value":["TCGA-BRCA"]}},{"op":"in","content":{"field":"files.access","value":["open"]}},{"op":"in","content":{"field":"files.analysis.workflow_type","value":["HTSeq - Counts"]}},{"op":"in","content":{"field":"files.experimental_strategy","value":["RNA-Seq"]}}]}&searchTableTab=files

Data supporting supplementary table 6: Dataset Table S5_Revised.xlsx is in .xlsx file format and is part of the supplementary information files of the published article.

Data supporting supplementary table 7: Dataset BRCA.RPPA.Level_3.tar is a tar archive of txt format files. The file was accessed through the GDC Data Portal and can be downloaded directly here: https://api.gdc.cancer.gov/data/85988e1b-4f7d-493e-96ae-9eee61ac2833
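The GDC Data Portal link for supplementary table 5 embeds a JSON filter in its query string. The sketch below rebuilds that filter as a Python structure and percent-encodes it into a portal URL. The helper names `in_clause`, `build_gdc_filter`, and `portal_url` are hypothetical; the field names and values are copied from the link in the description, and whether the current portal still honours this exact URL form is not verified here.

```python
import json
from urllib.parse import quote, unquote


def in_clause(field, values):
    """One 'in' condition, in the shape used inside the portal's filter JSON."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_gdc_filter():
    """Combine the four conditions from the record's portal link with 'and'."""
    return {
        "op": "and",
        "content": [
            in_clause("cases.project.project_id", ["TCGA-BRCA"]),
            in_clause("files.access", ["open"]),
            in_clause("files.analysis.workflow_type", ["HTSeq - Counts"]),
            in_clause("files.experimental_strategy", ["RNA-Seq"]),
        ],
    }


def portal_url(filters):
    """Serialize the filter compactly and percent-encode it for the query string."""
    encoded = quote(json.dumps(filters, separators=(",", ":")))
    return (
        "https://portal.gdc.cancer.gov/repository"
        f"?facetTab=files&filters={encoded}&searchTableTab=files"
    )


if __name__ == "__main__":
    print(portal_url(build_gdc_filter()))
```

Building the filter from small clauses keeps each condition readable and makes it easy to swap the project or workflow type without hand-editing nested JSON.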
Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 13 (E13)...
ckan.cyverse.rocks
Updated Jun 23, 2024
Cite
ckan.cyverse.rocks (2024). Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 13 (E13) Clinical Results - Dataset - CyVerse Data Commons [Dataset]. https://ckan.cyverse.rocks/dataset/malaria-host-pathogen-interaction-center-mahpic-experiment-13-e13-clinical-results
Explore at:
Dataset updated
Jun 23, 2024
Dataset provided by
CKAN (https://ckan.org/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Experiment 13: Uninfected Macaca mulatta exposed to pyrimethamine to produce clinical, hematological, and omics control measures. This clinical results dataset includes infection control measures. Please see the file 'E13M99MEMmXXDpWB_09212018-Readme_MULTIPL.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
The properties along with their domain, range, and description that are...
plos.figshare.com
xlsx
Updated Jun 9, 2023
Cite
Samantha N. Piekos; Sadhana Gaddam; Pranav Bhardwaj; Prashanth Radhakrishnan; Ramanathan V. Guha; Anthony E. Oro (2023). The properties along with their domain, range, and description that are uniquely defined by Biomedical Data Commons schema to represent the data in the knowledge graph. [Dataset]. http://doi.org/10.1371/journal.pcbi.1009382.s016
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pcbi.1009382.s016
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Samantha N. Piekos; Sadhana Gaddam; Pranav Bhardwaj; Prashanth Radhakrishnan; Ramanathan V. Guha; Anthony E. Oro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This list was generated 4/9/21. For an up-to-date representation of the schema, query the Biomedical Data Commons graph or browser, or check the GitHub repository. (XLSX)
Clinical Trial Recruitment and Management Services Market Report | Global...
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Cite
Dataintelo (2024). Clinical Trial Recruitment and Management Services Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-clinical-trial-recruitment-and-management-services-market
Explore at:
Available download formats: csv, pptx, pdf
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Clinical Trial Recruitment and Management Services Market Outlook

The global Clinical Trial Recruitment and Management Services market size is expected to grow from $3.5 billion in 2023 to an impressive $6.8 billion by 2032, reflecting a robust CAGR of 7.5%. This growth is primarily driven by the increasing complexity and number of clinical trials, the rising importance of precision medicine, and the growing demand for advanced data analytics in healthcare. As pharmaceutical and biotechnology companies continue to innovate and push the boundaries of medical research, the need for efficient and effective clinical trial recruitment and management services becomes ever more crucial.
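As a quick arithmetic check of the figures quoted above, the sketch below computes the compound annual growth rate implied by the two market sizes. It assumes a nine-year span (2023 to 2032) and annual compounding; the function names `cagr` and `project` are illustrative. Under those assumptions the implied rate comes out at about 7.7%, slightly above the stated 7.5%, and compounding $3.5 billion at 7.5% for nine years gives about $6.7 billion. The gap may reflect rounding or a different base year in the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding `start_value` at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Assumption: 2023 is the base year and 2032 the end year (nine periods).
    implied = cagr(3.5, 6.8, 2032 - 2023)
    print(f"Implied CAGR: {implied:.2%}")
    print(f"$3.5B at 7.5% for 9 years: ${project(3.5, 0.075, 9):.2f}B")
```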

The growth of the Clinical Trial Recruitment and Management Services market can be attributed to several key factors. One primary growth driver is the increasing prevalence of chronic diseases, which necessitates extensive research and development of new therapies. Conditions such as cancer, cardiovascular diseases, and neurological disorders are becoming more common, leading to a higher volume of clinical trials aimed at finding innovative treatments. Additionally, the aging global population further escalates the demand for new therapeutic solutions, thereby boosting the clinical trial market.

Another significant growth factor is the advancement in technology and data analytics. The integration of artificial intelligence, machine learning, and big data analytics into clinical trial processes has revolutionized patient recruitment, site identification, and data management. These technologies enable more precise identification of eligible patients, efficient trial site selection, and streamlined data management, resulting in faster and more effective clinical trials. This technological evolution not only enhances the efficiency of clinical trials but also reduces costs, making it a pivotal driver for market growth.

The regulatory environment also plays a crucial role in the expansion of the Clinical Trial Recruitment and Management Services market. Stricter regulations and guidelines imposed by regulatory authorities ensure higher standards of safety and efficacy in clinical trials. This necessitates the involvement of specialized recruitment and management services to navigate the complex regulatory landscape efficiently. Furthermore, regulatory incentives for orphan drug development and fast-track approvals for critical therapies provide additional impetus for market growth.

The regional outlook for the Clinical Trial Recruitment and Management Services market highlights significant growth in North America, driven by the presence of major pharmaceutical and biotechnology companies, advanced healthcare infrastructure, and robust regulatory frameworks. Europe follows closely, with increasing investments in healthcare research and a strong emphasis on compliance with regulatory standards. The Asia Pacific region is expected to witness the highest growth rate due to the expanding healthcare sector, growing patient pool, and increasing number of clinical trials in emerging economies like China and India.

Service Type Analysis

The Clinical Trial Recruitment and Management Services market is segmented by service type into Patient Recruitment, Site Identification, Data Management, Regulatory Services, and Others. Patient Recruitment services are critical as they ensure the timely enrollment of eligible participants, a process that directly impacts the success and timeline of clinical trials. The increasing complexity of eligibility criteria and the growing emphasis on precision medicine are driving demand for specialized patient recruitment services. This segment is expected to witness significant growth, as personalized recruitment strategies become more prevalent.

Site Identification services are another crucial segment, as the selection of appropriate trial sites is fundamental to the success of clinical trials. Effective site identification can reduce trial timelines and costs, making it an essential service for sponsors. The advent of advanced data analytics and geospatial technologies has enhanced the precision and efficiency of site selection processes, contributing to the growth of this segment. Additionally, the globalization of clinical trials necessitates the identification of diverse and geographically distributed sites, further driving demand for site identification services.

Data Management services are becoming increasingly vital as the volume and complexity of clinical trial data continue to grow. Efficient data management ensures the in
Additional file 1: of Common data elements for secondary use of electronic...
springernature.figshare.com
xlsx
Updated May 30, 2023
Cite
Philipp Bruland; Mark McGilchrist; Eric Zapletal; Dionisio Acosta; Johann Proeve; Scott Askin; Thomas Ganslandt; Justin Doods; Martin Dugas (2023). Additional file 1: of Common data elements for secondary use of electronic health record data for clinical trial execution and serious adverse event reporting [Dataset]. http://doi.org/10.6084/m9.figshare.c.3613013_D1.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3613013_D1.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Philipp Bruland; Mark McGilchrist; Eric Zapletal; Dionisio Acosta; Johann Proeve; Scott Askin; Thomas Ganslandt; Justin Doods; Martin Dugas
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Title of data: CTE & SAE Data Inventory. Description of data: List of common data elements in clinical trials with domain, availability/completeness, occurrence in trials, semantic codes and definition. (XLSX 34 kb)
Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 15 (E15)...
ckan.cyverse.rocks
Updated Jun 23, 2024
Cite
ckan.cyverse.rocks (2024). Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 15 (E15) Clinical Results - Dataset - CyVerse Data Commons [Dataset]. https://ckan.cyverse.rocks/dataset/malaria-host-pathogen-interaction-center-mahpic-experiment-15-e15-clinical-results
Explore at:
Dataset updated
Jun 23, 2024
Dataset provided by
CKAN (https://ckan.org/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Experiment 15: Aotus nancymaae infected with P. vivax Brazil VII to produce clinical and omics measures of primary infections and relapses. This clinical results dataset contains measures of disease infection including counts and calculations for: CBCs, reticulocytes, parasitemias, and downstream functional genomic, metabolomic, and immunological analyses. Results also include data and metadata from veterinarians on all facets of animal access, including, but not limited to: treatments, hematology, biochemical analyses, parasitology, bacteriology, and surgery statistics, etc. Data were collected from animal arrival at research facility to the end of the experiment, including post-experiment curative treatments. Results also include documentation of data collection and analysis methods and supporting clinical materials that differentiate between 'Idealized' and 'Actual' disease progression. Please see the file 'E15M99MCAnVpDaWB_12142018-Readme_MULTIPL_plasmodb.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
Data from: New microRNA-based therapies reveal common targets in paediatric...
data.mendeley.com
Updated Jul 19, 2024
Cite
Denis Mustafov (2024). New microRNA-based therapies reveal common targets in paediatric medulloblastoma and adult glioblastoma [Dataset]. http://doi.org/10.17632/yrryf4btst.1
Explore at:
Unique identifier
https://doi.org/10.17632/yrryf4btst.1
Dataset updated
Jul 19, 2024
Authors
Denis Mustafov
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The research hypothesised that miR-206 and miR-383 act as tumour suppressors in medulloblastoma (MB) and that their downregulation contributes to the aggressiveness of MB and glioblastoma (GB). By identifying and targeting the genes regulated by these microRNAs (CORO1C and SV2B), new therapeutic approaches could be developed for treating these aggressive brain tumours.

The study employed high-throughput small-RNA sequencing to analyse the expression profiles of microRNAs in MB samples. Bioinformatics tools were used to predict the target genes of the significantly downregulated miRNAs. The expression levels of the identified targets, CORO1C and SV2B, were validated through various molecular biology techniques, including Reverse Transcription-quantitative Polymerase Chain Reaction (RT-qPCR), western blotting, and immunohistochemistry. Functional assays were also performed to validate the regulatory effect of miR-206 and miR-383 on their target genes.

Both miR-206 and miR-383 were found to be significantly downregulated in MB samples, suggesting their potential role as tumour suppressors. Bioinformatics analysis identified CORO1C and SV2B as the target genes of miR-206 and miR-383, respectively. RT-qPCR, western blotting, and immunohistochemistry confirmed the overexpression of CORO1C/CORO1C and SV2B/SV2B in MB and GB cells and tissue samples. Functional assays validated that miR-206 and miR-383 directly regulate the expression of CORO1C and SV2B, respectively. The data suggested that the miR-206/CORO1C and miR-383/SV2B axes play a crucial role in the pathogenesis of MB and GB. The downregulation of these miRNAs leads to the overexpression of their target genes, contributing to the aggressiveness of these tumours. These findings indicate that restoring the levels of miR-206 and miR-383, or directly targeting CORO1C and SV2B, could be a promising therapeutic strategy for treating aggressive brain malignancies in both paediatric and adult patients.

The identification of miR-206 and miR-383 as tumour suppressors and their target genes as therapeutic targets provides a foundation for the development of novel treatments for MB and GB. Researchers and clinicians can use this data to:

- Develop miRNA mimics or gene therapy approaches to restore the levels of miR-206 and miR-383 in tumour cells.

- Design small molecule inhibitors or antibodies to specifically target CORO1C and SV2B proteins.

- Explore combination therapies that incorporate these new targets to improve treatment efficacy and reduce side effects.
TDC BBB-Martins Model Results Summary
figshare.com
txt
Updated Aug 29, 2023
Cite
Rick Fontenot (2023). TDC BBB-Martins Model Results Summary [Dataset]. http://doi.org/10.6084/m9.figshare.22354567.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22354567.v2
Dataset updated
Aug 29, 2023
Dataset provided by
figshare
Authors
Rick Fontenot
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Summary of a collection of models that predict a drug's ability to penetrate the blood-brain barrier, using the Therapeutics Data Commons BBB-Martins dataset.
Antibody dataset Kd
zenodo.org
csv, text/x-python
Updated Aug 11, 2024
Cite
Akbar Rahmad (2024). Antibody dataset Kd [Dataset]. http://doi.org/10.5281/zenodo.13120765
Explore at:
Available download formats: csv, text/x-python
Unique identifier
https://doi.org/10.5281/zenodo.13120765
Dataset updated
Aug 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Akbar Rahmad
License
http://www.apache.org/licenses/LICENSE-2.0
Description
A dataset of ~500 antibodies with binding affinity data: antibody sequence, antigen sequence, and Kd. Obtained from SAbDab via Therapeutics Data Commons.

Includes Python code (get_antibody_affinity_data.py) and the dataset (antibody_affinity_protein_sabdab.csv).
DICOM converted Slide Microscopy images for the TCGA-LUAD collection
zenodo.org
bin
Updated Aug 20, 2024
Cite
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov (2024). DICOM converted Slide Microscopy images for the TCGA-LUAD collection [Dataset]. http://doi.org/10.5281/zenodo.12689916
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.12689916
Dataset updated
Aug 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Clunie; William Clifford; David Pot; Ulrike Wagner; Keyvan Farahani; Erika Kim; Andrey Fedorov
License
Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
License information was derived automatically
Description
This dataset corresponds to a collection of images and/or image-derived data available from National Cancer Institute Imaging Data Commons (IDC) [1]. This dataset was converted into DICOM representation and ingested by the IDC team. You can explore and visualize the corresponding images using IDC Portal here: TCGA-LUAD. You can use the manifests included in this Zenodo record to download the content of the collection following the Download instructions below.

Collection description

The Cancer Imaging Program (CIP) is working directly with primary investigators from institutes participating in TCGA to obtain and load images relating to the genomic, clinical, and pathological data being stored within the TCGA Data Portal. Currently, this large CT multi-sequence image collection of lung adenocarcinoma (LUAD) patients can be matched by each unique case identifier with the extensive gene and expression data of the same case from The Cancer Genome Atlas Data Portal to research the link between clinical phenome and tissue genome.

Please see the TCGA-LUAD page to learn more about the images and to obtain any supporting metadata for this collection.

Files included

A manifest file's name indicates the IDC data release in which a version of collection data was first introduced. For example, collection_id-idc_v8-aws.s5cmd corresponds to the contents of the collection_id collection introduced in IDC data release v8. If there is a subsequent version of this Zenodo page, it will indicate when a subsequent version of the corresponding collection was introduced.

tcga_luad-idc_v8-aws.s5cmd: manifest of files available for download from public IDC Amazon Web Services buckets

tcga_luad-idc_v8-gcs.s5cmd: manifest of files available for download from public IDC Google Cloud Storage buckets

tcga_luad-idc_v8-dcf.dcf: Gen3 manifest (for details see https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)

Note that manifest files that end in -aws.s5cmd reference files stored in Amazon Web Services (AWS) buckets, while -gcs.s5cmd reference files in Google Cloud Storage. The actual files are identical and are mirrored between AWS and GCP.
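The manifest naming convention described above (collection id, IDC release, then a storage suffix) can be checked mechanically. The sketch below parses a manifest file name into those parts. The regular expression and the helper name `parse_manifest_name` are illustrative assumptions inferred from the three file names listed in this record, not an official IDC specification.

```python
import re

# Pattern inferred from names such as "tcga_luad-idc_v8-aws.s5cmd".
MANIFEST_RE = re.compile(
    r"^(?P<collection>.+)-idc_v(?P<release>\d+)-(?P<target>aws|gcs|dcf)"
    r"\.(?P<ext>s5cmd|dcf)$"
)

TARGET_LABELS = {
    "aws": "Amazon Web Services",
    "gcs": "Google Cloud Storage",
    "dcf": "Gen3 manifest",
}


def parse_manifest_name(name):
    """Split a manifest file name into collection id, IDC release, and kind."""
    match = MANIFEST_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised manifest name: {name!r}")
    return {
        "collection_id": match["collection"],
        "idc_release": int(match["release"]),
        "kind": TARGET_LABELS[match["target"]],
    }


if __name__ == "__main__":
    for file_name in (
        "tcga_luad-idc_v8-aws.s5cmd",
        "tcga_luad-idc_v8-gcs.s5cmd",
        "tcga_luad-idc_v8-dcf.dcf",
    ):
        print(parse_manifest_name(file_name))
```

A check like this can help pick the right manifest for a given cloud provider before starting a download.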

Download instructions

Each of the manifests includes instructions in its header on how to download the included files.

To download the files using .s5cmd manifests:

install idc-index package: pip install --upgrade idc-index

download the files referenced by manifests included in this dataset by passing the .s5cmd manifest file: idc download manifest.s5cmd.

To download the files using .dcf manifest, see manifest header.

Acknowledgments

Imaging Data Commons team has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN261201500003l.

References

[1] Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 23 (E23)...
ckan.cyverse.rocks
Updated Jun 23, 2024
Cite
ckan.cyverse.rocks (2024). Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 23 (E23) Clinical Results - Dataset - CyVerse Data Commons [Dataset]. https://ckan.cyverse.rocks/dataset/malaria-host-pathogen-interaction-center-mahpic-experiment-23-e23-clinical-results
Dataset updated
Jun 23, 2024
Dataset provided by
CKAN (https://ckan.org/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Experiment 23: M. mulatta infected with P. cynomolgi B strain to produce and integrate clinical, hematological, parasitological, and omics measures of acute primary infection and relapses. This clinical results dataset contains measures of disease infection including counts and calculations for: CBCs, reticulocytes, parasitemias, and downstream immunological, functional genomic, lipidomic, proteomic, and metabolomic measurements. Results also include data and metadata from veterinarians on all facets of animal access, including, but not limited to: treatments, hematology, biochemical analyses, parasitology, bacteriology, and surgery statistics. Data were collected from animal arrival at the research facility to the end of the experiment, including post-experiment curative treatments. Results also include documentation of data collection and analysis methods and supporting clinical materials that differentiate between 'Idealized' and 'Actual' disease progression. This is an iteration of MaHPIC Experiment 04 with the same parasite-host combination and with sampling and treatment adjustments made, and it is the first in a series of experiments that includes subsequent homologous (Experiment 24, P. cynomolgi B strain) and heterologous (Experiment 25, P. cynomolgi strain ceylonensis) challenges of individuals from the Experiment 23 cohort. Please see the file 'E23M99MEMmCyDaWB_README-PlasmoDB-09242018-V1_MULTIPL.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
Random Forest TDC-BBB predictions
figshare.com
txt
Updated Aug 29, 2023
Cite
Rick Fontenot (2023). Random Forest TDC-BBB predictions [Dataset]. http://doi.org/10.6084/m9.figshare.22354561.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22354561.v2
Dataset updated
Aug 29, 2023
Dataset provided by
figshare
Authors
Rick Fontenot
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Random Forest predictions of a drug's ability to penetrate the blood-brain barrier (BBB), using the Therapeutics Data Commons BBB-Martins dataset.

Includes .tsv files with the actual and predicted classifications of BBB permeability for each drug in the TDC test set using a Random Forest model. Note that a classification label of 0 means the drug is not BBB permeable and a label of 1 means the drug is BBB permeable.
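Files of this kind can be summarized with the standard library alone. The sketch below tallies actual versus predicted labels and computes accuracy. The column names actual and predicted, and the sample rows, are assumptions made for illustration; the real .tsv headers and contents may differ and should be checked before use.

```python
import csv
import io
from collections import Counter


def confusion_counts(tsv_file, actual_col="actual", predicted_col="predicted"):
    """Count (actual, predicted) label pairs from a tab-separated file.

    Column names are assumed for illustration; labels are 0 (not BBB
    permeable) or 1 (BBB permeable), as described in the dataset text.
    """
    counts = Counter()
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        counts[(int(row[actual_col]), int(row[predicted_col]))] += 1
    return counts


def accuracy(counts):
    """Fraction of rows where the predicted label equals the actual label."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rows to evaluate")
    return (counts[(0, 0)] + counts[(1, 1)]) / total


# Made-up rows standing in for a real predictions file.
SAMPLE_TSV = (
    "drug\tactual\tpredicted\n"
    "A\t1\t1\n"
    "B\t0\t1\n"
    "C\t0\t0\n"
    "D\t1\t1\n"
)

if __name__ == "__main__":
    counts = confusion_counts(io.StringIO(SAMPLE_TSV))
    print(dict(counts))
    print(accuracy(counts))
```

To run this on a downloaded file, pass an open file handle instead of the in-memory sample, for example open(path, newline="").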
Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 02 (E02)...
ckan.cyverse.rocks
Updated Jun 23, 2024
Cite
ckan.cyverse.rocks (2024). Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 02 (E02) Clinical Results - Dataset - CyVerse Data Commons [Dataset]. https://ckan.cyverse.rocks/dataset/malaria-host-pathogen-interaction-center-mahpic-experiment-02-e02-clinical-results
Dataset updated
Jun 23, 2024
Dataset provided by
CKAN (https://ckan.org/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Experiment 02: Uninfected Macaca mulatta that serve as a control for in vivo biotinylation studies. The macaques received an intravenous infusion of a water-soluble biotin derivative to determine the erythrocyte lifespan via daily quantification of the biotinylated cells using flow cytometry. Clinical, hematological, and metabolomics measures were collected in the course of the follow-up. Please see the file 'E02TXXMEMmBiXXWB_02122018-Readme_MULTIPL_plasmodb.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
The Cancer Genome Atlas Colon Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Colon Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.HJJHBOXZ
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 24 (E24)...
ckan.cyverse.rocks
Updated Jun 23, 2024
Cite
ckan.cyverse.rocks (2024). Malaria Host Pathogen Interaction Center (MaHPIC) Experiment 24 (E24) Clinical Results - Dataset - CyVerse Data Commons [Dataset]. https://ckan.cyverse.rocks/dataset/malaria-host-pathogen-interaction-center-mahpic-experiment-24-e24-clinical-results
Dataset updated
Jun 23, 2024
Dataset provided by
CKAN (https://ckan.org/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Experiment 24: Macaca mulatta infected with Plasmodium cynomolgi B strain, in a homologous challenge, to produce and integrate clinical, hematological, parasitological, and omics measures of acute primary infection and relapses. This clinical results dataset contains measures of disease infection including counts and calculations for: CBCs, reticulocytes, parasitemias, and downstream immunological, functional genomic, lipidomic, and metabolomic measurements. Results also include data and metadata from veterinarians on all facets of animal access, including, but not limited to: treatments, hematology, biochemical analyses, parasitology, bacteriology, and surgery statistics. Results also include documentation of data collection and analysis methods and supporting clinical materials that differentiate between 'Idealized' and 'Actual' disease progression. This is the second in a series of experiments that includes infection of malaria-naïve subjects (Experiment 23, P. cynomolgi B strain) and heterologous challenge (Experiment 25, P. cynomolgi strain ceylonensis) for the individuals from the same cohort. Subjects were cleared of previous infection with P. cynomolgi B strain via treatment with the anti-malarial drugs artemether, chloroquine, and primaquine. Please see the file 'E24M99MEMmCyDaWB_README-PlasmoDB-09242018-V1_MULTIPL.txt' for a full description of this dataset and for instructions on how to access other datasets from this experiment. Please see https://plasmodb.org/plasmo/mahpic.jsp for descriptions and locations of all public MaHPIC datasets.
GRDR
neuinfo.org
dknet.org
Updated Nov 12, 2024
Cite
(2024). GRDR [Dataset]. http://identifiers.org/RRID:SCR_008978
Unique identifier
https://identifiers.org/RRID:SCR_008978
Dataset updated
Nov 12, 2024
Description
Data repository of de-identified patient data, aggregated in a standardized manner, to enable analyses across many rare diseases and to facilitate various research projects, clinical studies, and clinical trials. The aim is to facilitate drug and therapeutics development, and to improve the quality of life for the many millions of people who are suffering from rare diseases. The goal of GRDR is to enable analyses of data across many rare diseases and to facilitate clinical trials and other studies. During the two-year pilot program, a web-based template will be developed to allow any patient organization to establish a rare disease patient registry. At the conclusion of the program, guidance will be available to patient groups to establish a registry and to contribute de-identified patient data to the GRDR repository. A Request for Information (RFI) was released on February 10, 2012 requesting information from patient groups about their interest in participating in a GRDR pilot project. ORDR selected 30 patient organizations to participate in this pilot program to test the different functionalities of the GRDR: fifteen (15) organizations with established registries and 15 organizations that do not have a patient registry. The 15 patient groups without a registry were selected to assist in testing the implementation of the ORDR Common Data Elements (CDEs) in the newly developed registry infrastructure. These organizations will participate in the development and promotion of a new patient registry for their rare disease. The GRDR program will fund the development and hosting of the registry during the pilot program. Thereafter, the patient registry is expected to be self-sustaining. The 15 established patient registries were selected to integrate their de-identified data into the GRDR to evaluate the data mapping and data import/export processes. The GRDR team will assist these organizations in mapping their existing registry data to the CDEs.
Participating registries must have a means to export their de-identified registry data into a specified data format that will facilitate loading the data into the GRDR repository on a regular basis. The GRDR will also develop the capability to link patients' data and medical information to donated biospecimens by using a Voluntary Global Unique Patient Identifier (GUID). The identifier will enable the creation of an interface between the patient registries that are linked to biorepositories and the Rare Disease Human Biospecimens/Biorepositories (RD-HUB) http://biospecimens.ordr.info.nih.gov/.

