7 datasets found
  1. GDSC1

    • zenodo.org
    bin
    Updated Jun 21, 2021
    Cite
    Benjamin Haibe-Kains; Benjamin Haibe-Kains (2021). GDSC1 [Dataset]. http://doi.org/10.5281/zenodo.4730670
    Available download formats
    bin
    Dataset updated
    Jun 21, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Haibe-Kains; Benjamin Haibe-Kains
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    GDSC1 PharmacoSet (PSet) generated by ORCESTRA. Metadata can be found on ORCESTRA at: http://orcestra.ca/10.5281/zenodo.4730670

    Disclaimer

    The GDSC1 data have been generated and shared by the Wellcome Trust Sanger Institute as part of the Genomics of Drug Sensitivity in Cancer (GDSC) Programme. The Haibe-Kains Lab has reprocessed and re-annotated the data to maximize overlap with other pharmacogenomic datasets.

    Data Usage Policy

    Users have a non-exclusive, non-transferable right to use data files for internal proprietary research and educational purposes, including target, biomarker and drug discovery. Excluded from this licence are use of the data (in whole or any significant part) for resale either alone or in combination with additional data/product offerings, or for provision of commercial services.

    Please note: The data files are experimental and academic in nature and are not licensed or certified by any regulatory body. Genome Research Limited provides access to data files on an “as is” basis and excludes all warranties of any kind (express or implied). If you are interested in incorporating results or software into a product, or have questions, please contact depmap@sanger.ac.uk.

    Source: https://depmap.sanger.ac.uk/documentation/data-usage-policy/

    Sanger's terms and conditions: http://www.cancerrxgene.org/legal

    Please cite the following when using these data

    Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P, van Dyk E, Chang H, de Silva H, Heyn H, Deng X, Egan RK, Liu Q, Mironenko T, Mitropoulos X, Richardson L, Wang J, Zhang T, Moran S, Sayols S, Soleimani M, Tamborero D, Lopez-Bigas N, Ross-Macdonald P, Esteller M, Gray NS, Haber DA, Stratton MR, Benes CH, Wessels LFA, Saez-Rodriguez J, McDermott U, Garnett MJ. A Landscape of Pharmacogenomic Interactions in Cancer. Cell. 2016 Jul 28;166(3):740-754. doi: 10.1016/j.cell.2016.06.017. Epub 2016 Jul 7. PMID: 27397505; PMCID: PMC4967469.

    Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C, McDermott U, Garnett MJ. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013 Jan;41(Database issue):D955-61. doi: 10.1093/nar/gks1111. Epub 2012 Nov 23. PMID: 23180760; PMCID: PMC3531057.

    Picco G, Chen ED, Alonso LG, Behan FM, Gonçalves E, Bignell G, Matchan A, Fu B, Banerjee R, Anderson E, Butler A, Benes CH, McDermott U, Dow D, Iorio F, Stronach E, Yang F, Yusa K, Saez-Rodriguez J, Garnett MJ. Functional linkage of gene fusions to cancer cell fitness assessed by pharmacological and CRISPR-Cas9 screening. Nat Commun. 2019 May 16;10(1):2198. doi: 10.1038/s41467-019-09940-1. PMID: 31097696; PMCID: PMC6522557.

  2. GDSC1 PSet

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    + more versions
    Cite
    Haibe-Kains, Benjamin (2020). GDSC1 PSet [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3601388
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Haibe-Kains, Benjamin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    PharmacoSet (PSet) for GDSC1 dataset

    Metadata for this PSet can be found at: http://orcestra.ca/10.5281/zenodo.3601389

  3. GDSC1

    • zenodo.org
    bin
    Updated Jun 5, 2020
    Cite
    Benjamin Haibe-Kains; Benjamin Haibe-Kains (2020). GDSC1 [Dataset]. http://doi.org/10.5281/zenodo.3880056
    Available download formats
    bin
    Dataset updated
    Jun 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Haibe-Kains; Benjamin Haibe-Kains
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    GDSC1 PharmacoSet (PSet) generated by ORCESTRA. Metadata can be found on ORCESTRA at: http://orcestra.ca/10.5281/zenodo.3880056

  4. cross-dataset-drp-paper

    • zenodo.org
    zip
    Updated Apr 22, 2025
    Cite
    A. Partin; A. Partin (2025). cross-dataset-drp-paper [Dataset]. http://doi.org/10.5281/zenodo.15258451
    Available download formats
    zip
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    A. Partin; A. Partin
    Description

    This benchmark data was used to train and evaluate the models presented in the paper: A. Partin and P. Vasanthakumari et al. "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis"

    The benchmark data for Cross-Study Analysis (CSA) include four kinds of data: cell line response data, cell line multi-omics data, drug feature data, and data partitions. The figure below illustrates the curation, processing, and assembly of benchmark data, and a unified schema for data curation.

    Cell line response data were extracted from five large-scale cell line drug screening studies: the Cancer Cell Line Encyclopedia (CCLE), the Cancer Therapeutics Response Portal version 2 (CTRPv2), the Genomics of Drug Sensitivity in Cancer version 1 (GDSC1), the Genomics of Drug Sensitivity in Cancer version 2 (GDSC2), and the Genentech Cell Line Screening Initiative (GCSI). We extracted their multi-dose viability data and used a unified dose response fitting pipeline to calculate multiple dose-independent response metrics as shown in the figure below, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50).

    The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expressions, DNA mutations, DNA methylation, gene copy numbers, protein expressions measured by reverse phase protein array (RPPA), and miRNA expressions. Data preprocessing was performed, such as discretizing gene copy numbers and mapping between different gene identifier systems.

    Drug information was retrieved from PubChem. Based on the drug SMILES (Simplified Molecular Input Line Entry Specification) strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages.

    Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets, for building and evaluating the drug response prediction (DRP) models.

    The table below shows the numbers of cell lines, drugs, and experiments in each dataset. Across the five datasets, there are 785 unique cell lines and 749 unique drugs. All cell lines have gene expression, mutation, DNA methylation, and copy number data available. 760 of the cell lines have RPPA protein expressions, and 781 of them have miRNA expressions.
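
The unique counts quoted above are unions across studies rather than sums, since the same cell line or drug can appear in several screens. A minimal sketch of that counting with made-up identifiers (the `Response` record, the `count_unique` helper, and the toy values are hypothetical and are not taken from the benchmark files):

```python
from typing import Iterable, NamedTuple, Tuple


class Response(NamedTuple):
    source: str     # screening study, e.g. "GDSC1"
    cell_line: str  # hypothetical cell line identifier
    drug: str       # hypothetical drug identifier


def count_unique(records: Iterable[Response]) -> Tuple[int, int]:
    """Return (unique cell lines, unique drugs) pooled over all sources."""
    cell_lines = set()
    drugs = set()
    for rec in records:
        cell_lines.add(rec.cell_line)
        drugs.add(rec.drug)
    return len(cell_lines), len(drugs)


# Toy records: cell line "C1" and drug "D1" are shared between two studies,
# so they are counted once each.
toy_records = [
    Response("GDSC1", "C1", "D1"),
    Response("GDSC1", "C2", "D1"),
    Response("CCLE", "C1", "D1"),
    Response("CCLE", "C3", "D2"),
]

n_cells, n_drugs = count_unique(toy_records)
print(n_cells, n_drugs)  # 3 2
```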

    Further description is provided here: https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html

  5. Table_1_In-depth single-cell and bulk-RNA sequencing developed a...

    • frontiersin.figshare.com
    xlsx
    Updated Oct 19, 2023
    Cite
    Liangyu Zhang; Xun Zhang; Maohao Guan; Fengqiang Yu; Fancai Lai (2023). Table_1_In-depth single-cell and bulk-RNA sequencing developed a NETosis-related gene signature affects non-small-cell lung cancer prognosis and tumor microenvironment: results from over 3,000 patients.xlsx [Dataset]. http://doi.org/10.3389/fonc.2023.1282335.s001
    Available download formats
    xlsx
    Dataset updated
    Oct 19, 2023
    Dataset provided by
    Frontiers
    Authors
    Liangyu Zhang; Xun Zhang; Maohao Guan; Fengqiang Yu; Fancai Lai
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Cell death caused by neutrophil extracellular traps (NETs) is known as NETosis. Despite the increasing importance of NETosis in cancer diagnosis and treatment, its role in Non-Small-Cell Lung Cancer (NSCLC) remains unclear.

    Methods: A total of 3298 NSCLC patients from different cohorts were included. The AUCell method was used to compute cells’ NETosis scores from single-cell RNA-sequencing data. DEGs in the sc-RNA dataset were obtained with Seurat’s “FindAllMarkers” function, and DEGs in the bulk-RNA dataset were obtained with the DESeq2 package. The ConsensusClusterPlus package was used to group patients into different NETosis subtypes, and the Enet algorithm was used to construct the NETosis-Related Riskscore (NETRS). Enrichment analyses were conducted using the GSVA and ClusterProfiler packages. Six distinct algorithms were used to evaluate patients’ immune cell infiltration levels. Patients’ SNV and CNV data were analyzed with maftools and GISTIC2.0, respectively. Drug information was obtained from GDSC1, and drug sensitivity was predicted with the Oncopredict package. Patient response to immunotherapy was evaluated by the TIDE algorithm in conjunction with the phs000452 immunotherapy cohort. Six NRGs’ differential expression was verified using qRT-PCR and immunohistochemistry.

    Results: Among all cell types, neutrophils had the highest AUCell score. By intersecting the DEGs between high and low NETosis classes, the DEGs between normal and LUAD tissues, and prognosis-related genes, 61 prognosis-related NRGs were identified. Based on the 61 NRGs, all LUAD patients can be divided into two clusters with different prognostic and TME characteristics. Enet regression identified the NETRS, composed of 18 NRGs. NETRS was significantly associated with LUAD patients’ clinical characteristics, and patients in different NETRS groups showed significant differences in prognosis, TME characteristics, immune-related molecules’ expression levels, gene mutation frequencies, response to immunotherapy, and drug sensitivity. In addition, NETRS was more powerful than 20 published gene signatures in predicting LUAD patients’ survival. Nine independent cohorts confirmed that NETRS is also valuable in predicting the prognosis of all NSCLC patients. Finally, six NRGs’ expression was confirmed using three independent datasets, qRT-PCR and immunohistochemistry.

    Conclusion: NETRS can serve as a valuable prognostic indicator for patients with NSCLC, providing insights into the tumor microenvironment and predicting the response to cancer therapy.

  6. Pan-cancer Aberrant Pathway Activity Analysis (PAPAA)

    • explore.openaire.eu
    • zenodo.org
    Updated Jan 22, 2020
    Cite
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI (2020). Pan-cancer Aberrant Pathway Activity Analysis (PAPAA) [Dataset]. http://doi.org/10.5281/zenodo.3625200
    Dataset updated
    Jan 22, 2020
    Authors
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI
    Description

    Information about the dataset files:

    1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-cancer dataset. The file PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using the script process_sample_freeze.py by Gregory Way et al as described in the https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]
    2) pancan_mutation_freeze.tsv.gz: Publicly available mutational information for the TCGA Pan-cancer dataset. The file mc3.v0.2.8.PUBLIC.maf.gz was processed using the script process_sample_freeze.py by Gregory Way et al as described in the https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]
    3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number information of the TCGA Pan-cancer dataset. This file is processed using the script process_copynumber.py by Gregory Way et al as described in the https://github.com/greenelab/pancancer/ data processing and initialization steps. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from this data are used as inputs in our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]
    4) mutation_burden_freeze.tsv.gz: Publicly available mutational information for the TCGA Pan-cancer dataset. The file mc3.v0.2.8.PUBLIC.maf.gz was processed using the script process_sample_freeze.py by Gregory Way et al as described in the https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/] [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]
    5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: The file lists the frozen samples as determined by the TCGA PanCancer Atlas consortium along with raw RNAseq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]
    6) cosmic_cancer_classification.tsv: Compendium of OG and TSG used for the analysis. Additional genes from the cosmic database were added to volgelstein_cancer_classification.tsv. [https://github.com/greenelab/pancancer/]
    7) CCLE_DepMap_18Q1_maf_20180207.txt.gz: Publicly available mutational data for CCLE cell lines from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]
    8) ccle_rnaseq_genes_rpkm_20180929_mod.tsv.gz: Publicly available expression data for 1019 cell lines (RPKM) from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]
    9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv: Publicly available merged mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This data is represented in binary format and provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]
    10) GDSC_cell_lines_EXP_CCLE_names.tsv.gz: Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer (GDSC) cell lines. The file gdsc_cell_line_RMA_proc_basalExp.csv was downloaded. This data was subsetted to 389 cell lines that are common among CCLE and GDSC. All the GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]
    11) GDSC_CCLE_common_mut_cnv_binary.tsv.gz: A subset of merged mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. This file is generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv and a list of common cell lines.
    12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]
    13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]
    14) compounds_of_interest.txt: list of pharmacological compounds tested for our analysis, taken from ftp://ftp.sanger.ac.uk/pub4/cancerrxgen...
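
Item 10 in the file list describes subsetting the GDSC expression data to cell lines shared with CCLE and renaming them to CCLE names. The following is only a rough sketch of that kind of step, assuming a name mapping is available. The function `subset_to_common`, the mapping, and all cell line names are hypothetical and do not come from the dataset or its scripts.

```python
from typing import Dict, List, Set


def subset_to_common(
    gdsc_expr: Dict[str, List[float]],
    gdsc_to_ccle: Dict[str, str],
    ccle_lines: Set[str],
) -> Dict[str, List[float]]:
    """Keep GDSC cell lines that also exist in CCLE, keyed by their CCLE name."""
    common: Dict[str, List[float]] = {}
    for gdsc_name, profile in gdsc_expr.items():
        ccle_name = gdsc_to_ccle.get(gdsc_name)
        # Drop lines with no known CCLE counterpart.
        if ccle_name is not None and ccle_name in ccle_lines:
            common[ccle_name] = profile
    return common


# Hypothetical toy inputs: expression profiles keyed by GDSC-style names.
gdsc_expr = {
    "line-a": [5.1, 7.3],
    "line-b": [6.0, 2.2],
    "line-c": [4.4, 9.8],  # has no CCLE counterpart in this toy example
}
gdsc_to_ccle = {"line-a": "LINEA_TISSUE1", "line-b": "LINEB_TISSUE2"}
ccle_lines = {"LINEA_TISSUE1", "LINEB_TISSUE2", "LINED_TISSUE3"}

renamed = subset_to_common(gdsc_expr, gdsc_to_ccle, ccle_lines)
print(sorted(renamed))  # ['LINEA_TISSUE1', 'LINEB_TISSUE2']
```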

  7. Cross-Study Benchmark Dataset for Monotherapy Drug Response Prediction

    • zenodo.org
    zip
    Updated Apr 22, 2025
    Cite
    Alexander Partin; Alexander Partin (2025). Cross-Study Benchmark Dataset for Monotherapy Drug Response Prediction [Dataset]. http://doi.org/10.5281/zenodo.15258883
    Available download formats
    zip
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Partin; Alexander Partin
    Description

    This benchmark dataset was created and used to train and evaluate models presented in the paper: A. Partin, P. Vasanthakumari et al., "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis."

    This dataset includes four main components: cell line drug response data, cell line multi-omics data, drug feature data, and predefined data partitions for modeling. Drug response data were curated from five pharmacogenomic studies (CCLE, CTRPv2, GDSC1, GDSC2, GCSI), and processed using a unified pipeline for response fitting, omics harmonization, and drug representation.

    Multi-dose viability data were extracted, and a unified dose response fitting pipeline was used to calculate multiple dose-independent response metrics, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50).
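
To make the two metrics concrete, here is a small sketch of how AUC and IC50 can be read off a fitted dose response curve. The listing does not say which curve model the pipeline fits, so the three-parameter Hill form below (`einf`, `log10_ec50`, `slope`) is an assumption for illustration, and all function names are hypothetical.

```python
import math
from typing import Optional


def viability(log10_dose: float, einf: float, log10_ec50: float, slope: float) -> float:
    """Viability under an assumed Hill model: 1 at zero dose, einf at saturating dose."""
    return einf + (1.0 - einf) / (1.0 + 10.0 ** (slope * (log10_dose - log10_ec50)))


def normalized_auc(einf: float, log10_ec50: float, slope: float,
                   lo: float, hi: float, steps: int = 1000) -> float:
    """Trapezoidal area under the curve on [lo, hi] (log10 dose), divided by the range width."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = viability(lo + i * width, einf, log10_ec50, slope)
        right = viability(lo + (i + 1) * width, einf, log10_ec50, slope)
        total += 0.5 * (left + right) * width
    return total / (hi - lo)


def log10_ic50(einf: float, log10_ec50: float, slope: float) -> Optional[float]:
    """log10 dose at which viability equals 0.5, or None if the curve never drops that low."""
    if einf >= 0.5:
        return None
    # Solve einf + (1 - einf) / (1 + t) = 0.5 for t = 10 ** (slope * (x - log10_ec50)).
    t = 0.5 / (0.5 - einf)
    return log10_ec50 + math.log10(t) / slope


# Hypothetical fitted parameters for one cell line and drug pair.
params = dict(einf=0.1, log10_ec50=-1.0, slope=1.5)
print(normalized_auc(lo=-3.0, hi=1.0, **params))
print(log10_ic50(**params))
```

Both values summarize the whole curve in a single number, which is why they can be compared across studies that tested different dose ranges, provided the AUC is computed over a common range.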

    The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expressions, DNA mutations, DNA methylation, gene copy numbers, protein expressions measured by reverse phase protein array (RPPA), and miRNA expressions. Data preprocessing was performed, such as discretizing gene copy numbers and mapping between different gene identifier systems.

    Drug information was retrieved from PubChem. Based on the drug SMILES strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages.

    Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets, for building and evaluating the drug response prediction (DRP) models.
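
As an illustration of what such partition files encode, the sketch below builds disjoint training, validation, and testing index lists for several runs. It is not the IMPROVE pipeline: the `make_partition` helper, the split fractions, and the seeding scheme are assumptions chosen only to show the idea of fixed, reproducible splits.

```python
import random
from typing import Dict, List


def make_partition(n_samples: int, seed: int,
                   val_frac: float = 0.1, test_frac: float = 0.1) -> Dict[str, List[int]]:
    """Split sample indices 0..n_samples-1 into disjoint train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so each run's split is reproducible
    n_val = round(n_samples * val_frac)
    n_test = round(n_samples * test_frac)
    return {
        "test": sorted(indices[:n_test]),
        "val": sorted(indices[n_test:n_test + n_val]),
        "train": sorted(indices[n_test + n_val:]),
    }


# One partition per modeling run; the run number doubles as the seed here.
partitions = {run: make_partition(100, seed=run) for run in range(3)}
for run, split in partitions.items():
    print(run, len(split["train"]), len(split["val"]), len(split["test"]))
```

Storing the index lists, rather than re-splitting inside each model's code, lets every model be trained and scored on exactly the same samples.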

    More detailed information about the dataset and its construction can be found at https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html

