CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
GDSC1 PharmacoSet (PSet) generated by ORCESTRA. Metadata can be found on ORCESTRA at: http://orcestra.ca/10.5281/zenodo.4730670
Disclaimer
The GDSC1 data have been generated and shared by the Wellcome Trust Sanger Institute as part of the Genomics of Drug Sensitivity in Cancer (GDSC) Programme. The Haibe-Kains Lab has reprocessed and re-annotated the data to maximize overlap with other pharmacogenomic datasets.
Data Usage Policy
Users have a non-exclusive, non-transferable right to use data files for internal proprietary research and educational purposes, including target, biomarker and drug discovery. Excluded from this licence are use of the data (in whole or any significant part) for resale either alone or in combination with additional data/product offerings, or for provision of commercial services.
Please note: The data files are experimental and academic in nature and are not licensed or certified by any regulatory body. Genome Research Limited provides access to data files on an “as is” basis and excludes all warranties of any kind (express or implied). If you are interested in incorporating results or software into a product, or have questions, please contact depmap@sanger.ac.uk.
Source: https://depmap.sanger.ac.uk/documentation/data-usage-policy/
Sanger's terms and conditions: http://www.cancerrxgene.org/legal
Please cite the following when using these data
Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P, van Dyk E, Chang H, de Silva H, Heyn H, Deng X, Egan RK, Liu Q, Mironenko T, Mitropoulos X, Richardson L, Wang J, Zhang T, Moran S, Sayols S, Soleimani M, Tamborero D, Lopez-Bigas N, Ross-Macdonald P, Esteller M, Gray NS, Haber DA, Stratton MR, Benes CH, Wessels LFA, Saez-Rodriguez J, McDermott U, Garnett MJ. A Landscape of Pharmacogenomic Interactions in Cancer. Cell. 2016 Jul 28;166(3):740-754. doi: 10.1016/j.cell.2016.06.017. Epub 2016 Jul 7. PMID: 27397505; PMCID: PMC4967469.
Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C, McDermott U, Garnett MJ. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013 Jan;41(Database issue):D955-61. doi: 10.1093/nar/gks1111. Epub 2012 Nov 23. PMID: 23180760; PMCID: PMC3531057.
Picco G, Chen ED, Alonso LG, Behan FM, Gonçalves E, Bignell G, Matchan A, Fu B, Banerjee R, Anderson E, Butler A, Benes CH, McDermott U, Dow D, Iorio F, Stronach E, Yang F, Yusa K, Saez-Rodriguez J, Garnett MJ. Functional linkage of gene fusions to cancer cell fitness assessed by pharmacological and CRISPR-Cas9 screening. Nat Commun. 2019 May 16;10(1):2198. doi: 10.1038/s41467-019-09940-1. PMID: 31097696; PMCID: PMC6522557.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PharmacoSet (PSet) for GDSC1 dataset
Metadata for this PSet can be found at: http://orcestra.ca/10.5281/zenodo.3601389
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
GDSC1 PharmacoSet (PSet) generated by ORCESTRA. Metadata can be found on ORCESTRA at: http://orcestra.ca/10.5281/zenodo.3880056
This benchmark dataset was used to train and evaluate the models presented in the paper: A. Partin, P. Vasanthakumari et al., "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis."
The benchmark data for Cross-Study Analysis (CSA) include four kinds of data: cell line response data, cell line multi-omics data, drug feature data, and data partitions. The figure below illustrates the curation, processing, and assembly of the benchmark data, and a unified schema for data curation. Cell line response data were extracted from five large-scale cell line drug screening studies: the Cancer Cell Line Encyclopedia (CCLE), the Cancer Therapeutics Response Portal version 2 (CTRPv2), the Genomics of Drug Sensitivity in Cancer version 1 (GDSC1), the Genomics of Drug Sensitivity in Cancer version 2 (GDSC2), and the Genentech Cell Line Screening Initiative (GCSI). We extracted their multi-dose viability data and used a unified dose response fitting pipeline to calculate multiple dose-independent response metrics, as shown in the figure below, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50). The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expression, DNA mutation, DNA methylation, gene copy number, protein expression measured by reverse phase protein array (RPPA), and miRNA expression data. Data preprocessing included discretizing gene copy numbers and mapping between different gene identifier systems. Drug information was retrieved from PubChem. Based on the drug SMILES (Simplified Molecular Input Line Entry System) strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages. Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets for building and evaluating the drug response prediction (DRP) models.
The table below shows the numbers of cell lines, drugs, and experiments in each dataset. Across the five datasets, there are 785 unique cell lines and 749 unique drugs. All cell lines have gene expression, mutation, DNA methylation, and copy number data available; 760 of them have RPPA protein expression data, and 781 have miRNA expression data.
Further description is provided here: https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Cell death caused by neutrophil extracellular traps (NETs) is known as NETosis. Despite the increasing importance of NETosis in cancer diagnosis and treatment, its role in non-small-cell lung cancer (NSCLC) remains unclear.
Methods: A total of 3298 NSCLC patients from different cohorts were included. The AUCell method was used to compute cells’ NETosis scores from single-cell RNA-sequencing data. Differentially expressed genes (DEGs) in the scRNA-seq dataset were obtained with Seurat’s "FindAllMarkers" function, and DEGs in the bulk RNA-seq dataset were identified with the DESeq2 package. The ConsensusClusterPlus package was used to group patients into NETosis subtypes, and the elastic net (Enet) algorithm was used to construct the NETosis-Related Riskscore (NETRS). Enrichment analyses were conducted using the GSVA and clusterProfiler packages. Six distinct algorithms were used to evaluate patients’ immune cell infiltration levels. Patients’ SNV and CNV data were analyzed with maftools and GISTIC2.0, respectively. Drug information was obtained from GDSC1, and drug sensitivity was predicted with the oncoPredict package. Patient response to immunotherapy was evaluated with the TIDE algorithm in conjunction with the phs000452 immunotherapy cohort. Differential expression of six NRGs was verified using qRT-PCR and immunohistochemistry.
Results: Among all cell types, neutrophils had the highest AUCell score. By intersecting the DEGs between high and low NETosis classes, the DEGs between normal and LUAD tissues, and prognosis-related genes, 61 prognosis-related NRGs were identified. Based on these 61 NRGs, all LUAD patients could be divided into two clusters with different prognostic and tumor microenvironment (TME) characteristics. Enet regression identified the NETRS, composed of 18 NRGs. NETRS was significantly associated with LUAD patients’ clinical characteristics, and patients in different NETRS groups showed significant differences in prognosis, TME characteristics, expression levels of immune-related molecules, gene mutation frequencies, response to immunotherapy, and drug sensitivity. In addition, NETRS outperformed 20 published gene signatures in predicting LUAD patients’ survival. Nine independent cohorts confirmed that NETRS is also valuable for predicting the prognosis of NSCLC patients overall. Finally, the expression of six NRGs was confirmed using three independent datasets, qRT-PCR, and immunohistochemistry.
Conclusion: NETRS can serve as a valuable prognostic indicator for patients with NSCLC, providing insights into the tumor microenvironment and predicting the response to cancer therapy.
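The abstract does not give the NETRS formula. Signatures built with elastic-net regression are commonly scored as a weighted sum of gene expression using the fitted coefficients, so the sketch below assumes that form and, as a further assumption, splits patients into high and low groups at the median score. The gene names, coefficients, and expression values are hypothetical and are not the 18-gene signature reported above.

```python
from statistics import median
from typing import Dict

# Hypothetical signature: gene names and coefficients are illustrative only,
# not the 18-gene NETRS reported in the abstract.
COEFFICIENTS: Dict[str, float] = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Weighted sum of expression values over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label patients 'high' if their score exceeds the median, else 'low' (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    patients = {
        "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
        "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
        "P3": {"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 2.0},
    }
    scores = {pid: risk_score(expr, COEFFICIENTS) for pid, expr in patients.items()}
    print(scores)
    print(split_by_median(scores))
```

Positive coefficients raise the score (higher assumed risk) and negative ones lower it; whether a median cutoff or an optimized cutoff was used for the published groups is not stated in the abstract.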
Information about the dataset files:
1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-Cancer dataset. The file PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using the script process_sample_freeze.py by Gregory Way et al., as described in the data processing and initialization steps at https://github.com/greenelab/pancancer/. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]
2) pancan_mutation_freeze.tsv.gz: Publicly available mutation data for the TCGA Pan-Cancer dataset. The file mc3.v0.2.8.PUBLIC.maf.gz was processed using the script process_sample_freeze.py by Gregory Way et al., as described in the data processing and initialization steps at https://github.com/greenelab/pancancer/. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]
3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number data for the TCGA Pan-Cancer dataset. This file was processed using the script process_copynumber.py by Gregory Way et al., as described in the data processing and initialization steps at https://github.com/greenelab/pancancer/. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from these data are used as inputs to our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]
4) mutation_burden_freeze.tsv.gz: Publicly available mutation information for the TCGA Pan-Cancer dataset. The file mc3.v0.2.8.PUBLIC.maf.gz was processed using the script process_sample_freeze.py by Gregory Way et al., as described in the data processing and initialization steps at https://github.com/greenelab/pancancer/. [https://github.com/greenelab/pancancer/] [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]
5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: Lists the frozen samples, as determined by the TCGA PanCancer Atlas consortium, along with raw RNA-seq and mutation data. These samples were determined previously and included in all downstream analyses; all other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]
6) cosmic_cancer_classification.tsv: Compendium of oncogenes (OG) and tumor suppressor genes (TSG) used for the analysis, created by adding genes from the COSMIC database to volgelstein_cancer_classification.tsv. [https://github.com/greenelab/pancancer/]
7) CCLE_DepMap_18Q1_maf_20180207.txt.gz: Publicly available mutation data for CCLE cell lines from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]
8) ccle_rnaseq_genes_rpkm_20180929_mod.tsv.gz: Publicly available expression data (RPKM) for 1019 cell lines from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]
9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv: Publicly available merged mutation and copy number alteration data, including gene amplifications and deletions, for the CCLE cell lines. These data are represented in binary format and are provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]
10) GDSC_cell_lines_EXP_CCLE_names.tsv.gz: Publicly available RMA-normalized expression data for Genomics of Drug Sensitivity in Cancer (GDSC) cell lines. The file gdsc_cell_line_RMA_proc_basalExp.csv was downloaded and subset to the 389 cell lines common to CCLE and GDSC. All GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]
11) GDSC_CCLE_common_mut_cnv_binary.tsv.gz: A subset of the merged mutation and copy number alteration data, including gene amplifications and deletions, for cell lines common to GDSC and CCLE. This file was generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv and a list of common cell lines.
12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]
13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]
14) compounds_of_interest.txt: List of pharmacological compounds tested for our analysis, taken from ftp://ftp.sanger.ac.uk/pub4/cancerrxgen...
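The subsetting and renaming described for GDSC_cell_lines_EXP_CCLE_names.tsv.gz amounts to an intersection followed by a relabel. A minimal sketch, assuming the expression matrix is held in memory as nested dicts and that a GDSC-to-CCLE name map is available; the map `gdsc_to_ccle`, the helper name, and all values are hypothetical, with CCLE-style names of the CELLLINE_TISSUE form used for illustration.

```python
from typing import Dict

# cell line -> gene -> expression value (hypothetical in-memory layout)
Expression = Dict[str, Dict[str, float]]


def subset_and_rename(gdsc_expr: Expression, gdsc_to_ccle: Dict[str, str]) -> Expression:
    """Keep only cell lines that have a CCLE counterpart and relabel them with the CCLE name."""
    return {
        gdsc_to_ccle[name]: profile
        for name, profile in gdsc_expr.items()
        if name in gdsc_to_ccle  # cell lines without a CCLE match are dropped
    }


if __name__ == "__main__":
    gdsc_expr = {
        "A549": {"TP53": 7.1, "EGFR": 8.4},
        "MCF7": {"TP53": 6.5, "EGFR": 5.2},
        "GDSC_ONLY_LINE": {"TP53": 5.9, "EGFR": 6.0},
    }
    name_map = {"A549": "A549_LUNG", "MCF7": "MCF7_BREAST"}
    common = subset_and_rename(gdsc_expr, name_map)
    print(sorted(common))
```

The name map should be one-to-one; if two GDSC names mapped to the same CCLE name, the later entry would silently overwrite the earlier one.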
This benchmark dataset was created and used to train and evaluate models presented in the paper: A. Partin, P. Vasanthakumari et al., "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis."
This dataset includes four main components: cell line drug response data, cell line multi-omics data, drug feature data, and predefined data partitions for modeling. Drug response data were curated from five pharmacogenomic studies (CCLE, CTRPv2, GDSC1, GDSC2, GCSI) and processed using a unified pipeline for response fitting, omics harmonization, and drug representation.
Multi-dose viability data were extracted, and a unified dose response fitting pipeline was used to calculate multiple dose-independent response metrics, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50).
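The curve model used by the fitting pipeline is not specified here. A minimal sketch, assuming a three-parameter Hill equation for fractional viability (parameter names `einf`, `ec50`, and `hs` are hypothetical) and an AUC computed by the trapezoidal rule over log10 dose, normalized by the width of the dose range; the actual pipeline's model and normalization may differ.

```python
import math
from typing import Optional


def hill_viability(dose: float, einf: float, ec50: float, hs: float) -> float:
    """Fraction of viable cells at `dose` under an assumed 3-parameter Hill model."""
    return einf + (1.0 - einf) / (1.0 + (dose / ec50) ** hs)


def ic50(einf: float, ec50: float, hs: float) -> Optional[float]:
    """Dose at which viability equals 0.5, or None if the curve never reaches 50%."""
    if einf >= 0.5:
        return None
    # Solve einf + (1 - einf) / (1 + (x / ec50) ** hs) = 0.5 for x.
    return ec50 * (0.5 / (0.5 - einf)) ** (1.0 / hs)


def normalized_auc(einf: float, ec50: float, hs: float,
                   dmin: float, dmax: float, n: int = 1000) -> float:
    """Trapezoidal area under viability vs. log10(dose), divided by the log-dose range."""
    lo, hi = math.log10(dmin), math.log10(dmax)
    step = (hi - lo) / n
    ys = [hill_viability(10 ** (lo + i * step), einf, ec50, hs) for i in range(n + 1)]
    area = step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return area / (hi - lo)


if __name__ == "__main__":
    print(ic50(einf=0.2, ec50=1.0, hs=1.0))
    print(normalized_auc(einf=0.0, ec50=1.0, hs=1.0, dmin=0.01, dmax=100.0))
```

Under these assumptions, a lower AUC means stronger growth inhibition across the tested range, and IC50 is undefined when the curve plateaus at or above 50% viability, which is one reason dose-independent metrics such as AUC are convenient for cross-study comparison.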
The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expression, DNA mutation, DNA methylation, gene copy number, protein expression measured by reverse phase protein array (RPPA), and miRNA expression data. Data preprocessing included discretizing gene copy numbers and mapping between different gene identifier systems.
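The exact discretization rule for copy numbers is not stated. As an illustration only, the sketch below maps a log2 copy-number ratio to five levels (-2 deep deletion, -1 loss, 0 neutral, 1 gain, 2 amplification) using hypothetical cutoffs; the values in `CUTOFFS` are assumptions, not the benchmark's thresholds.

```python
# Hypothetical cutoffs on log2(copy ratio): each entry is (strict upper bound, level).
# These are illustrative values, not the thresholds used by the benchmark pipeline.
CUTOFFS = ((-1.0, -2), (-0.3, -1), (0.3, 0), (1.0, 1))
AMPLIFICATION = 2


def discretize_copy_number(log2_ratio: float, cutoffs=CUTOFFS) -> int:
    """Map a continuous log2 copy ratio to one of five discrete levels."""
    for upper, level in cutoffs:
        if log2_ratio < upper:
            return level
    return AMPLIFICATION  # at or above the last cutoff


if __name__ == "__main__":
    values = [-1.5, -0.5, 0.0, 0.5, 1.2]
    print([discretize_copy_number(v) for v in values])
```

Keeping the cutoffs in one ordered table makes the bins easy to audit and swap for whichever thresholds a given pipeline actually uses.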
Drug information was retrieved from PubChem. Based on the drug SMILES strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages.
Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets, for building and evaluating the drug response prediction (DRP) models.
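A partition file essentially fixes which records go to each set for a given run. A minimal sketch of generating one reproducible, disjoint split, assuming a random 80/10/10 train/validation/test split with one seed per run; the fractions, seeding scheme, and function name `make_split` are assumptions, not the IMPROVE pipeline's interface. In cross-study analysis the test set would instead come from a different study, which this sketch does not show.

```python
import random
from typing import Dict, Iterable, List


def make_split(sample_ids: Iterable[str], seed: int,
               frac_train: float = 0.8, frac_val: float = 0.1) -> Dict[str, List[str]]:
    """Shuffle IDs reproducibly and cut them into disjoint train/val/test lists."""
    ids = sorted(sample_ids)          # sort first so the split ignores input order
    random.Random(seed).shuffle(ids)  # local RNG: global random state is untouched
    n_train = int(len(ids) * frac_train)
    n_val = int(len(ids) * frac_val)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }


if __name__ == "__main__":
    ids = [f"sample_{i:03d}" for i in range(100)]
    split = make_split(ids, seed=0)
    print({name: len(members) for name, members in split.items()})
```

Writing each split's ID lists to files, rather than regenerating them at training time, is what lets different models be evaluated on exactly the same samples.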
More detailed information about the dataset and its construction can be found at https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html