Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout (prefixed with Achilles) as well as mutation, copy number and gene expression data (prefixed with CCLE) for cancer cell lines as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap.

The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 689 cell lines, the RNAseq data includes 1249 cell lines, and the copy number data includes 1682 cell lines. Please see the README files for details regarding updates to the data processing pipeline procedures.

As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org.

Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk.cell_lines, EGA accession number: EGAD00001001039) reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.

## V2 Changes

CCLE_fusions.csv and CCLE_fusions_unfiltered.csv were swapped in v1; they are correct now.

## V3 Changes

UACC62_SKIN_CJ1_RESISTANT has been removed from Public 19Q4 Achilles files due to an issue with fingerprinting. Values for this cell line have been NAed in the following files: Achilles_gene_effect.csv, Achilles_gene_effect_unscaled.csv, Achilles_gene_dependency.csv, Achilles_logfold_change.csv, Achilles_raw_readcounts.csv.
https://spdx.org/licenses/CC0-1.0.html
The transcription factor TEAD, together with its coactivator YAP/TAZ, is a key transcriptional modulator of the Hippo pathway. Activation of TEAD transcription by YAP has been implicated in a number of malignancies, and this complex represents a promising target for drug discovery. Here, we employed a covalent fragment screening approach followed by structure-based design to develop an irreversible TEAD inhibitor, MYF-03-69. Using a range of in vitro and cell-based assays, we demonstrated that, through covalent binding to the TEAD palmitate pocket, MYF-03-69 disrupts YAP-TEAD association, suppresses TEAD transcriptional activity and inhibits cell growth of Hippo signaling-defective malignant pleural mesothelioma (MPM). Further, a cell viability screen with a panel of 903 cancer cell lines indicated a high correlation between TEAD-YAP dependency and sensitivity to MYF-03-69.

To validate MYF-03-69 as a potent and selective pan-TEAD inhibitor, we interrogated the proteome-wide selectivity profile of MYF-03-69 on cysteine labeling using a streamlined cysteine activity-based protein profiling (SLC-ABPP) approach and generated the spreadsheet "Supplementary_Dataset_1._Proteome-wide_selectivity_profile_of_MYF-03-69_on_cysteines_labeling_using_SLC-ABPP_approach". We employed the cysteine-reactive desthiobiotin iodoacetamide (DBIA) probe, which was reported to map more than 8,000 cysteines, and performed a competition study on NCI-H226 cells pretreated with 0.5, 2, 10 or 25 µM of MYF-03-69 for 3 hours in triplicate. The cysteines that were conjugated >50% (competition ratio CR>2) compared to the DMSO control were analyzed and assigned to their protein targets. In the DMSO control group, although DBIA mapped 12,498 cysteines in total, the TEAD PBP cysteines were not detected. This might be due to low TEAD1-4 protein abundance and/or the inability of the PBP cysteines to be labeled, given that they are mostly modified by palmitate under physiological conditions.
Among the 12,498 mapped cysteines, only 7 were significantly labeled (i.e. exhibited >50% conjugation or CR>2) by 25 µM of MYF-03-69, and all of these sites exhibited dose-dependent engagement.

To study the whole-transcriptome perturbation by the TEAD inhibitor MYF-03-69, mRNA sequencing was performed in NCI-H226 cells treated with 0.1 μM, 0.5 μM, and 2 μM of MYF-03-69, generating the spreadsheet "Supplementary_Dataset_2._List_of_differentially_expressed_genes_under_MYF-03-69_treatments". The genes that were differentially expressed with statistical significance (fold change > 1.5 and adjusted p value < 0.05) are listed in this dataset.

To investigate whether TEAD inhibition by MYF-03-69 was selectively lethal to YAP/TEAD-dependent cancers, PRISM screening across a broad panel of cell lineages was performed, generating the spreadsheet "Supplementary_Dataset_3". 903 cancer cell lines were treated with the TEAD inhibitor MYF-03-69 for 5 days. Viability was measured at 8 doses (3-fold dilution from 10 μM), and a dose-response curve was fitted for each cell line. The area under the curve (AUC) was calculated as a measure of the compound's effect on cell viability. CERES scores of YAP1 or the TEADs from the CRISPR (Avana) Public 21Q1 dataset (DepMap) are listed in the spreadsheet and were used to estimate gene dependency. The CERES score of the most dependent TEAD isoform was used to represent TEAD dependency.

With the PRISM screen dataset of the TEAD inhibitor MYF-03-69, we investigated whether TEAD inhibition recapitulates the outcome of genetic knockout of YAP or the TEADs, generating the spreadsheet "Supplementary_Dataset_4". Correlation analysis was performed between compound PRISM sensitivity (log2.AUC of each cell line) and dependency on a given gene (CRISPR knockout score for each cell line, from the DepMap Public 20Q4 Achilles_gene_effect.csv dataset) across the PRISM cell line panel. The Pearson correlation coefficients and associated p-values were computed.
Positive correlations correspond to dependency correlating with increased sensitivity. The q-values (a corrected significance value accounting for false discovery rate) are computed from the p-values using the Benjamini-Hochberg algorithm. Associations with q-values above 0.1 are filtered out. This correlation analysis reveals that the dependency scores of TEAD1 and YAP1 from the genomic knockout dataset (DepMap portal) provide the highest correlation with the compound PRISM sensitivity profile. These are followed by TP53BP2, a gene that is also involved in the Hippo pathway as an activator of TAZ.

Methods

For "Supplementary_Dataset_1._Proteome-wide_selectivity_profile_of_MYF-03-69_on_cysteines_labeling_using_SLC-ABPP_approach", the data were collected on NCI-H226 cells using the same methods reported in the reference paper (Kuljanin, M.; Mitchell, D. C.; Schweppe, D. K.; Gikandi, A. S.; Nusinow, D. P.; Bulloch, N. J.; Vinogradova, E. V.; Wilson, D. L.; Kool, E. T.; Mancias, J. D.; Cravatt, B. F.; Gygi, S. P., Reimagining high-throughput profiling of reactive cysteines for cell-based screening of large electrophile libraries. Nature Biotechnology 2021, 39, 630-641). The competition ratio CR was calculated as described in that paper.

For "Supplementary_Dataset_2._List_of_differentially_expressed_genes_under_MYF-03-69_treatments", the data were collected on NCI-H226 cells treated with MYF-03-69 at the indicated concentrations for 6 hours (n=3). RNA was extracted using the RNeasy Plus Mini Kit (Qiagen, cat no. 74134) according to the manufacturer's instructions. Libraries were then prepared using Roche Kapa mRNA HyperPrep strand-specific sample preparation kits from 200 ng of purified total RNA according to the manufacturer’s protocol on a Beckman Coulter Biomek i7.
The finished dsDNA libraries were quantified by Qubit fluorometer and Agilent TapeStation 4200. Uniquely dual-indexed libraries were pooled in an equimolar ratio and shallowly sequenced on an Illumina MiSeq to further evaluate library quality and pool balance. The final pool was sequenced on an Illumina NovaSeq 6000 targeting 40 million 100bp read pairs per library at the Dana-Farber Cancer Institute Molecular Biology Core Facilities. Sequenced reads were aligned to the UCSC hg19 reference genome assembly and gene counts were quantified using STAR (v2.7.3a). Differential gene expression testing was performed with DESeq2 (v1.22.1). RNAseq analysis was performed using the VIPER snakemake pipeline. KEGG pathway enrichment analysis was performed through the Metascape web portal.

For "Supplementary_Dataset_3", the data were collected using the methods reported in the reference paper "Discovering the anticancer potential of non-oncology drugs by systematic viability profiling" (Nature Cancer). Briefly, up to 931 barcoded cell lines in pools of 20-25 were thawed and plated into 384-well plates (1250 cells/well for adherent cell pools, 2000 cells/well for suspension or mixed suspension/adherent cell pools) containing compound (top concentration: 10 µM, 8-point, threefold dilution). All conditions were tested in triplicate. Cells were lysed after 5 days of treatment, and mRNA-based Luminex detection of barcode abundance from lysates was carried out as in the reference paper above. Luminex median fluorescence intensity (MFI) data were input to a standardized R pipeline (https://github.com/broadinstitute/prism_data_processing) to generate viability estimates relative to vehicle treatment for each cell line and treatment condition, and to fit dose-response curves from the viability data. CERES scores of YAP1 or the TEADs from the CRISPR (Avana) Public 21Q1 dataset (DepMap) were downloaded from the DepMap portal (DepMap Data Downloads) and listed with the viability data.
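The dosing scheme described for "Supplementary_Dataset_3" (8 points, threefold dilution from 10 µM) can be sketched as below. This is only an illustration: the function names are hypothetical, and the trapezoidal AUC on measured points is a simple stand-in, assumed here for clarity, for the curve-based AUC produced by the PRISM R pipeline.

```python
import math

def dose_series(top_um=10.0, n_points=8, fold=3.0):
    """Return the doses in µM, from the top concentration downwards."""
    return [top_um / fold ** i for i in range(n_points)]

def normalized_auc(doses, viabilities):
    """Trapezoidal area under viability vs. log10(dose), divided by the
    log-dose range. With viability expressed relative to vehicle, a value
    of 1.0 means no effect and lower values mean stronger killing.
    (Assumption: the real pipeline integrates a fitted curve instead.)"""
    pairs = sorted(zip(doses, viabilities))  # order by increasing dose
    xs = [math.log10(d) for d, _ in pairs]
    ys = [v for _, v in pairs]
    area = sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
        for i in range(len(xs) - 1)
    )
    return area / (xs[-1] - xs[0])

doses = dose_series()
```

With these defaults the lowest dose is 10/3^7 µM, roughly 0.0046 µM, so the series spans about 3.3 orders of magnitude.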
For "Supplementary_Dataset_4", the data are the results of the correlation analysis of "Supplementary_Dataset_3", which was performed in the R pipeline mentioned above (https://github.com/broadinstitute/prism_data_processing).
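The correlation and filtering steps described for "Supplementary_Dataset_4" can be illustrated with a minimal sketch. It is not the R pipeline itself: the function names are hypothetical, the p-values are taken as given inputs, and only the Pearson coefficient, the Benjamini-Hochberg adjustment, and the q-value cutoff of 0.1 come from the description above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def filter_associations(genes, p_values, q_cutoff=0.1):
    """Keep (gene, q) pairs whose q-value does not exceed the cutoff."""
    q = benjamini_hochberg(p_values)
    return [(g, qv) for g, qv in zip(genes, q) if qv <= q_cutoff]
```

Under the sign convention stated above, a positive coefficient between log2 AUC and gene effect means that cell lines more dependent on the gene (more negative knockout score) also tend to be more sensitive to the compound (lower AUC).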
These benchmark data were used to train and evaluate the models presented in the paper: A. Partin and P. Vasanthakumari et al. "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis".
The benchmark data for Cross-Study Analysis (CSA) include four kinds of data: cell line response data, cell line multi-omics data, drug feature data, and data partitions. The figure below illustrates the curation, processing, and assembly of benchmark data, and a unified schema for data curation.

Cell line response data were extracted from five large-scale cell line drug screening studies: the Cancer Cell Line Encyclopedia (CCLE), the Cancer Therapeutics Response Portal version 2 (CTRPv2), the Genomics of Drug Sensitivity in Cancer version 1 (GDSC1), the Genomics of Drug Sensitivity in Cancer version 2 (GDSC2), and the Genentech Cell Line Screening Initiative (GCSI). We extracted their multi-dose viability data and used a unified dose response fitting pipeline to calculate multiple dose-independent response metrics as shown in the figure below, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50).

The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expressions, DNA mutations, DNA methylation, gene copy numbers, protein expressions measured by reverse phase protein array (RPPA), and miRNA expressions. Data preprocessing was performed, such as discretizing gene copy numbers and mapping between different gene identifier systems.

Drug information was retrieved from PubChem. Based on the drug SMILES (Simplified Molecular Input Line Entry System) strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages.

Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets, for building and evaluating the drug response prediction (DRP) models.
The table below shows the numbers of cell lines, drugs, and experiments in each dataset. Across the five datasets, there are 785 unique cell lines and 749 unique drugs. All cell lines have gene expression, mutation, DNA methylation, and copy number data available. Of these, 760 cell lines have RPPA protein expression data, and 781 have miRNA expression data.
Further description is provided here: https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used in analyses from Krill-Burger et al., “Partial inhibition improves identification of cancer vulnerabilities when CRISPR-Cas9 knockout is pan-lethal”. These raw data files include annotations from the Cancer Dependency Map (DepMap) and publicly available datasets that have been pre-processed to have consistent cell line and gene/compound identifiers. See ReadMe.pdf for a detailed description of each file.
This benchmark dataset was created and used to train and evaluate models presented in the paper: A. Partin, P. Vasanthakumari et al., "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis."
This dataset includes four main components: cell line drug response data, cell line multi-omics data, drug feature data, and predefined data partitions for modeling. Drug response data were curated from five pharmacogenomic studies (CCLE, CTRPv2, GDSC1, GDSC2, GCSI), and processed using a unified pipeline for response fitting, omics harmonization, and drug representation.
Multi-dose viability data were extracted, and a unified dose response fitting pipeline was used to calculate multiple dose-independent response metrics, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50).
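To make the two named response metrics concrete, the sketch below evaluates them on a Hill-type dose-response curve. The model form, parameter names, and integration range are assumptions chosen for illustration; the benchmark's own fitting pipeline may define the curve, the IC50, and the AUC differently.

```python
import math

def hill_viability(dose, ec50, hill, e0=1.0, einf=0.0):
    """Viability at a dose under a Hill curve that falls from e0 to einf."""
    return einf + (e0 - einf) / (1.0 + (dose / ec50) ** hill)

def ic50(ec50, hill, e0=1.0, einf=0.0):
    """Dose at which viability equals 0.5, or None if the curve never
    crosses 0.5 (for example when the lower plateau einf is >= 0.5)."""
    if not (einf < 0.5 < e0):
        return None
    return ec50 * ((e0 - 0.5) / (0.5 - einf)) ** (1.0 / hill)

def curve_auc(ec50, hill, dose_min, dose_max, e0=1.0, einf=0.0, steps=1000):
    """Midpoint-rule area under the curve over log10(dose), divided by the
    log-dose range so the result lies between einf and e0."""
    lo, hi = math.log10(dose_min), math.log10(dose_max)
    width = (hi - lo) / steps
    total = sum(
        hill_viability(10 ** (lo + (k + 0.5) * width), ec50, hill, e0, einf)
        for k in range(steps)
    )
    return total * width / (hi - lo)
```

Normalizing the AUC by the dose range is one common way to keep values comparable across studies that tested different concentration ranges; whether the benchmark does exactly this is not stated above.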
The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expressions, DNA mutations, DNA methylation, gene copy numbers, protein expressions measured by reverse phase protein array (RPPA), and miRNA expressions. Data preprocessing was performed, such as discretizing gene copy numbers and mapping between different gene identifier systems.
Drug information was retrieved from PubChem. Based on the drug SMILES strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages.
Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets, for building and evaluating the drug response prediction (DRP) models.
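As a rough illustration of what such a partition encodes, the sketch below assigns sample row indices to training, validation, and testing sets for one run. The function name, the 80/10/10 proportions, and the use of a seeded random shuffle are assumptions for this example only; they are not taken from the IMPROVE pipeline or its file format.

```python
import random

def make_partition(n_samples, seed, val_frac=0.1, test_frac=0.1):
    """Split row indices 0..n_samples-1 into disjoint train/val/test lists.
    A fixed seed makes the split reproducible for a given run."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    return {
        "test": sorted(idx[:n_test]),
        "val": sorted(idx[n_test:n_test + n_val]),
        "train": sorted(idx[n_test + n_val:]),
    }

# One partition per modeling run, e.g. ten runs with different seeds.
partitions = [make_partition(100, seed) for seed in range(10)]
```

Storing the index lists rather than copies of the data lets every model be trained and evaluated on exactly the same samples, which is what makes cross-model comparisons fair.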
More detailed information about the dataset and its construction can be found at https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html