Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout screens (prefixed with Achilles) as well as mutation, copy number, and gene expression data (prefixed with CCLE) for cancer cell lines as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap.

The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 808 cell lines, the RNAseq data includes 1376 cell lines, and the copy number data includes 1740 cell lines. Please see the README files for details regarding updates to the data processing pipeline procedures.

As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org.

Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk.cell_lines, EGA accession number: EGAD00001001039) reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.

v2: changed dataset name
This benchmark data was used to train and evaluate the models presented in the paper: A. Partin and P. Vasanthakumari et al. "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis"
The benchmark data for Cross-Study Analysis (CSA) include four kinds of data: cell line response data, cell line multi-omics data, drug feature data, and data partitions. The figure below illustrates the curation, processing, and assembly of benchmark data, and a unified schema for data curation. Cell line response data were extracted from five large-scale cell line drug screening studies: the Cancer Cell Line Encyclopedia (CCLE), the Cancer Therapeutics Response Portal version 2 (CTRPv2), the Genomics of Drug Sensitivity in Cancer version 1 (GDSC1), the Genomics of Drug Sensitivity in Cancer version 2 (GDSC2), and the Genentech Cell Line Screening Initiative (GCSI). We extracted their multi-dose viability data and used a unified dose response fitting pipeline to calculate multiple dose-independent response metrics as shown in the figure below, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50). The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expressions, DNA mutations, DNA methylation, gene copy numbers, protein expressions measured by reverse phase protein array (RPPA), and miRNA expressions. Data preprocessing was performed, such as discretizing gene copy numbers and mapping between different gene identifier systems. Drug information was retrieved from PubChem. Based on the drug SMILES (Simplified Molecular Input Line Entry Specification) strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages. Data partition files were generated using the IMPROVE benchmark data preparation pipeline. They indicate, for each modeling analysis run, which samples should be included in the training, validation, and testing sets, for building and evaluating the drug response prediction (DRP) models.
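As a rough illustration of how dose-independent metrics such as AUC and IC50 can be read off a fitted curve, the sketch below assumes a three-parameter Hill model with viability falling from 1.0 to a floor `einf`. The model form, the function names, and the dose range are assumptions made for illustration only; the benchmark's actual fitting pipeline is not described here and may differ.

```python
import math

def hill_viability(log10_dose, einf, ec50_log10, slope):
    """Assumed 3-parameter Hill curve: viability falls from 1.0 toward einf."""
    return einf + (1.0 - einf) / (1.0 + 10 ** (slope * (log10_dose - ec50_log10)))

def auc_trapezoid(params, lo=-3.0, hi=1.0, steps=400):
    """Trapezoid-rule area under the curve on a log10 dose axis.

    The result is divided by the width of the (hypothetical) dose range,
    so it lies between 0 and 1.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = hill_viability(lo + i * width, *params)
        right = hill_viability(lo + (i + 1) * width, *params)
        total += 0.5 * (left + right) * width
    return total / (hi - lo)

def ic50_log10(params):
    """log10 dose where viability equals 0.5, or None if it is never reached."""
    einf, ec50, slope = params
    if einf >= 0.5:
        return None  # the curve plateaus above 50% viability
    # Solve 0.5 = einf + (1 - einf) / (1 + 10 ** (slope * (x - ec50))) for x.
    ratio = (1.0 - einf) / (0.5 - einf) - 1.0
    return ec50 + math.log10(ratio) / slope
```

Note that under this assumed model IC50 is undefined whenever the curve never drops to 50% viability, whereas AUC is always defined over the chosen dose range.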
The Table below shows the numbers of cell lines, drugs, and experiments in each dataset. Across the five datasets, there are 785 unique cell lines and 749 unique drugs. All cell lines have gene expression, mutation, DNA methylation, and copy number data available. 760 of the cell lines have RPPA protein expressions, and 781 of them have miRNA expressions.
Further description is provided here: https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We developed a computational method (Celligner) that identifies and removes systematic differences between cell lines and tumor gene expression profiles, allowing for direct integration of existing large-scale cancer cell line and tumor datasets. Celligner performs this computational alignment across cancer types in a completely unsupervised fashion, without relying on prior annotations of cancer types, tumor sample purity, or contaminating cell expression profiles. We applied Celligner to produce a global alignment of 12,236 tumor samples from TCGA, TARGET, and Treehouse datasets and 1,249 cell lines from DepMap. This dataset includes Celligner-aligned data, a matrix of correlations between cell lines and tumors, associated cell line and tumor metadata, and other outputs from the Celligner method. See Readme file for more details about the dataset contents and version history.
Background: Epigenetic mechanisms regulate gene expression without altering the DNA sequence. Epigenetics-targeted chemotherapeutic approaches can be used to overcome treatment resistance and low response rates in HCC. However, comprehensive analysis of genomic data to determine the role of epigenetic regulation in the tumor microenvironment (TME) and in immune cell-infiltration characteristics in HCC is still insufficient.

Methods: The association between epigenetic-related genes (ERGs), inflammatory response-related genes (IRRGs), and CRISPR genes was determined by merging genomic and CRISPR data. Further, the characteristics of immune-cell infiltration in the tumor microenvironment were evaluated.

Results: Nine differentially expressed genes (ANP32B, ASF1A, BCORL1, BMI1, BUB1, CBX2, CBX3, CDK1, and CDK5) were shown to be independent prognostic factors based on lasso regression in the TCGA-LIHC and ICGC databases. In addition, the results showed significant differences in expression of PDCD-1 (PD-1) and CTLA4 between the high- and low-epigenetic score groups. The CTRP and PRISM-derived drug response data yielded four CTRP-derived compounds (SB-743921, GSK461364, gemcitabine, and paclitaxel) and two PRISM-derived compounds (dolastatin-10 and LY2606368). Patients with high ERGs benefited more from immune checkpoint inhibitor (ICI) therapy than patients with low ERGs. In addition, the high ERGs subgroup had a higher T cell exclusion score, while the low ERGs subgroup had a higher T cell dysfunction score. However, there was no difference in microsatellite instability (MSI) score between the two subgroups. Further, genome-wide CRISPR-based loss-of-function screening derived from DepMap was conducted to determine key genes leading to HCC development and progression. In total, 640 genes were identified to be essential for survival in HCC cell lines. The protein-protein interaction (PPI) network demonstrated that the IRRG PSEN1 was linked to most ERGs and CRISPR genes such as CDK1, TOP2A, CBX2, and CBX3.

Conclusion: Epigenetic alterations of cancer-related genes in the tumor microenvironment play a major role in carcinogenesis. This study showed that epigenetic-related novel biomarkers could be useful in predicting prognosis, clinical diagnosis, and management in HCC.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Ferroptosis is a newly identified form of regulated cell death characterized by iron-dependent lipid peroxidation and subsequent membrane oxidative damage, which has been implicated in multiple types of cancers. The multi-omics differences between cancer cell lines with high/low ferroptosis scores remain to be elucidated.

Methods and Materials: We used RNA-seq gene expression, gene mutation, miRNA expression, metabolite, copy number variation, and drug sensitivity data of cancer cell lines from DEPMAP to detect multi-omics differences associated with ferroptosis. Based on the gene expression data of cancer cell lines, we performed LASSO-Logistic regression analysis to build a ferroptosis-related model. Lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), esophageal cancer (ESCA), bladder cancer (BLCA), cervical cancer (CESC), and head and neck cancer (HNSC) patients from the TCGA database were used as validation cohorts to test the efficacy of this model.

Results: After stratifying the cancer cell lines into high score (HS) and low score (LS) groups according to the median of ferroptosis scores generated by gene set variation analysis, we found that the IC50 values of 66 agents such as oxaliplatin (p < 0.001) were significantly different, among which 65 were higher in the HS group. 851 genes such as KEAP1 and NRAS were differentially mutated between the two groups. Differentially expressed genes, miRNAs, and metabolites were also detected; multiple items such as IL17F (logFC = 6.58, p < 0.001) differed between the two groups. Unlike the TCGA data generated by bulk RNA-seq, the gene expression data in DEPMAP are from pure cancer cells, so they could better reflect the traits of tumors in cancer patients. Thus, we built a 15-signature model (AUC = 0.878) based on the gene expression data of cancer cell lines. The validation cohorts demonstrated a higher mutational rate of NFE2L2 and higher expression levels of 12 ferroptosis-related genes in HS groups.

Conclusion: This article systematically analyzed multi-omics differences between cancer cell lines with high/low ferroptosis scores, and a ferroptosis-related model was developed for multiple cancer types. Our findings could improve our understanding of the role of ferroptosis in cancer and provide new insight into treatment for malignant tumors.
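The median stratification into HS and LS groups described in the Results can be sketched as follows. This is a minimal illustration only: the cell line names and scores are made up, and the handling of scores exactly equal to the median (assigned to LS here) is an assumption, since the abstract does not specify it.

```python
from statistics import median

def split_by_median(scores):
    """Label each cell line 'HS' if its score is above the median, else 'LS'."""
    cutoff = median(scores.values())
    return {name: ("HS" if score > cutoff else "LS") for name, score in scores.items()}

# Hypothetical ferroptosis scores for four placeholder cell lines.
example_scores = {"LINE_A": 0.9, "LINE_B": -0.2, "LINE_C": 0.4, "LINE_D": -0.7}
groups = split_by_median(example_scores)
```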
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer cell line genetic dependencies estimated using the DEMETER2 model. DEMETER2 is applied to three large-scale RNAi screening datasets: the Broad Institute Project Achilles, Novartis Project DRIVE, and the Marcotte et al. breast cell line dataset. The model is also applied to generate a combined dataset of gene dependencies covering a total of 712 unique cancer cell lines. For more information visit https://depmap.org/R2-D2/. Visit the Cancer Dependency Map portal at https://depmap.org to explore related datasets. Email questions to depmap@broadinstitute.org.

This dataset includes gene dependencies estimated using the DEMETER2 model, the raw input datasets used to fit the models, as well as associated metadata. See Readme file for more details about the dataset contents and version history.

-------------------------------------------------------------------
Version history: (see README for more details)
-------------------------------------------------------------------
v1: Initial data release
v2:
 - Removed small number of non-human genes (e.g. GFP, RFP) from shRNA-to-gene mapping
 - Updated cell line names to be consistent with DepMap names, according to the following map (old -> new):
v3: Added estimated seed effect matrices
v4: Added RNAseq and mutation data files used in analysis for manuscript
v5: Fixed minor bug with Marcotte LFC data that caused hairpins targeting multiple genes to appear multiple times in the LFC matrix. This created bias in the seed effect estimates for those hairpins, causing very minor differences to the resulting model parameters.
v6: Added tables with shRNA quality metrics for Achilles and DRIVE data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 17: Table S12. Characterising [AT]n motif expansion in cancer cell lines.

A. All MSI DepMap cancer cell lines with WGS data and three MSS DepMap cell lines with WGS data for comparison. Cell_line_stripped_name: cell line names provided by DepMap without special characters or spaces. disease: cancer type assigned by DepMap. MSI_status: inferred microsatellite status by Ghandi et al. (2019). KMT2D_group: KMT2D group assigned to a cell line. SRA_project_ID: NCBI SRA project ID for raw WGS data. SRA_Run_ID: NCBI SRA run ID for raw WGS data. Instrument: sequencer used to perform WGS.

B. Profiles of [AT]n motifs generated using ExpansionHunter Denovo. Cell_line_name: cell line names provided by DepMap without special characters or spaces. disease: cancer type assigned by DepMap. SRA_id: NCBI SRA run ID. group: MSI status and KMT2D group of cell line. Contig: chromosome location of motif. Start: start site (bp) of motif. End: end site (bp) of motif. Motif: motif type. Num_anc_irrs: number of anchored in-repeat reads (i.e. one read of a pair maps within and the other outside of a repeat region; see Methods). norm_num_anc_irrs: normalised number of anchored in-repeat reads. Het_str_size: estimated motif repeat size.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is the result of 318 cancer cell lines screened with the genome-wide KY1.0/1.1 CRISPR KO library by the Sanger Institute, processed with the Achilles pipeline (except QC). The publication describing the experiment is "Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens," DOI 10.1038/s41586-019-1103-9.

Read counts were downloaded from https://score.depmap.sanger.ac.uk/downloads on 8 May 2019. Only cell lines annotated by the authors as passing both QC steps in Supplementary Table 1 were retained. Additionally, only cell lines for which the Broad has copy number data as of 10 May 2019 were retained.

For more details on included files, see README.
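The two retention criteria above amount to a simple filter. The sketch below is illustrative only: the function name, the input shapes, and the cell line identifiers are hypothetical and do not reflect the actual layout of Supplementary Table 1 or the Broad copy number files.

```python
def retained_lines(qc_flags, lines_with_copy_number):
    """Keep cell lines that pass both QC steps and have copy number data.

    qc_flags: dict mapping cell line name -> (passed_qc_step_1, passed_qc_step_2)
    lines_with_copy_number: iterable of cell line names with copy number data
    """
    has_cn = set(lines_with_copy_number)
    return sorted(
        name
        for name, (qc1, qc2) in qc_flags.items()
        if qc1 and qc2 and name in has_cn
    )

# Placeholder inputs, not real annotations.
qc = {
    "LINE_A": (True, True),
    "LINE_B": (True, False),
    "LINE_C": (True, True),
}
kept = retained_lines(qc, ["LINE_A", "LINE_B"])
```

Here only LINE_A survives: LINE_B fails one QC step and LINE_C lacks copy number data.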
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Ferroptosis is a form of programmed cell death (PCD) that has been implicated in cancer progression, although the specific mechanism is not known. Here, we used the latest DepMap release CRISPR data to identify the essential ferroptosis-related genes (FRGs) in glioma and their role in patient outcomes.

Methods: RNA-seq and clinical information on glioma cases were obtained from the Chinese Glioma Genome Atlas (CGGA) and The Cancer Genome Atlas (TCGA). FRGs were obtained from the FerrDb database. CRISPR-screened essential genes (CSEGs) in glioma cell lines were downloaded from the DepMap portal. A series of bioinformatic and machine learning approaches were combined to establish FRG signatures to predict overall survival (OS) in glioma patients. In addition, pathway analysis was used to identify the functional roles of FRGs. Somatic mutation, immune cell infiltration, and immune checkpoint gene expression were analyzed within the risk subgroups. Finally, compounds for reversing high-risk gene signatures were predicted using the GDSC and L1000 datasets.

Results: Seven FRGs (ISCU, NFS1, MTOR, EIF2S1, HSPA5, AURKA, RPL8) were included in the model, and the model was found to have good prognostic value (p < 0.001) in both training and validation groups. The risk score was found to be an independent prognostic factor and the model had good efficacy. Subgroup analysis using clinical parameters demonstrated the general applicability of the model. The nomogram indicated that the model could effectively predict 12-, 36-, and 60-month OS and progression-free interval (PFI). The results showed the presence of more aggressive phenotypes (lower numbers of IDH mutations, higher numbers of EGFR and PTEN mutations, greater infiltration of immune suppressive cells, and higher expression of immune checkpoint genes) in the high-risk group. The enriched signaling pathways were closely related to the cell cycle and DNA damage repair. Drug predictions showed that patients with higher risk scores may benefit from treatment with RTK pathway inhibitors, including compounds that inhibit RTKs directly or indirectly by targeting downstream PI3K or MAPK pathways.

Conclusion: In summary, the proposed cancer essential FRG signature predicts survival and treatment response in glioma.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the "Gene expression has more power for predicting in vitro cancer cell vulnerabilities than genomics" preprint by Dempster et al. To generate the figure panels seen in the preprint using these data, use FigurePanelGeneration.ipynb.

This study includes five datasets (citations and details in manuscript).
Achilles: the Broad Institute's DepMap public 19Q4 CRISPR knockout screens processed with CERES
Score: the Wellcome Sanger Institute's Project Score CRISPR knockout screens processed with CERES
RNAi: the DEMETER2-processed combined dataset, which includes RNAi data from the Achilles, DRIVE, and Marcotte breast screens
PRISM: the PRISM pooled in vitro repurposing primary screen of compounds
GDSC17: cancer in vitro drug screens performed by Sanger

The files of most interest to a biologist are the Summary.csv files. If you are interested in trying machine learning, the files Features.hdf5 and Target.hdf5 contain the data munged into a convenient form for standard supervised machine learning algorithms.

Some large files are in the binary format hdf5 for efficiency in space and read-in. These files each contain three named hdf5 datasets. "dim_0" holds the row/index names as an array of strings, "dim_1" holds the column names as an array of strings, and "data" holds the matrix contents as a 2D array of floats.

In Python, these files can be read in with:

    import h5py
    import numpy as np
    import pandas as pd

    def read_hdf5(filename):
        # Open read-only; the context manager closes the file on exit.
        with h5py.File(filename, 'r') as src:
            # Row and column names are stored as byte strings.
            dim_0 = [x.decode('utf8') for x in src['dim_0']]
            dim_1 = [x.decode('utf8') for x in src['dim_1']]
            data = np.array(src['data'])
        return pd.DataFrame(index=dim_0, columns=dim_1, data=data)

##################################################################
Files (not every dataset will have every type of file listed below):
##################################################################

AllFeaturePredictions.hdf5: Matrix of cell lines by perturbations, with values indicating the predicted viability using a model with all feature types.

ENAdditionScore.csv: A matrix of perturbations by number of features. Values indicate an elastic net model performance (Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5) using only the top X features, where X is the column header.

FeatureDropScore.csv: Perturbations and predictive performance for a model using all single gene expression features EXCEPT those that had greater than 0.1 feature importance in a model trained with all single gene expression features.

Features.hdf5: A very large matrix of all cell lines by all used CCLE cell features. Continuous features were z-scored. Cell lines missing mutation or expression data were dropped. Remaining NA values were imputed to zero. Feature types are indicated by the column suffixes:
    _Exp: expression
    _Hot: hotspot mutation
    _Dam: damaging mutation
    _OtherMut: other mutation
    _CN: copy number
    _GSEA: ssGSEA score for an MSigDB gene set
    _MethTSS: methylation of transcription start sites
    _MethCpG: methylation of CpG islands
    _Fusion: gene fusions
    _Cell: cell tissue properties

NormLRT.csv: The normLRT score for the given perturbation.

RFAdditionScore.csv: Similar to ENAdditionScore, but using a random forest model.

Summary.csv: A dataframe containing predictive model results. Columns:
    model: Specifies the collection of features used (Expression, Mutation, Exp+CN, etc.)
    gene: The perturbation (column in Target.hdf5) examined. Actually a compound for the PRISM and GDSC17 datasets.
    overall_pearson: Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5
    feature: The Nth most important feature, found by retraining the model with all cell lines (N = 0-9)
    feature_importance: The feature importance as assessed by sklearn's RandomForestRegressor

Target.hdf5: A matrix of cell lines by perturbations, with entries indicating post-perturbation viability scores. Note that the scales of the viability effects are different for different datasets. See manuscript methods for details.

PerturbationInfo.csv: Additional drug annotations for the PRISM and GDSC17 datasets.

ApproximateCFE.hdf5: A set of Cancer Functional Event cell features based on CCLE data, adapted from Iorio et al. 2016 (10.1016/j.cell.2016.06.017).

DepMapSampleInfo.csv: Sample info from DepMap_public_19Q4 data, reproduced here as a convenience.

GeneRelationships.csv: A list of genes and their related (partner) genes, with the type of relationship (self, protein-protein interaction, CORUM complex membership, paralog).

OncoKB_oncogenes.csv: A list of genes that have non-expression-based alterations listed as likely oncogenic or oncogenic by OncoKB as of 9 May 2018.
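The overall_pearson column is described as a Pearson correlation of concatenated out-of-sample predictions. One plausible reading, sketched below, is that held-out predictions from every cross-validation fold are pooled into a single vector before one correlation is computed, rather than averaging per-fold correlations. The function names and the fold layout are assumptions for illustration, not the authors' code.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def overall_pearson(folds):
    """Pool held-out predictions across folds, then correlate once.

    folds: list of (predicted, observed) pairs, one pair per held-out fold.
    """
    predicted, observed = [], []
    for fold_pred, fold_obs in folds:
        predicted.extend(fold_pred)
        observed.extend(fold_obs)
    return pearson(predicted, observed)

# Two hypothetical folds with made-up values.
example_folds = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
score = overall_pearson(example_folds)
```

Pooling before correlating means every cell line contributes exactly once and the score is not dominated by small folds, which is one reason this convention is often preferred over averaging per-fold correlations.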