10 datasets found
  1. DepMap 21Q1 Public

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Broad DepMap (2023). DepMap 21Q1 Public [Dataset]. http://doi.org/10.6084/m9.figshare.13681534.v2
    Available download formats: txt
    Dataset updated: May 30, 2023
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Broad DepMap
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout screens (files prefixed with Achilles) as well as mutation, copy number, and gene expression data (files prefixed with CCLE) for cancer cell lines, produced as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap. The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 808 cell lines, the RNAseq data includes 1376 cell lines, and the copy number data includes 1740 cell lines. Please see the README files for details regarding updates to the data processing pipeline procedures. As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org. Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk/cell_lines, EGA accession number: EGAD00001001039), reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md

    v2: changed dataset name
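    The description says that CRISPR screen files carry an Achilles prefix and omics files a CCLE prefix. As a minimal sketch, assuming the release has been downloaded and its file names listed (the names below are hypothetical examples, not a verified listing of the 21Q1 files), the files can be sorted by source like this:

```python
from collections import defaultdict


def group_by_prefix(filenames):
    """Group release file names by data source, using the Achilles/CCLE prefix convention."""
    groups = defaultdict(list)
    for name in filenames:
        if name.startswith("Achilles"):
            groups["Achilles"].append(name)  # CRISPR-Cas9 knockout screen results
        elif name.startswith("CCLE"):
            groups["CCLE"].append(name)  # mutation, copy number, expression
        else:
            groups["other"].append(name)  # anything else, e.g. README files
    return dict(groups)


# Hypothetical file names, for illustration only.
files = ["Achilles_gene_effect.csv", "CCLE_expression.csv", "README.txt"]
print(group_by_prefix(files))
```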

  2. cross-dataset-drp-paper

    • zenodo.org
    zip
    Updated Apr 22, 2025
    Cite
    A. Partin; A. Partin (2025). cross-dataset-drp-paper [Dataset]. http://doi.org/10.5281/zenodo.15258451
    Available download formats: zip
    Dataset updated: Apr 22, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: A. Partin; A. Partin
    Description

    This benchmark data was used to train and evaluate the models presented in the paper: A. Partin and P. Vasanthakumari et al. "Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis"

    The benchmark data for Cross-Study Analysis (CSA) include four kinds of data: cell line response data, cell line multi-omics data, drug feature data, and data partitions. The figure below illustrates the curation, processing, and assembly of the benchmark data, and a unified schema for data curation. Cell line response data were extracted from five large-scale cell line drug screening studies: the Cancer Cell Line Encyclopedia (CCLE), the Cancer Therapeutics Response Portal version 2 (CTRPv2), the Genomics of Drug Sensitivity in Cancer version 1 (GDSC1), the Genomics of Drug Sensitivity in Cancer version 2 (GDSC2), and the Genentech Cell Line Screening Initiative (GCSI). We extracted their multi-dose viability data and used a unified dose response fitting pipeline to calculate multiple dose-independent response metrics, as shown in the figure below, such as the area under the dose response curve (AUC) and the half-maximal inhibitory concentration (IC50). The multi-omics data of cell lines were extracted from the Dependency Map (DepMap) portal of CCLE, including gene expression, DNA mutations, DNA methylation, gene copy numbers, protein expression measured by reverse phase protein array (RPPA), and miRNA expression. Data preprocessing was performed, such as discretizing gene copy numbers and mapping between different gene identifier systems. Drug information was retrieved from PubChem. Based on the drug SMILES (Simplified Molecular Input Line Entry Specification) strings, we calculated their molecular fingerprints and descriptors using the Mordred and RDKit Python packages. Data partition files were generated using the IMPROVE benchmark data preparation pipeline. For each modeling analysis run, they indicate which samples should be included in the training, validation, and testing sets for building and evaluating the drug response prediction (DRP) models.
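    To make the two response metrics concrete, here is a minimal sketch assuming a dose-response curve that has already been fitted with a three-parameter Hill equation (top fixed at 1, parameters einf, ec50, and slope). This parameterization and the function names are illustrative assumptions, not necessarily what the IMPROVE fitting pipeline uses. AUC is taken as the normalized area under the fitted curve over log10 dose, and IC50 as the dose at which fitted viability reaches 0.5.

```python
import math


def hill(dose, einf, ec50, slope):
    """Fitted viability (fraction of untreated) at a given dose; top of the curve fixed at 1."""
    return einf + (1.0 - einf) / (1.0 + (dose / ec50) ** slope)


def auc_log_dose(einf, ec50, slope, dmin, dmax, n=1000):
    """Area under the curve over log10(dose) in [dmin, dmax], normalized to [0, 1] (trapezoid rule)."""
    lo, hi = math.log10(dmin), math.log10(dmax)
    step = (hi - lo) / n
    ys = [hill(10 ** (lo + i * step), einf, ec50, slope) for i in range(n + 1)]
    area = step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return area / (hi - lo)


def ic50(einf, ec50, slope):
    """Dose where fitted viability equals 0.5, or None if the curve never falls that low."""
    if einf >= 0.5:
        return None
    # Solve einf + (1 - einf) / (1 + (d / ec50) ** slope) = 0.5 for d.
    ratio = (1.0 - einf) / (0.5 - einf) - 1.0
    return ec50 * ratio ** (1.0 / slope)


# A symmetric example curve: full kill, EC50 = 1 uM, slope 1, doses 0.01-100 uM.
print(auc_log_dose(0.0, 1.0, 1.0, 0.01, 100.0))  # approximately 0.5
print(ic50(0.0, 1.0, 1.0))  # 1.0
```

Note that IC50 is undefined when the curve plateaus above 50% viability, which is one reason AUC is often preferred as a summary metric.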
    The table below shows the numbers of cell lines, drugs, and experiments in each dataset. Across the five datasets, there are 785 unique cell lines and 749 unique drugs. All cell lines have gene expression, mutation, DNA methylation, and copy number data available; 760 of the cell lines have RPPA protein expression data, and 781 have miRNA expression data.

    Further description is provided here: https://jdacs4c-improve.github.io/docs/content/app_drp_benchmark.html
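    The data partition files specify which samples go into the training, validation, and test sets. As an illustration only (this is not the IMPROVE pipeline's split logic, and the record layout and function name are hypothetical), a cross-study split that trains and validates on one source study and tests on another could look like this:

```python
import random


def cross_study_split(records, source, target, val_frac=0.1, seed=0):
    """Split (study, cell_line, drug) records: train/val from `source`, test on all of `target`."""
    src = [r for r in records if r[0] == source]
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    rng.shuffle(src)
    n_val = int(len(src) * val_frac)
    return {
        "train": src[n_val:],
        "val": src[:n_val],
        "test": [r for r in records if r[0] == target],
    }


records = [("CCLE", f"CL{i}", "drugA") for i in range(10)]
records += [("GDSC2", f"CL{i}", "drugA") for i in range(3)]
splits = cross_study_split(records, "CCLE", "GDSC2", val_frac=0.2)
print({name: len(rows) for name, rows in splits.items()})  # {'train': 8, 'val': 2, 'test': 3}
```

Fixing the seed lets every run's partition be regenerated exactly, which is the same goal that shipping precomputed partition files serves.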

  3. Celligner data

    • figshare.com
    bin
    Updated Dec 7, 2020
    Cite
    Cancer Data Science (2020). Celligner data [Dataset]. http://doi.org/10.6084/m9.figshare.11965269.v5
    Available download formats: bin
    Dataset updated: Dec 7, 2020
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Cancer Data Science
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We developed a computational method (Celligner) that identifies and removes systematic differences between cell lines and tumor gene expression profiles, allowing for direct integration of existing large-scale cancer cell line and tumor datasets. Celligner performs this computational alignment across cancer types in a completely unsupervised fashion, without relying on prior annotations of cancer types, tumor sample purity, or contaminating cell expression profiles. We applied Celligner to produce a global alignment of 12,236 tumor samples from TCGA, TARGET, and Treehouse datasets and 1,249 cell lines from DepMap. This dataset includes Celligner-aligned data, a matrix of correlations between cell lines and tumors, associated cell line and tumor metadata, and other outputs from the Celligner method. See Readme file for more details about the dataset contents and version history.
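    Once cell lines and tumors share an aligned expression space, a correlation matrix like the one included here can be used to find each cell line's most similar tumors. The sketch below assumes small, already-aligned profiles held in plain dictionaries (sample names and values are hypothetical) and does not reproduce Celligner's alignment step itself:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nearest_tumors(cell_lines, tumors, k=1):
    """For each cell line, return the k tumor samples with the highest correlation."""
    result = {}
    for name, profile in cell_lines.items():
        ranked = sorted(
            ((pearson(profile, t_profile), t_name) for t_name, t_profile in tumors.items()),
            reverse=True,
        )
        result[name] = [t_name for _, t_name in ranked[:k]]
    return result


# Hypothetical aligned profiles over the same four genes.
cell_lines = {"CL1": [1.0, 2.0, 3.0, 4.0]}
tumors = {"T1": [1.0, 2.0, 3.0, 5.0], "T2": [4.0, 3.0, 2.0, 1.0]}
print(nearest_tumors(cell_lines, tumors))  # {'CL1': ['T1']}
```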

  4. DataSheet_1_Epigenetic and Immune-Cell Infiltration Changes in the Tumor...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 2, 2021
    Cite
    Liu, Jia; Wu, Zeng-Hong; Yang, Dong-Liang; Wang, Liang (2021). DataSheet_1_Epigenetic and Immune-Cell Infiltration Changes in the Tumor Microenvironment in Hepatocellular Carcinoma.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000736372
    Dataset updated: Dec 2, 2021
    Authors: Liu, Jia; Wu, Zeng-Hong; Yang, Dong-Liang; Wang, Liang
    Description

    Background: Epigenetics regulates gene expression without altering the DNA sequence. An epigenetics-targeted chemotherapeutic approach can be used to overcome treatment resistance and the low response rate in HCC. However, comprehensive analysis of genomic data to determine the role of epigenetics in the tumor microenvironment (TME) and the immune cell-infiltration characteristics of HCC is still insufficient.
    Methods: The association between epigenetic-related genes (ERGs), inflammatory response-related genes (IRRGs), and CRISPR genes was determined by merging genomic and CRISPR data. Further, the characteristics of immune-cell infiltration in the tumor microenvironment were evaluated.
    Results: Nine differentially expressed genes (ANP32B, ASF1A, BCORL1, BMI1, BUB1, CBX2, CBX3, CDK1, and CDK5) were shown to be independent prognostic factors based on lasso regression in the TCGA-LIHC and ICGC databases. In addition, the results showed significant differences in the expression of PDCD-1 (PD-1) and CTLA4 between the high- and low-epigenetic score groups. The CTRP- and PRISM-derived drug response data yielded four CTRP-derived compounds (SB-743921, GSK461364, gemcitabine, and paclitaxel) and two PRISM-derived compounds (dolastatin-10 and LY2606368). Patients with high ERGs benefited more from immune checkpoint inhibitor (ICI) therapy than patients with low ERGs. In addition, the high-ERG subgroup had a higher T cell exclusion score, while the low-ERG subgroup had higher T cell dysfunction. However, there was no difference in microsatellite instability (MSI) score between the two subgroups. Further, genome-wide CRISPR-based loss-of-function screening data derived from DepMap were used to determine key genes driving HCC development and progression. In total, 640 genes were identified as essential for survival in HCC cell lines. The protein-protein interaction (PPI) network demonstrated that the IRRG PSEN1 was linked to most ERGs and to CRISPR genes such as CDK1, TOP2A, CBX2, and CBX3.
    Conclusion: Epigenetic alterations of cancer-related genes in the tumor microenvironment play a major role in carcinogenesis. This study showed that novel epigenetic-related biomarkers could be useful in predicting prognosis and in the clinical diagnosis and management of HCC.
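    The abstract reports 640 essential genes in HCC cell lines but does not state the calling rule. As a hypothetical sketch, assuming DepMap-style gene effect scores (more negative means a stronger dependency), a cutoff of -1, and a requirement that a gene be a hit in at least half of the HCC lines (both thresholds are illustrative choices, not the authors'), essential genes could be called like this:

```python
def essential_genes(gene_effect, cutoff=-1.0, min_fraction=0.5):
    """Return genes whose effect score is below `cutoff` in at least `min_fraction` of cell lines.

    gene_effect maps gene -> list of scores, one per HCC cell line.
    """
    hits = []
    for gene, scores in gene_effect.items():
        frac = sum(score < cutoff for score in scores) / len(scores)
        if frac >= min_fraction:
            hits.append(gene)
    return sorted(hits)


# Hypothetical scores across three cell lines.
scores = {"CDK1": [-1.5, -1.2, -0.9], "GENE_X": [0.1, -0.2, 0.0]}
print(essential_genes(scores))  # ['CDK1']
```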

  5. DataSheet1_Multi-Omics Analysis of Cancer Cell Lines with High/Low...

    • frontiersin.figshare.com
    txt
    Updated Jun 6, 2023
    Cite
    Guangyao Shan; Huan Zhang; Guoshu Bi; Yunyi Bian; Jiaqi Liang; Besskaya Valeria; Dejun Zeng; Guangyu Yao; Cheng Zhan; Hong Fan (2023). DataSheet1_Multi-Omics Analysis of Cancer Cell Lines with High/Low Ferroptosis Scores and Development of a Ferroptosis-Related Model for Multiple Cancer Types.CSV [Dataset]. http://doi.org/10.3389/fcell.2021.794475.s001
    Available download formats: txt
    Dataset updated: Jun 6, 2023
    Dataset provided by: Frontiers
    Authors: Guangyao Shan; Huan Zhang; Guoshu Bi; Yunyi Bian; Jiaqi Liang; Besskaya Valeria; Dejun Zeng; Guangyu Yao; Cheng Zhan; Hong Fan
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Ferroptosis is a newly identified form of regulated cell death characterized by iron-dependent lipid peroxidation and subsequent oxidative membrane damage, and it has been implicated in multiple types of cancer. The multi-omics differences between cancer cell lines with high/low ferroptosis scores remain to be elucidated.
    Methods and Materials: We used RNA-seq gene expression, gene mutation, miRNA expression, metabolite, copy number variation, and drug sensitivity data of cancer cell lines from DEPMAP to detect multi-omics differences associated with ferroptosis. Based on the gene expression data of cancer cell lines, we performed LASSO-logistic regression analysis to build a ferroptosis-related model. Lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), esophageal cancer (ESCA), bladder cancer (BLCA), cervical cancer (CESC), and head and neck cancer (HNSC) patients from the TCGA database were used as validation cohorts to test the efficacy of this model.
    Results: After stratifying the cancer cell lines into high-score (HS) and low-score (LS) groups according to the median of the ferroptosis scores generated by gene set variation analysis, we found that the IC50 values of 66 agents, such as oxaliplatin (p < 0.001), were significantly different, 65 of which were higher in the HS group. A total of 851 genes, such as KEAP1 and NRAS, were differentially mutated between the two groups. Differentially expressed genes, miRNAs, and metabolites were also detected; multiple items, such as IL17F (logFC = 6.58, p < 0.001), differed between the two groups. Unlike the TCGA data generated by bulk RNA-seq, the gene expression data in DEPMAP come from pure cancer cells, so they could better reflect the traits of tumors in cancer patients. We therefore built a 15-signature model (AUC = 0.878) based on the gene expression data of cancer cell lines. The validation cohorts demonstrated a higher mutational rate of NFE2L2 and higher expression levels of 12 ferroptosis-related genes in the HS groups.
    Conclusion: This article systematically analyzed multi-omics differences between cancer cell lines with high/low ferroptosis scores and developed a ferroptosis-related model for multiple cancer types. Our findings could improve our understanding of the role of ferroptosis in cancer and provide new insight into treatment for malignant tumors.
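    The stratification step (a median split of per-cell-line ferroptosis scores, followed by comparing drug IC50s between groups) can be sketched as follows. The scores are assumed to be precomputed (for example by GSVA), all values are hypothetical, and the sketch only reports a difference in means; the significance testing used in the study is not reproduced here.

```python
import statistics


def median_split(scores):
    """Split cell lines into high-score (HS) and low-score (LS) groups at the median.

    Lines exactly at the median go to LS, an arbitrary tie-breaking choice.
    """
    cut = statistics.median(scores.values())
    hs = sorted(name for name, value in scores.items() if value > cut)
    ls = sorted(name for name, value in scores.items() if value <= cut)
    return hs, ls


def mean_difference(values, hs, ls):
    """Mean of `values` (e.g. IC50 per cell line) in HS minus the mean in LS."""
    return statistics.mean(values[n] for n in hs) - statistics.mean(values[n] for n in ls)


ferroptosis = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.7}  # hypothetical scores
ic50 = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 4.0}  # hypothetical IC50 values
hs, ls = median_split(ferroptosis)
print(hs, ls)  # ['A', 'D'] ['B', 'C']
print(mean_difference(ic50, hs, ls))  # 2.0
```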

  6. DEMETER2 data

    • figshare.com
    txt
    Updated Apr 9, 2020
    Cite
    Cancer Data Science (2020). DEMETER2 data [Dataset]. http://doi.org/10.6084/m9.figshare.6025238.v6
    Available download formats: txt
    Dataset updated: Apr 9, 2020
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Cancer Data Science
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer cell line genetic dependencies estimated using the DEMETER2 model. DEMETER2 is applied to three large-scale RNAi screening datasets: the Broad Institute Project Achilles, Novartis Project DRIVE, and the Marcotte et al. breast cell line dataset. The model is also applied to generate a combined dataset of gene dependencies covering a total of 712 unique cancer cell lines. For more information visit https://depmap.org/R2-D2/. Visit the Cancer Dependency Map portal at https://depmap.org to explore related datasets. Email questions to depmap@broadinstitute.org. This dataset includes gene dependencies estimated using the DEMETER2 model, the raw input datasets used to fit the models, and associated metadata. See the Readme file for more details about the dataset contents and version history.

    Version history (see README for more details):
    v1: Initial data release
    v2: Removed a small number of non-human genes (e.g. GFP, RFP) from the shRNA-to-gene mapping; updated cell line names to be consistent with DepMap names, according to the following map (old -> new):
    v3: Added estimated seed effect matrices
    v4: Added RNAseq and mutation data files used in the analysis for the manuscript
    v5: Fixed a minor bug with the Marcotte LFC data that caused hairpins targeting multiple genes to appear multiple times in the LFC matrix. This biased the seed effect estimates for those hairpins, causing very minor differences in the resulting model parameters.
    v6: Added tables with shRNA quality metrics for Achilles and DRIVE data
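    The v5 fix concerns an LFC table in which a hairpin targeting several genes was repeated once per gene. A hypothetical sketch of that kind of repair (not the DEMETER2 code; the row layout and function name are assumptions) keeps one LFC row per hairpin and moves the multi-gene targeting into a separate hairpin-to-gene map:

```python
def split_lfc_and_mapping(rows):
    """rows: (hairpin, gene, lfc_values) tuples, possibly repeated per targeted gene.

    Returns ({hairpin: lfc_values}, {hairpin: [genes]}) with one LFC row per hairpin.
    """
    lfc, mapping = {}, {}
    for hairpin, gene, values in rows:
        if hairpin in lfc and lfc[hairpin] != values:
            raise ValueError(f"conflicting LFC rows for hairpin {hairpin}")
        lfc.setdefault(hairpin, values)
        mapping.setdefault(hairpin, []).append(gene)
    return lfc, mapping


rows = [
    ("shA", "GENE1", [0.1, -0.2]),
    ("shA", "GENE2", [0.1, -0.2]),  # same hairpin, second target gene
    ("shB", "GENE3", [0.5, 0.3]),
]
lfc, mapping = split_lfc_and_mapping(rows)
print(len(lfc), mapping["shA"])  # 2 ['GENE1', 'GENE2']
```

Keeping each hairpin once in the LFC matrix means a model estimating per-hairpin effects, such as seed effects, sees each observation once rather than once per target gene.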

  7. Additional file 17 of Mapping in silico genetic networks of the KMT2D tumour...

    • springernature.figshare.com
    xlsx
    Updated Nov 23, 2024
    Cite
    Yuka Takemon; Erin D. Pleasance; Alessia Gagliardi; Christopher S. Hughes; Veronika Csizmok; Kathleen Wee; Diane L. Trinh; Ryan D. Huff; Andrew J. Mungall; Richard A. Moore; Eric Chuah; Karen L. Mungall; Eleanor Lewis; Jessica Nelson; Howard J. Lim; Daniel J. Renouf; Steven JM. Jones; Janessa Laskin; Marco A. Marra (2024). Additional file 17 of Mapping in silico genetic networks of the KMT2D tumour suppressor gene to uncover novel functional associations and cancer cell vulnerabilities [Dataset]. http://doi.org/10.6084/m9.figshare.27894380.v1
    Available download formats: xlsx
    Dataset updated: Nov 23, 2024
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Yuka Takemon; Erin D. Pleasance; Alessia Gagliardi; Christopher S. Hughes; Veronika Csizmok; Kathleen Wee; Diane L. Trinh; Ryan D. Huff; Andrew J. Mungall; Richard A. Moore; Eric Chuah; Karen L. Mungall; Eleanor Lewis; Jessica Nelson; Howard J. Lim; Daniel J. Renouf; Steven JM. Jones; Janessa Laskin; Marco A. Marra
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 17: Table S12. Characterising [AT]n motif expansion in cancer cell lines.
    A. All MSI DepMap cancer cell lines with WGS data, plus three MSS DepMap cell lines with WGS data for comparison. Columns:
      Cell_line_stripped_name: cell line names provided by DepMap without special characters or spaces.
      disease: cancer type assigned by DepMap.
      MSI_status: microsatellite status inferred by Ghandi et al. (2019).
      KMT2D_group: KMT2D group assigned to a cell line.
      SRA_project_ID: NCBI SRA project ID for raw WGS data.
      SRA_Run_ID: NCBI SRA run ID for raw WGS data.
      Instrument: sequencer used to perform WGS.
    B. Profiles of [AT]n motifs generated using ExpansionHunter Denovo. Columns:
      Cell_line_name: cell line names provided by DepMap without special characters or spaces.
      disease: cancer type assigned by DepMap.
      SRA_id: NCBI SRA run ID.
      group: MSI status and KMT2D group of the cell line.
      Contig: chromosome location of the motif.
      Start: start site (bp) of the motif.
      End: end site (bp) of the motif.
      Motif: motif type.
      Num_anc_irrs: number of anchored in-repeat reads (i.e. one read of a pair maps within the repeat region and its mate maps outside it; see Methods).
      norm_num_anc_irrs: normalised number of anchored in-repeat reads.
      Het_str_size: estimated motif repeat size.
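    Panel B's columns support simple per-group summaries. As a sketch, assuming the sheet has been exported to CSV with the column names listed for panel B, and that the Motif column holds the literal string "AT" (an assumption about how [AT]n is encoded), the mean normalised anchored in-repeat read count per group could be computed as:

```python
import csv
import io
from collections import defaultdict


def mean_norm_irrs_by_group(csv_text, motif="AT"):
    """Mean norm_num_anc_irrs per `group` for rows whose Motif equals `motif`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Motif"] != motif:
            continue
        totals[row["group"]] += float(row["norm_num_anc_irrs"])
        counts[row["group"]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical rows; real group labels combine MSI status and KMT2D group.
sample = """group,Motif,norm_num_anc_irrs
MSI,AT,4.0
MSI,AT,6.0
MSS,AT,1.0
MSS,AAG,9.0
"""
print(mean_norm_irrs_by_group(sample))  # {'MSI': 5.0, 'MSS': 1.0}
```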

  8. Project SCORE processed with CERES

    • figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Broad DepMap (2023). Project SCORE processed with CERES [Dataset]. http://doi.org/10.6084/m9.figshare.9116732.v2
    Available download formats: txt
    Dataset updated: Jun 4, 2023
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Broad DepMap
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results for 318 cancer cell lines screened by the Sanger Institute with the genome-wide KY1.0/1.1 CRISPR KO library, processed with the Achilles pipeline (except QC). The publication describing the experiment is "Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens," DOI 10.1038/s41586-019-1103-9. Read counts were downloaded from https://score.depmap.sanger.ac.uk/downloads on 8 May 2019. Only cell lines annotated by the authors as passing both QC steps in Supplementary Table 1 were retained. Additionally, only cell lines for which the Broad had copy number data as of 10 May 2019 were retained. For more details on the included files, see README.
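    The two retention rules (pass both author QC steps, and have Broad copy number data) amount to a filter over the cell line list. A minimal sketch, with hypothetical cell line names and an assumed in-memory layout of the QC annotations:

```python
def retained_cell_lines(qc_status, broad_cn_lines):
    """Keep cell lines that pass both QC steps and have Broad copy number data.

    qc_status maps cell line -> (passes_qc_step_1, passes_qc_step_2).
    """
    return sorted(
        line
        for line, (qc1, qc2) in qc_status.items()
        if qc1 and qc2 and line in broad_cn_lines
    )


qc_status = {"LINE_A": (True, True), "LINE_B": (True, False), "LINE_C": (True, True)}
broad_cn_lines = {"LINE_A", "LINE_B"}
print(retained_cell_lines(qc_status, broad_cn_lines))  # ['LINE_A']
```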

  9. Table_2_A Novel Prognostic Signature Based on Glioma Essential...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Debo Yun; Xuya Wang; Wenbo Wang; Xiao Ren; Jiabo Li; Xisen Wang; Jianshen Liang; Jie Liu; Jikang Fan; Xiude Ren; Hao Zhang; Guanjie Shang; Jingzhang Sun; Lei Chen; Tao Li; Chen Zhang; Shengping Yu; Xuejun Yang (2023). Table_2_A Novel Prognostic Signature Based on Glioma Essential Ferroptosis-Related Genes Predicts Clinical Outcomes and Indicates Treatment in Glioma.xlsx [Dataset]. http://doi.org/10.3389/fonc.2022.897702.s002
    Available download formats: xlsx
    Dataset updated: Jun 4, 2023
    Dataset provided by: Frontiers Media (http://www.frontiersin.org/)
    Authors: Debo Yun; Xuya Wang; Wenbo Wang; Xiao Ren; Jiabo Li; Xisen Wang; Jianshen Liang; Jie Liu; Jikang Fan; Xiude Ren; Hao Zhang; Guanjie Shang; Jingzhang Sun; Lei Chen; Tao Li; Chen Zhang; Shengping Yu; Xuejun Yang
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Ferroptosis is a form of programmed cell death (PCD) that has been implicated in cancer progression, although the specific mechanism is not known. Here, we used the latest DepMap release of CRISPR data to identify the essential ferroptosis-related genes (FRGs) in glioma and their role in patient outcomes.
    Methods: RNA-seq and clinical information on glioma cases were obtained from the Chinese Glioma Genome Atlas (CGGA) and The Cancer Genome Atlas (TCGA). FRGs were obtained from the FerrDb database. CRISPR-screened essential genes (CSEGs) in glioma cell lines were downloaded from the DepMap portal. A series of bioinformatic and machine learning approaches were combined to establish FRG signatures that predict overall survival (OS) in glioma patients. In addition, pathway analysis was used to identify the functional roles of FRGs. Somatic mutation, immune cell infiltration, and immune checkpoint gene expression were analyzed within the risk subgroups. Finally, compounds for reversing high-risk gene signatures were predicted using the GDSC and L1000 datasets.
    Results: Seven FRGs (ISCU, NFS1, MTOR, EIF2S1, HSPA5, AURKA, RPL8) were included in the model, which was found to have good prognostic value (p < 0.001) in both the training and validation groups. The risk score was an independent prognostic factor, and the model showed good efficacy. Subgroup analysis using clinical parameters demonstrated the general applicability of the model. The nomogram indicated that the model could effectively predict 12-, 36-, and 60-month OS and progression-free interval (PFI). The high-risk group showed more aggressive phenotypes (fewer IDH mutations, more EGFR and PTEN mutations, greater infiltration of immunosuppressive cells, and higher expression of immune checkpoint genes). The enriched signaling pathways were closely related to the cell cycle and DNA damage repair. Drug predictions showed that patients with higher risk scores may benefit from treatment with RTK pathway inhibitors, including compounds that inhibit RTKs directly or indirectly by targeting the downstream PI3K or MAPK pathways.
    Conclusion: In summary, the proposed cancer-essential FRG signature predicts survival and treatment response in glioma.
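    Prognostic gene signatures of this kind are commonly applied as a linear risk score, a weighted sum of the signature genes' expression. The abstract names the seven genes but gives neither the formula nor the coefficients, so the sketch below assumes the linear form and uses placeholder weights for two of the genes; it is not the published model.

```python
def risk_scores(expression, coefficients):
    """Linear risk score per sample: sum of coefficient * expression over signature genes."""
    return {
        sample: sum(weight * genes[gene] for gene, weight in coefficients.items())
        for sample, genes in expression.items()
    }


# Placeholder weights for two of the seven signature genes (not the published values).
coefficients = {"ISCU": 0.5, "AURKA": 1.0}
expression = {
    "S1": {"ISCU": 2.0, "AURKA": 1.0},
    "S2": {"ISCU": 0.0, "AURKA": 0.5},
}
print(risk_scores(expression, coefficients))  # {'S1': 2.0, 'S2': 0.5}
```

A cutoff on this score would then separate the high- and low-risk subgroups the abstract refers to; how that cutoff was chosen is not stated in the abstract.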

  10. Expression vs genomics for predicting dependencies

    • figshare.com
    hdf
    Updated May 17, 2024
    Cite
    Broad DepMap (2024). Expression vs genomics for predicting dependencies [Dataset]. http://doi.org/10.6084/m9.figshare.25843450.v1
    Available download formats: hdf
    Dataset updated: May 17, 2024
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Broad DepMap
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports the preprint "Gene expression has more power for predicting in vitro cancer cell vulnerabilities than genomics" by Dempster et al. To generate the figure panels seen in the preprint from these data, use FigurePanelGeneration.ipynb. This study includes five datasets (citations and details in the manuscript):
      Achilles: the Broad Institute's DepMap public 19Q4 CRISPR knockout screens, processed with CERES
      Score: the Wellcome Sanger Institute's Project Score CRISPR knockout screens, processed with CERES
      RNAi: the DEMETER2-processed combined dataset, which includes RNAi data from the Achilles, DRIVE, and Marcotte breast screens
      PRISM: the PRISM pooled in vitro repurposing primary screen of compounds
      GDSC17: in vitro cancer drug screens performed by Sanger
    The file of most interest to a biologist is Summary.csv. If you are interested in trying machine learning, the files Features.hdf5 and Target.hdf5 contain the data munged into a convenient form for standard supervised machine learning algorithms.
    Some large files are in the binary hdf5 format for efficiency in space and read-in. Each of these files contains three named hdf5 datasets: "dim_0" holds the row (index) names as an array of strings, "dim_1" holds the column names as an array of strings, and "data" holds the matrix contents as a 2D array of floats.
    In Python, these files can be read with:

        import numpy as np
        import pandas as pd
        import h5py

        def read_hdf5(filename):
            # Row names, column names, and values are stored as three separate datasets.
            with h5py.File(filename, 'r') as src:
                dim_0 = [x.decode('utf8') for x in src['dim_0']]  # row (index) names
                dim_1 = [x.decode('utf8') for x in src['dim_1']]  # column names
                data = np.array(src['data'])                      # 2D float matrix, read into memory
            return pd.DataFrame(index=dim_0, columns=dim_1, data=data)

    Files (not every dataset will have every type of file listed below):
      AllFeaturePredictions.hdf5: Matrix of cell lines by perturbations, with values indicating the predicted viability using a model with all feature types.
      ENAdditionScore.csv: A matrix of perturbations by number of features. Values indicate elastic net model performance (Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5) using only the top X features, where X is the column header.
      FeatureDropScore.csv: Perturbations and predictive performance for a model using all single gene expression features EXCEPT those that had greater than 0.1 feature importance in a model trained with all single gene expression features.
      Features.hdf5: A very large matrix of all cell lines by all used CCLE cell features. Continuous features were z-scored. Cell lines missing mutation or expression data were dropped. Remaining NA values were imputed to zero. Feature types are indicated by the column name suffixes:
        _Exp: expression
        _Hot: hotspot mutation
        _Dam: damaging mutation
        _OtherMut: other mutation
        _CN: copy number
        _GSEA: ssGSEA score for an MSigDB gene set
        _MethTSS: methylation of transcription start sites
        _MethCpG: methylation of CpG islands
        _Fusion: gene fusions
        _Cell: cell tissue properties
      NormLRT.csv: the normLRT score for the given perturbation
      RFAdditionScore.csv: similar to ENAdditionScore, but using a random forest model.
      Summary.csv: A dataframe containing predictive model results.
        Columns:
          model: specifies the collection of features used (Expression, Mutation, Exp+CN, etc.)
          gene: the perturbation (column in Target.hdf5) examined; actually a compound for the PRISM and GDSC17 datasets
          overall_pearson: Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5
          feature: the Nth most important feature, found by retraining the model with all cell lines (N = 0-9)
          feature_importance: the feature importance as assessed by sklearn's RandomForestRegressor
      Target.hdf5: A matrix of cell lines by perturbations, with entries indicating post-perturbation viability scores. Note that the scales of the viability effects differ between datasets. See the manuscript methods for details.
      PerturbationInfo.csv: Additional drug annotations for the PRISM and GDSC17 datasets
      ApproximateCFE.hdf5: A set of Cancer Functional Event cell features based on CCLE data, adapted from Iorio et al. 2016 (10.1016/j.cell.2016.06.017)
      DepMapSampleInfo.csv: sample info from the DepMap_public_19Q4 data, reproduced here as a convenience.
      GeneRelationships.csv: A list of genes and their related (partner) genes, with the type of relationship (self, protein-protein interaction, CORUM complex membership, paralog).
      OncoKB_oncogenes.csv: A list of genes that have non-expression-based alterations listed as likely oncogenic or oncogenic by OncoKB as of 9 May 2018.
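    The overall_pearson column is described as the Pearson correlation of concatenated out-of-sample predictions with the values in Target.hdf5. A stdlib-only sketch of that computation, assuming each cross-validation fold yields a dictionary of predictions for its held-out cell lines (the layout and function names are hypothetical, not taken from the study's code):

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def overall_pearson(fold_predictions, target):
    """Concatenate held-out predictions from every fold, then correlate with observed values."""
    predicted = {}
    for fold in fold_predictions:
        predicted.update(fold)  # each cell line is assumed held out in exactly one fold
    lines = sorted(predicted)
    return pearson([predicted[c] for c in lines], [target[c] for c in lines])


folds = [{"A": 1.0, "B": 2.0}, {"C": 3.0}]  # hypothetical out-of-sample predictions
observed = {"A": 2.0, "B": 4.0, "C": 6.0}  # hypothetical Target.hdf5 column
print(overall_pearson(folds, observed))  # 1.0
```

Pooling predictions before correlating yields one score over all cell lines, which tends to be more stable than averaging correlations computed on small individual folds.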


DepMap 21Q1 Public: 17 scholarly articles cite this dataset (per Google Scholar).