100+ datasets found

Cell Line Database
bioregistry.io
Updated Dec 28, 2021
Cite
(2021). Cell Line Database [Dataset]. https://bioregistry.io/cldb
Dataset updated
Dec 28, 2021
Description
The Cell Line Data Base (CLDB) is a reference information source for human and animal cell lines. It provides the characteristics of the cell lines and their availability through distributors, allowing cell line requests to be made from collections and laboratories.
CCLE Cell Line Gene Expression Profiles
maayanlab.cloud
gz
Updated Nov 11, 2010
Cite
Ma'ayan Laboratory of Computational Systems Biology (2010). CCLE Cell Line Gene Expression Profiles [Dataset]. https://maayanlab.cloud/Harmonizome/dataset/CCLE+Cell+Line+Gene+Expression+Profiles
Available download formats: gz
Dataset updated
Nov 11, 2010
Dataset provided by
Harmonizome
Ma'ayan Laboratory of Computational Systems Biology
Authors
Ma'ayan Laboratory of Computational Systems Biology
Description
mRNA microarray expression profiles for cancer cell lines
Data and metadata supporting the published article: Development and...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 4, 2023
Cite
Stephen Ethier; Stephen T. Guest; Elizabeth Garrett-Mayer; Kent Armeson; Robert C. Wilson; Kathryn Duchinski; Daniel Couch; Joe W. Gray; Chistiana Kappler (2023). Data and metadata supporting the published article: Development and implementation of the SUM breast cancer cell line functional genomics knowledge base. [Dataset]. http://doi.org/10.6084/m9.figshare.12497630.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.12497630.v1
Dataset updated
Jun 4, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Stephen Ethier; Stephen T. Guest; Elizabeth Garrett-Mayer; Kent Armeson; Robert C. Wilson; Kathryn Duchinski; Daniel Couch; Joe W. Gray; Chistiana Kappler
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The SUM human breast cancer cell lines have been used by many labs around the world to develop extensive data sets derived from comparative genomic hybridization analysis, gene expression profiling, whole exome sequencing, and reverse phase protein array analysis. In a previous study, the authors of this paper performed genome-scale shRNA essentiality screens on the entire SUM line panel, as well as on MCF10A cells, MCF-7 cells, and MCF-7LTED cells. In this study, the authors have developed the SUM Breast Cancer Cell Line Knowledge Base, to make all of these omics data sets available to users of the SUM lines, and to allow users to mine the data and analyse them with respect to biological pathways enriched by the data in each cell line.

Data access: All the datasets supporting the findings of this study are publicly available in the SLKBase platform here: https://sumlineknowledgebase.com/. RPPA data, drug sensitivity data, apelisib response data, and dose response data are also part of this figshare data record (https://doi.org/10.6084/m9.figshare.12497630).

Study aims and methodology: This web-based knowledge base provides users with data and information on the derivation of each of the cell lines, provides narrative summaries of the genomics and cell biology of each breast cancer cell line, and provides protocols for the proper maintenance of the cells. The database includes a series of data mining tools that allow rapid identification of the functional oncogene signatures for each line, the enrichment of any KEGG pathway with screen hit and gene expression data for each of the lines, and a rapid analysis of protein and phospho-protein expression for the cell lines. A gene search tool that returns all of the functional genome and functional druggable data for any gene for the entire cell line panel is included. Additionally, the authors have expanded the database to include functional genomic data for an additional 29 commonly used breast cancer cell lines.

The three overarching goals in the original development of the SLKBase are: 1) to provide a rich source of information for anyone working with any of the SUM breast cancer cell lines, 2) to give researchers ready access to the large genomic data sets that have been developed with these cells, and 3) to allow researchers to perform orthogonal analyses of the various genomics data sets that we and others have obtained from the SUM lines. For more information on the development and contents of the database, please read the related article.

Datasets supporting the paper:

The data mining tools accessed the following datasets to generate the figures and tables, and these datasets are downloadable from the Data Download centre on the SLKBase:

Exome sequencing data: SLKBase.exome_.seq_.sum_.xlsx

Gene amplification and expression data for the SUM cell lines: SUM44amplificationdata.xls, SUM52.xls, SUM149.xls, SUM159.xls, SUM185.xls, SUM190.xls, SUM225.xls, SUM229.xls, SUM1315.xls

Cellecta shRNA screen data for the SUM cell lines: SUM44Celectadata.csv, SUM52Cellectadata.csv, SUM102Cellectadata.csv, SUM149Cellectadata.csv, SUM159Cellectadata.csv, SUM185Cellectadata.csv, SUM190Cellectadata.csv, SUM225Cellectadata.csv, SUM229Cellectadata.csv, SUM1315hits.hit.csv, MCF10A.hits_.csv

Breast cancer cell line data included in this data record (these datasets were used to generate figures 1, 2 and 7 in the article):

Proteomics data from the Reverse Phase Protein Array (RPPA) assay analysis: Ethier.SUMline.RPPA.xlsx

Drug sensitivity data: NAVITOCLAX.drugsensitivity.Zscores.xlsx

Apelisib response data: Apelisib all lines (2).xlsx

Dose response data: 092614 Dose Response CP 52s.11.15.xlsx

All the files are either in .xlsx or .csv file format.
Cancer Cell Line Encyclopedia
rrid.site
Updated Aug 21, 2010
Cite
(2010). Cancer Cell Line Encyclopedia [Dataset]. http://identifiers.org/RRID:SCR_013836
Unique identifier
https://identifiers.org/RRID:SCR_013836
Dataset updated
Aug 21, 2010
Description
A collaborative project between the Broad Institute and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation, with the goal of conducting a detailed genetic and pharmacologic characterization of a large panel of human cancer models. The CCLE also works to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines.
RNA sequencing data for 30 bladder cancer cell lines
datacatalog.mskcc.org
Updated Nov 18, 2019
Cite
Lee, I-Ling; McConkey, David J.; Su, Xiaoping; Choi, Woonyoung (2019). RNA sequencing data for 30 bladder cancer cell lines [Dataset]. https://datacatalog.mskcc.org/dataset/10401
Dataset updated
Nov 18, 2019
Authors
Lee, I-Ling; McConkey, David J.; Su, Xiaoping; Choi, Woonyoung
Description
Summary from the GEO: "RNA-sequencing of a panel of urothelial cancer cells. The goal of the study is to examine the genome-wide expression profile in each of the 30 urothelial cancer cells tested in our laboratory."

"Overall design: Each of the 30 cell lines was DNA fingerprinted to confirm its real identity. Total RNA was obtained from each cell line and subjected to Illumina RNA sequencing."

The data was from a study on comprehensive molecular characterization of muscle-invasive bladder cancer.
NCI-60 Cancer Cell Lines
bigomics.ch
Updated Nov 8, 2024
Cite
National Cancer Institute (NCI) (2024). NCI-60 Cancer Cell Lines [Dataset]. https://bigomics.ch/blog/top-databases-for-drug-discovery/
Dataset updated
Nov 8, 2024
Dataset authored and provided by
National Cancer Institute (NCI)
Description
A panel of 60 human cancer cell lines used for screening anticancer drugs.
Cellosaurus
scicrunch.org
neuinfo.org
+2 more
Updated May 6, 2015
Cite
(2015). Cellosaurus [Dataset]. http://identifiers.org/RRID:SCR_013869
Unique identifier
https://identifiers.org/RRID:SCR_013869
Dataset updated
May 6, 2015
Description
Database of all cell lines used in biomedical research which include immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell lines databases/resources, and to databases that list cell lines as samples.
CCLE Cell Line Gene Mutation Profiles
maayanlab.cloud
gz
Updated Nov 11, 2010
Cite
Ma'ayan Laboratory of Computational Systems Biology (2010). CCLE Cell Line Gene Mutation Profiles [Dataset]. https://maayanlab.cloud/Harmonizome/dataset/CCLE+Cell+Line+Gene+Mutation+Profiles
Available download formats: gz
Dataset updated
Nov 11, 2010
Dataset provided by
Harmonizome
Ma'ayan Laboratory of Computational Systems Biology
Authors
Ma'ayan Laboratory of Computational Systems Biology
Description
Gene-level mutation profiles for cancer cell lines
Cancer Cell Line Encyclopedia
kaggle.com
zip
Updated Jul 12, 2019
Cite
Nicolas Fernandez (2019). Cancer Cell Line Encyclopedia [Dataset]. https://www.kaggle.com/cornhundred/cancer-cell-line-encyclopedia
Available download formats: zip (171199832 bytes)
Dataset updated
Jul 12, 2019
Authors
Nicolas Fernandez
Description
Context

This data was obtained from the Broad Institute Cancer Cell Line Encyclopedia https://portals.broadinstitute.org/ccle/data.

Content

Bulk gene expression data from over 1,000 cancer cell lines was processed to include several cell metadata fields (processing scripts will be included shortly).

Acknowledgements

All data was produced at the Broad Institute. Please see: https://portals.broadinstitute.org/ccle/data
Insect Cell Line Database
bioregistry.io
Updated Jul 2, 2023
Cite
(2023). Insect Cell Line Database [Dataset]. https://bioregistry.io/icldb
Dataset updated
Jul 2, 2023
Description
We have developed an online database describing the known cell lines from Coleoptera, Diptera, Hemiptera, Hymenoptera, and Lepidoptera that originated from crop pest insects. Cell line information has been primarily obtained from previous compilations of insect cell lines. (from homepage)
SNP array data from the Cancer Cell Line Encyclopedia (CCLE)
ebi.ac.uk
Updated Mar 19, 2012
Cite
Nicolas Stransky (2012). SNP array data from the Cancer Cell Line Encyclopedia (CCLE) [Dataset]. https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-GEOD-36138
Dataset updated
Mar 19, 2012
Authors
Nicolas Stransky
Description
The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute, the Novartis Institutes for Biomedical Research and the Genomics Institute of the Novartis Research Foundation to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models. It consists of a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from nearly 1,000 human cancer cell lines. All raw and processed data are available through an integrated portal at www.broadinstitute.org/ccle. The final cell line collection spans 36 cancer types. Representation of cell lines for each cancer type was mainly driven by cancer mortality in the United States, as a surrogate of unmet medical need, as well as availability.
Integrated Cell Lines
neuinfo.org
dknet.org
+2 more
Updated Jan 29, 2022
Cite
(2022). Integrated Cell Lines [Dataset]. http://identifiers.org/RRID:SCR_008994
Unique identifier
https://identifiers.org/RRID:SCR_008994
Dataset updated
Jan 29, 2022
Description
A virtual database currently indexing available cell lines from: Coriell Cell Repositories, International Mouse Strain Resource (IMSR), ATCC, NIH Human Pluripotent Stem Cell Registry, NIGMS Human Genetic Cell Repository, and Developmental Therapeutics Program.
HTLV-1, HHV-8, and SMRV specific read numbers of cell lines ordered by CCLE...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Cord C. Uphoff; Claudia Pommerenke; Sabine A. Denkmann; Hans G. Drexler (2023). HTLV-1, HHV-8, and SMRV specific read numbers of cell lines ordered by CCLE file names. [Dataset]. http://doi.org/10.1371/journal.pone.0210404.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0210404.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Cord C. Uphoff; Claudia Pommerenke; Sabine A. Denkmann; Hans G. Drexler
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
HTLV-1, HHV-8, and SMRV specific read numbers of cell lines ordered by CCLE file names.
Transcriptomics data for CCLE, NCI-60, and PDAC mouse data
data.niaid.nih.gov
nde-dev.biothings.io
+1 more
Updated Mar 6, 2024
Cite
Bernett, Judith (2024). Transcriptomics data for CCLE, NCI-60, and PDAC mouse data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10785228
Dataset updated
Mar 6, 2024
Dataset provided by
Technische Universität München
Authors
Bernett, Judith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Counts, lengths, TPM, and FPKM per gene and per transcript. All CCLE and NCI-60 cell lines are specified as Cellosaurus IDs.

Everything was re-processed with the nf-core/rnaseq pipeline (version 3.10.1) in the setting STAR/Salmon. For human fastq files (CCLE, NCI-60), GRCh38 was used, for mouse GRCm39.
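The TPM and FPKM columns listed above are both length-normalised views of the raw counts. The following Python sketch uses the textbook definitions with made-up numbers; it is an illustration only and not the Salmon implementation used by the pipeline, which works with effective lengths. The function names and the example counts and lengths are assumptions.

```python
def tpm(counts, lengths):
    """Transcripts per million: per-base rates rescaled so they sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def fpkm(counts, lengths):
    """Fragments per kilobase of feature per million counted fragments."""
    total = sum(counts)
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths)]


# Hypothetical counts and feature lengths (in bases) for three genes.
counts = [100, 300, 600]
lengths = [1000, 3000, 2000]

tpm_values = tpm(counts, lengths)
fpkm_values = fpkm(counts, lengths)
```

In this toy input the first two genes have the same count-to-length ratio, so they receive equal TPM and equal FPKM even though their raw counts differ.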

CCLE (1019 cell lines)

Raw fastq files were downloaded from the NCBI SRA Run selector as BioProject PRJNA523380 using the SRA toolkit. Sequences were first prefetched and then the fastq files were generated with:

#!/bin/bash
# Convert each prefetched SRA run to FASTQ, then compress the output.
while read -r run
do
    echo "$run"
    fasterq-dump "$run"
    echo gzipping
    gzip "$run"*.fastq
done < SRR_Acc_List_CCLE.txt

Then, the directories were deleted.

Afterwards, the FASTQ files were processed using the nf-core/RNA-seq pipeline using this command:

nextflow run nf-core/rnaseq --input CCLE_samplesheet.csv --outdir CCLE/nf_core/ --multiqc_title CCLE_star_salmon -c CCLE_nextflow.config -profile singularity,slurm --fasta ensembl107_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf ensembl107_GRCh38/Homo_sapiens.GRCh38.107.gtf -r 3.10.1

For the output data, SRR accession numbers were mapped back to the Cell line names using the SRARunTable metadata information. These cell line names were mapped to cellosaurus IDs.

The metadata file contains the Cellosaurus ID, the SRR accession numbers, the cell line names, metadata from SRA (BioProject, BioSample, Experiment), and metadata from Cellosaurus (cell line name, synonyms, diseases, cross references, BTO ID, CLO ID, sex, category, organism, comments).
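The mapping from SRR accessions to cell line names and then to Cellosaurus IDs can be sketched in Python as follows. This is a minimal illustration, not the author's script: the column names `Run` and `cell_line`, the run and sample accessions, and the `NAME_TO_CELLOSAURUS` lookup are all assumptions made for the example.

```python
import csv
import io

# Hypothetical excerpt of an SRARunTable export; the column names and the
# accessions are placeholders, not values from the real table.
SRA_RUN_TABLE = """Run,BioSample,cell_line
SRR0000001,SAMN00000001,MCF-7
SRR0000002,SAMN00000002,HeLa
SRR0000003,SAMN00000003,UNKNOWN-LINE
"""

# Hypothetical name-to-accession lookup, e.g. built from a Cellosaurus export.
NAME_TO_CELLOSAURUS = {"MCF-7": "CVCL_0031", "HeLa": "CVCL_0030"}


def map_runs_to_cellosaurus(run_table_text, name_to_id):
    """Return {SRR accession: (cell line name, Cellosaurus ID or None)}."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(run_table_text)):
        name = row["cell_line"]
        mapping[row["Run"]] = (name, name_to_id.get(name))
    return mapping


run_map = map_runs_to_cellosaurus(SRA_RUN_TABLE, NAME_TO_CELLOSAURUS)

# Runs whose cell line name has no Cellosaurus match need manual curation.
unmapped = sorted(run for run, (_, cvcl) in run_map.items() if cvcl is None)
```

Keeping unmatched names as `None` instead of dropping them makes it easy to list the runs that still need a synonym lookup.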

NCI-60 (60 cell lines)

As for CCLE, fastq files were downloaded from the NCBI SRA Run selector, here as BioProject PRJNA433861, using the SRA toolkit.

Afterwards, the FASTQ files were processed using the nf-core/RNA-seq pipeline using the same command settings as above:

```{bash}
nextflow run nf-core/rnaseq --input NCI60_samplesheet.csv --outdir NCI60/nf_core/ --multiqc_title NCI60_star_salmon -c NCI60_nextflow.config -profile singularity,slurm --fasta ensembl107_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf ensembl107_GRCh38/Homo_sapiens.GRCh38.107.gtf -r 3.10.1
```

For the output data, SRR accession numbers were mapped back to the cell line names using the SRARunTable metadata information. These cell line names were mapped to Cellosaurus IDs.

The metadata file contains the Cellosaurus ID, the SRR accession numbers, the cell line names, metadata from SRA (BioProject, BioSample, Experiment), and metadata from Cellosaurus (cell line name, synonyms, diseases, cross references, BTO ID, CLO ID, sex, category, organism, comments).

PDAC mouse data (401 samples)

This data was generated by the MRI (university hospital rechts der Isar, Munich). The data generation strategy is described in PMC6097607. The read_1 samples contain all the cDNA while the read_2 samples only contain UMIs. Hence, only read_1 samples were used.

The FASTQ files were processed using the nf-core/RNA-seq pipeline using this command:

```{bash}
nextflow run nf-core/rnaseq --input MRI_PDAC_samplesheet.csv --outdir MRI_PDAC/nf_core/ --multiqc_title MRI_star_salmon -c MRI_PDAC_nextflow.config -profile singularity,slurm --fasta ensembl110_GRCm39/Mus_musculus.GRCm39.dna_sm.primary_assembly.fa.gz --gtf ensembl110_GRCm39/Mus_musculus.GRCm39.110.gtf.gz -r 3.10.1
```

The metadata file contains information about the experiments and the oncogenes, genotypes and morphology (epithelial/mesenchymal/fibroblast contamination).
DepMap 21Q1 Public
figshare.com
txt
Updated May 30, 2023
+ more versions
Cite
Broad DepMap (2023). DepMap 21Q1 Public [Dataset]. http://doi.org/10.6084/m9.figshare.13681534.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13681534.v2
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Broad DepMap
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout (prefixed with Achilles) as well as mutation, copy number and gene expression data (prefixed with CCLE) for cancer cell lines as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap.

The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 808 cell lines, the RNAseq data includes 1376 cell lines, and the copy number data includes 1740 cell lines. Please see the README files for details regarding updates to the data processing pipeline procedures.

As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org.

Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk.cell_lines, EGA accession number: EGAD00001001039) reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.

v2: changed dataset name
NCI-60 Cell Lines (NCI, Cancer Res 2012): Whole-exome sequencing of 67...
datacatalog.mskcc.org
Updated May 20, 2020
Cite
National Cancer Institute (U.S.) (2020). NCI-60 Cell Lines (NCI, Cancer Res 2012): Whole-exome sequencing of 67 samples by NCI-60 cell line project [Dataset]. https://datacatalog.mskcc.org/dataset/10453
Dataset updated
May 20, 2020
Dataset provided by
National Cancer Institute (http://www.cancer.gov/)
MSK Library
Description
This dataset contains summary data visualizations and clinical data from 67 samples from 67 patients as part of an NCI-60 cell line project to compile NCI-60 cell line high-throughput and high-content data into CellMiner, a genomic and pharmacologic database created by the National Cancer Institute. The clinical data includes deidentified patient and sample IDs, mutation counts, detailed cancer type information, patient demographics, and past modality. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
AtT-20 cell line expression data
figshare.mq.edu.au
researchdata.edu.au
xlsx
Updated Nov 10, 2022
Cite
Marina Junqueira Santiago; Mark Connor (2022). AtT-20 cell line expression data [Dataset]. http://doi.org/10.25949/21529404.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.25949/21529404.v1
Dataset updated
Nov 10, 2022
Dataset provided by
Macquarie University
Authors
Marina Junqueira Santiago; Mark Connor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Results of transcript sequencing for AtT-20FlpIn cells. mRNA was isolated from AtT-20FlpIn cells using standard procedures, and next generation sequencing was performed by Macrogen (https://dna.macrogen.com/). A report outlining the workflow and data analysis methods is available from the authors by request.

Deposited data is in an Excel file, which includes the gene symbol, transcript ID from the reference mouse genome, protein ID and transcript abundance. The AtT-20FlpIn cells were generated by Dr Santiago, and have been used as the 'wild type' cells for generating cell lines stably expressing GPCR and ion channels for most of the molecular pharmacology projects in the Molecular Pharmacodynamics group.
Viral genome reference used for detection of viral sequences in CCLE RNA-Seq...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 1, 2023
Cite
Cord C. Uphoff; Claudia Pommerenke; Sabine A. Denkmann; Hans G. Drexler (2023). Viral genome reference used for detection of viral sequences in CCLE RNA-Seq and WES datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0210404.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0210404.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Cord C. Uphoff; Claudia Pommerenke; Sabine A. Denkmann; Hans G. Drexler
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Viral genome reference used for detection of viral sequences in CCLE RNA-Seq and WES datasets.
CCLE Cell Line Gene CNV Profiles
maayanlab.cloud
gz
Updated Nov 11, 2010
Cite
Ma'ayan Laboratory of Computational Systems Biology (2010). CCLE Cell Line Gene CNV Profiles [Dataset]. https://maayanlab.cloud/Harmonizome/dataset/CCLE+Cell+Line+Gene+CNV+Profiles
Available download formats: gz
Dataset updated
Nov 11, 2010
Dataset provided by
Harmonizome
Ma'ayan Laboratory of Computational Systems Biology
Authors
Ma'ayan Laboratory of Computational Systems Biology
Description
Gene-level copy number variation profiles for cancer cell lines
Genomics of Drug Sensitivity in Cancer (GDSC)
kaggle.com
zip
Updated Aug 13, 2024
Cite
Samira Alipour (2024). Genomics of Drug Sensitivity in Cancer (GDSC) [Dataset]. https://www.kaggle.com/datasets/samiraalipour/genomics-of-drug-sensitivity-in-cancer-gdsc/discussion
Available download formats: zip (15094344 bytes)
Dataset updated
Aug 13, 2024
Authors
Samira Alipour
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
The Genomics of Drug Sensitivity in Cancer (GDSC) dataset is a valuable resource for therapeutic biomarker discovery in cancer research. This dataset combines drug response data with genomic profiles of cancer cell lines, allowing researchers to investigate the relationship between genetic features and drug sensitivity.

Task:

The primary task associated with this dataset is to predict drug sensitivity (measured as IC50 values) based on genomic features of cancer cell lines. This can involve regression tasks to predict exact IC50 values or classification tasks to categorize cell lines as sensitive or resistant to specific drugs. The dataset also allows for the identification of genomic markers that correlate with drug response.
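Both framings of the task can be illustrated with a short Python sketch. It assumes the `LN_IC50` and `Z_SCORE` columns of GDSC2-dataset.csv; the example rows, the helper names, and the Z-score cutoff of -1.0 are hypothetical choices made for illustration and are not part of the dataset.

```python
import math


def ic50_from_ln(ln_ic50):
    """Invert the natural log to recover IC50 in the screen's concentration units."""
    return math.exp(ln_ic50)


def label_response(z_score, threshold=-1.0):
    """Binarise a response; the Z_SCORE cutoff is a hypothetical choice."""
    return "sensitive" if z_score <= threshold else "resistant"


# Hypothetical rows that reuse the dataset's column names.
rows = [
    {"CELL_LINE_NAME": "line_A", "LN_IC50": -2.0, "Z_SCORE": -1.8},
    {"CELL_LINE_NAME": "line_B", "LN_IC50": 3.0, "Z_SCORE": 0.7},
]

labelled = [
    {
        **row,
        "IC50": ic50_from_ln(row["LN_IC50"]),       # regression target on the raw scale
        "label": label_response(row["Z_SCORE"]),    # classification target
    }
    for row in rows
]
```

A regression model would typically be fitted to `LN_IC50` directly, since the log scale compresses the wide range of concentrations; the exponentiated value is shown only to make the relationship explicit.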

Files:

GDSC2-dataset.csv: Contains drug sensitivity data, including IC50 values, for various drugs tested against cancer cell lines.(Original source file)

Cell_Lines_Details.xlsx: Provides detailed information about the cancer cell lines, including genomic features such as mutations, copy number alterations, and gene expression. (Original source file)

Compounds-annotation.csv: Offers information about the drugs used in the screening, including their targets and pathways. (Original source file)

GDSC_DATASET.csv: This is the main dataset file for analysis. It's a merged file combining key information from the above three files, created to facilitate easier analysis. This consolidated dataset includes all necessary features for drug sensitivity prediction and is recommended for use in your analysis.
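How the three source files could be combined into one table can be sketched in plain Python. The description does not state how GDSC_DATASET.csv was actually built, so the join keys used here (`COSMIC_ID` against `COSMIC identifier`, and `DRUG_ID`) and the selection of carried-over columns are assumptions, and the rows are toy values.

```python
def merge_gdsc(responses, cell_lines, compounds):
    """Left-join drug responses with cell line and compound annotations."""
    lines_by_cosmic = {row["COSMIC identifier"]: row for row in cell_lines}
    drugs_by_id = {row["DRUG_ID"]: row for row in compounds}
    merged = []
    for resp in responses:
        # Missing annotations fall back to an empty dict, giving None values.
        line = lines_by_cosmic.get(resp["COSMIC_ID"], {})
        drug = drugs_by_id.get(resp["DRUG_ID"], {})
        merged.append({
            **resp,
            "GDSC Tissue descriptor 1": line.get("GDSC Tissue descriptor 1"),
            "TARGET": drug.get("TARGET"),
            "TARGET_PATHWAY": drug.get("TARGET_PATHWAY"),
        })
    return merged


# Toy rows; the values are hypothetical and only the column names come from
# the file descriptions.
responses = [
    {"COSMIC_ID": "1", "DRUG_ID": "10", "LN_IC50": "-1.5"},
    {"COSMIC_ID": "2", "DRUG_ID": "99", "LN_IC50": "2.3"},
]
cell_lines = [{"COSMIC identifier": "1", "GDSC Tissue descriptor 1": "lung"}]
compounds = [{"DRUG_ID": "10", "TARGET": "target_X", "TARGET_PATHWAY": "pathway_Y"}]

merged = merge_gdsc(responses, cell_lines, compounds)
```

A left join keeps every drug response row, so responses without a matching annotation remain visible as rows with empty annotation fields.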

Detailed Column Descriptions:

1. GDSC2-dataset.csv:

DATASET: Identifier for the specific GDSC dataset version.

NLME_RESULT_ID: Unique identifier for the non-linear mixed effects model result.

NLME_CURVE_ID: Identifier for the dose-response curve fitted by NLME.

COSMIC_ID: Unique identifier for the cell line from the COSMIC database.

CELL_LINE_NAME: Name of the cancer cell line used in the experiment.

SANGER_MODEL_ID: Identifier used by the Sanger Institute for the cell line model.

TCGA_DESC: Description of the cancer type according to The Cancer Genome Atlas.

DRUG_ID: Unique identifier for the drug used in the experiment.

DRUG_NAME: Name of the drug used in the experiment.

PUTATIVE_TARGET: The presumed molecular target of the drug.

PATHWAY_NAME: The biological pathway affected by the drug.

COMPANY_ID: Identifier for the company that provided the drug.

WEBRELEASE: Date or version of web release for this data.

MIN_CONC: Minimum concentration of the drug used in the experiment.

MAX_CONC: Maximum concentration of the drug used in the experiment.

LN_IC50: Natural log of the half-maximal inhibitory concentration (IC50).

AUC: Area Under the Curve, a measure of drug effectiveness.

RMSE: Root Mean Square Error, indicating the fit quality of the dose-response curve.

Z_SCORE: Standardized score of the drug response, allowing comparison across different drugs and cell lines.

2. Cell_Lines_Details.xlsx:

Sample Name: Unique identifier for the cell line sample.

COSMIC identifier: Unique ID from the COSMIC database for the cell line.

Whole Exome Sequencing (WES): Genetic mutation data from whole exome sequencing.

Copy Number Alterations (CNA): Data on gene copy number changes in the cell line.

Gene Expression: Information on gene expression levels in the cell line.

Methylation: Data on DNA methylation patterns in the cell line.

Drug Response: Information on how the cell line responds to various drugs.

GDSC Tissue descriptor 1: Primary tissue type classification.

GDSC Tissue descriptor 2: Secondary tissue type classification.

Cancer Type (matching TCGA label): Cancer type according to TCGA classification.

Microsatellite instability Status (MSI): Indicates the cell line's MSI status.

Screen Medium: The growth medium used for culturing the cell line.

Growth Properties: Characteristics of how the cell line grows in culture.

3. Compounds-annotation.csv:

DRUG_ID: Unique identifier for the drug.

SCREENING_SITE: Location where the drug screening was performed.

DRUG_NAME: Name of the drug compound.

SYNONYMS: Alternative names for the drug.

TARGET: The molecular target(s) of the drug.

TARGET_PATHWAY: The biological pathway(s) targeted by the drug.

Target Variable:

The primary target variable in this dataset is LN_IC50 (Natural log of the half-maximal inhibitory concentration). This variable represents the concentration of a drug that inhibits cell viability by 50%, measured on a logarithmic scale. Lower LN_IC50 values indicate higher drug sensitivity, making it a crucial metric for evaluating the effectiveness of anti-ca...
