The Cell Line Data Base (CLDB) is a reference information source for human and animal cell lines. It provides the characteristics of the cell lines and their availability through distributors, allowing cell line requests to be made from collections and laboratories.
mRNA microarray expression profiles for cancer cell lines
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SUM human breast cancer cell lines have been used by many labs around the world to develop extensive data sets derived from comparative genomic hybridization analysis, gene expression profiling, whole exome sequencing, and reverse phase protein array analysis. In a previous study, the authors of this paper performed genome-scale shRNA essentiality screens on the entire SUM line panel, as well as on MCF10A, MCF-7, and MCF-7LTED cells. In this study, the authors developed the SUM Breast Cancer Cell Line Knowledge Base (SLKBase) to make all of these omics data sets available to users of the SUM lines, and to allow users to mine the data and analyse them with respect to the biological pathways enriched in the data for each cell line.

Data access: All the datasets supporting the findings of this study are publicly available on the SLKBase platform: https://sumlineknowledgebase.com/. The RPPA data, drug sensitivity data, alpelisib response data, and dose response data are also part of this figshare data record (https://doi.org/10.6084/m9.figshare.12497630).

Study aims and methodology: This web-based knowledge base provides users with data and information on the derivation of each cell line, narrative summaries of the genomics and cell biology of each breast cancer cell line, and protocols for the proper maintenance of the cells. The database includes a series of data mining tools that allow rapid identification of the functional oncogene signatures for each line, enrichment analysis of any KEGG pathway using the screen hit and gene expression data for each line, and rapid analysis of protein and phospho-protein expression across the cell lines. It also includes a gene search tool that returns all of the functional genomic and functional druggable data for any gene across the entire cell line panel. Additionally, the authors have expanded the database to include functional genomic data for an additional 29 commonly used breast cancer cell lines.
The three overarching goals in the original development of the SLKBase were: 1) to provide a rich source of information for anyone working with any of the SUM breast cancer cell lines, 2) to give researchers ready access to the large genomic data sets that have been developed with these cells, and 3) to allow researchers to perform orthogonal analyses of the various genomics data sets that we and others have obtained from the SUM lines. For more information on the development and contents of the database, please read the related article.

Datasets supporting the paper: The data mining tools accessed the following datasets to generate the figures and tables. These datasets can be downloaded from the Data Download centre on the SLKBase:
- Exome sequencing data: SLKBase.exome_.seq_.sum_.xlsx
- Gene amplification and expression data for the SUM cell lines: SUM44amplificationdata.xls, SUM52.xls, SUM149.xls, SUM159.xls, SUM185.xls, SUM190.xls, SUM225.xls, SUM229.xls, SUM1315.xls
- Cellecta shRNA screen data for the SUM cell lines: SUM44Celectadata.csv, SUM52Cellectadata.csv, SUM102Cellectadata.csv, SUM149Cellectadata.csv, SUM159Cellectadata.csv, SUM185Cellectadata.csv, SUM190Cellectadata.csv, SUM225Cellectadata.csv, SUM229Cellectadata.csv, SUM1315hits.hit.csv, MCF10A.hits_.csv

Breast cancer cell line data included in this data record (these datasets were used to generate figures 1, 2 and 7 in the article):
- Proteomics data from the reverse phase protein array (RPPA) analysis: Ethier.SUMline.RPPA.xlsx
- Drug sensitivity data: NAVITOCLAX.drugsensitivity.Zscores.xlsx
- Alpelisib response data: Apelisib all lines (2).xlsx
- Dose response data: 092614 Dose Response CP 52s.11.15.xlsx

All files are in Excel (.xls or .xlsx) or .csv format.
A collaborative project between the Broad Institute and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation, with the goal of conducting a detailed genetic and pharmacologic characterization of a large panel of human cancer models. The CCLE also works to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines.
Summary from the GEO: "RNA-sequencing of a panel of urothelial cancer cells. The goal of the study is to examine the genome-wide expression profile in each of the 30 urothelial cancer cells tested in our laboratory."
"Overall design: Each of the 30 cell lines was DNA fingerprinted to confirm its real identity. Total RNA was obtained from each cell line and subjected to Illumina RNA sequencing."
The data was from a study on comprehensive molecular characterization of muscle-invasive bladder cancer.
A panel of 60 human cancer cell lines used for screening anticancer drugs.
Database of all cell lines used in biomedical research which include immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell lines databases/resources, and to databases that list cell lines as samples.
Gene-level mutation profiles for cancer cell lines
This data was obtained from the Broad Institute Cancer Cell Line Encyclopedia https://portals.broadinstitute.org/ccle/data.
Bulk gene expression data from over 1,000 cancer cell lines was processed to include several cell metadata fields (processing scripts will be included shortly).
All data was produced at the Broad Institute. Please see: https://portals.broadinstitute.org/ccle/data
We have developed an online database describing the known cell lines from Coleoptera, Diptera, Hemiptera, Hymenoptera, and Lepidoptera that originated from crop pest insects. Cell line information has been primarily obtained from previous compilations of insect cell lines. (from homepage)
The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute, the Novartis Institutes for Biomedical Research and the Genomics Institute of the Novartis Research Foundation to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models. It consists of a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from nearly 1,000 human cancer cell lines. All raw and processed data are available through an integrated portal at www.broadinstitute.org/ccle. The final cell line collection spans 36 cancer types. Representation of cell lines for each cancer type was driven mainly by cancer mortality in the United States, as a surrogate for unmet medical need, and by availability.
A virtual database currently indexing available cell lines from: Coriell Cell Repositories, International Mouse Strain Resource (IMSR), ATCC, NIH Human Pluripotent Stem Cell Registry, NIGMS Human Genetic Cell Repository, and Developmental Therapeutics Program.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HTLV-1-, HHV-8-, and SMRV-specific read counts for cell lines, ordered by CCLE file name.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Counts, lengths, TPM, and FPKM values per gene and per transcript. All CCLE and NCI-60 cell lines are identified by their Cellosaurus IDs.
All data were re-processed with the nf-core/rnaseq pipeline (version 3.10.1) using the STAR/Salmon setting. The GRCh38 reference was used for the human FASTQ files (CCLE, NCI-60), and GRCm39 was used for the mouse files.
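For reference, the standard definitions of the two normalized units are given below. Here $c_i$ is the number of reads (or fragments) assigned to gene or transcript $i$, $\ell_i$ is its length in bases, and $N$ is the total number of mapped reads or fragments in the sample. This record does not spell out exactly how the reprocessing computed them. For example, Salmon uses effective rather than raw transcript lengths for TPM.

```latex
\[
\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \times 10^{6},
\qquad
\mathrm{FPKM}_i = \frac{c_i}{(\ell_i / 10^{3})\,(N / 10^{6})} = \frac{c_i \times 10^{9}}{\ell_i \, N}
\]
```

By construction, TPM values sum to $10^{6}$ within each sample, which FPKM values do not.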
CCLE (1019 cell lines)
Raw FASTQ files were downloaded from the NCBI SRA Run Selector (BioProject PRJNA523380) using the SRA Toolkit. The runs were first prefetched, and the FASTQ files were then generated with:
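```bash
#!/bin/bash
# Runs were prefetched beforehand with `prefetch`; this loop only converts them.
# SRR_Acc_List_CCLE.txt holds one SRR accession per line.
while read -r run
do
  echo "$run"
  fasterq-dump "$run"   # e.g. <run>_1.fastq and <run>_2.fastq for paired-end runs
  echo gzipping
  gzip "$run"*.fastq
done < SRR_Acc_List_CCLE.txt
```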
Afterwards, the prefetched run directories were deleted.
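The download is described as three steps: prefetch, then fasterq-dump plus gzip, then deletion of the prefetch directories. The loop above covers only the middle step. The sketch below combines all three per run and is not necessarily how the authors ran it. It assumes that the SRA Toolkit's `prefetch` and `fasterq-dump` are on `PATH`, and that `prefetch` stores each run in a directory named after its accession in the current working directory, which is the default in recent toolkit versions unless another repository is configured. The function names `fetch_run` and `process_list` are hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: fetch one SRA run, convert it to gzipped FASTQ and
# remove the directory that `prefetch` created for it.
fetch_run() {
  local run="$1"
  prefetch "$run"           # assumed to store the run under ./<run>/
  fasterq-dump "$run"       # expected to pick up the local copy in ./<run>/
  gzip "$run"*.fastq        # compress every FASTQ file written for this run
  rm -rf -- "${run:?}/"     # delete the prefetch directory, keep the FASTQ files
}

# Hypothetical driver: process every accession in a run list
# (one SRR ID per line; blank lines are skipped, a missing final newline is tolerated).
process_list() {
  local list="$1" run
  while IFS= read -r run || [[ -n "$run" ]]; do
    [[ -z "$run" ]] && continue
    echo "$run"
    fetch_run "$run"
  done < "$list"
}
```

For example, `process_list SRR_Acc_List_CCLE.txt` would apply the three described steps to every run in the CCLE list.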
The FASTQ files were then processed with the nf-core/rnaseq pipeline using this command:
```bash
nextflow run nf-core/rnaseq --input CCLE_samplesheet.csv --outdir CCLE/nf_core/ --multiqc_title CCLE_star_salmon -c CCLE_nextflow.config -profile singularity,slurm --fasta ensembl107_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf ensembl107_GRCh38/Homo_sapiens.GRCh38.107.gtf -r 3.10.1
```
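The contents of CCLE_samplesheet.csv are not shown here. The sketch below shows one way such a file could be generated from the gzipped fasterq-dump output. It makes three assumptions:
- the nf-core/rnaseq samplesheet layout is `sample,fastq_1,fastq_2,strandedness`;
- paired-end runs are named `<SRR>_1.fastq.gz`/`<SRR>_2.fastq.gz`;
- single-end runs are named `<SRR>.fastq.gz`.

The function name and the `auto` strandedness default are hypothetical choices. `auto` needs a pipeline release that can infer strandedness (to our understanding, 3.10 onwards). Use `forward`, `reverse` or `unstranded` if the library type is known.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print an nf-core/rnaseq samplesheet for the gzipped
# FASTQ files in directory $1. Strandedness defaults to "auto" (assumption).
make_samplesheet() {
  local dir strandedness f run r2
  dir=$(cd "$1" && pwd)            # absolute paths are safest for the pipeline
  strandedness="${2:-auto}"
  echo "sample,fastq_1,fastq_2,strandedness"
  shopt -s nullglob
  for f in "$dir"/*.fastq.gz; do
    case "$f" in
      *_2.fastq.gz) continue ;;      # mate 2 is written together with its _1 file
      *_1.fastq.gz)
        run=$(basename "$f" _1.fastq.gz)
        r2="$dir/${run}_2.fastq.gz"
        [[ -f "$r2" ]] || r2=""      # no mate found: treat as single-end
        ;;
      *)
        run=$(basename "$f" .fastq.gz)
        # Skip unpaired leftovers that fasterq-dump writes next to paired files.
        [[ -f "$dir/${run}_1.fastq.gz" ]] && continue
        r2=""
        ;;
    esac
    echo "$run,$f,$r2,$strandedness"
  done
  shopt -u nullglob
}
```

Usage would be along the lines of `make_samplesheet . > CCLE_samplesheet.csv`, run in the directory holding the FASTQ files.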
For the output data, SRR accession numbers were mapped back to cell line names using the SRARunTable metadata, and these cell line names were then mapped to Cellosaurus IDs.
The metadata file contains the Cellosaurus ID, the SRR accession numbers, the cell line names, metadata from SRA (BioProject, BioSample, Experiment), and metadata from Cellosaurus (cell line name, synonyms, diseases, cross references, BTO ID, CLO ID, sex, category, organism, comments).
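The SRR-to-cell-line mapping can be sketched as follows. The sketch assumes the SRARunTable metadata is the comma-separated table exported by the SRA Run Selector, with the accession in a column named `Run`. The cell line column name is an assumption and is passed as an argument, for example a hypothetical `Cell_Line`. The function name is also hypothetical. The follow-up lookup of Cellosaurus IDs is not covered.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print "<SRR accession><TAB><cell line name>" for each row
# of an SRA Run Selector metadata table (CSV with a header row). The accession
# column is assumed to be called "Run"; the cell line column name differs
# between BioProjects, so it is passed as the second argument.
map_runs_to_cell_lines() {
  local table="$1" column="$2"
  awk -v want="$column" '
    # Split one CSV line into out[1..n], honouring double-quoted fields
    # and doubled quotes ("") inside them.
    function split_csv(line, out,    n, i, c, field, inq) {
      split("", out)
      n = 0; field = ""; inq = 0
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (inq) {
          if (c == "\"") {
            if (substr(line, i + 1, 1) == "\"") { field = field "\""; i++ }
            else inq = 0
          } else field = field c
        } else if (c == "\"") inq = 1
        else if (c == ",") { out[++n] = field; field = "" }
        else field = field c
      }
      out[++n] = field
      return n
    }
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NR == 1 {
      n = split_csv($0, hdr)
      for (i = 1; i <= n; i++) {
        if (hdr[i] == "Run") run_col = i
        if (hdr[i] == want) cell_col = i
      }
      if (!run_col || !cell_col) {
        print "missing Run or " want " column" > "/dev/stderr"
        exit 1
      }
      next
    }
    { split_csv($0, f); print f[run_col] "\t" f[cell_col] }
  ' "$table"
}
```

A small quote-aware CSV splitter is used instead of `awk -F,` because metadata fields such as titles can contain commas.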
NCI-60 (60 cell lines)
As for CCLE, FASTQ files were downloaded from the NCBI SRA Run Selector (BioProject PRJNA433861) using the SRA Toolkit.
The FASTQ files were then processed with the nf-core/rnaseq pipeline using the same settings as above:
```bash
nextflow run nf-core/rnaseq --input NCI60_samplesheet.csv --outdir NCI60/nf_core/ --multiqc_title NCI60_star_salmon -c NCI60_nextflow.config -profile singularity,slurm --fasta ensembl107_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf ensembl107_GRCh38/Homo_sapiens.GRCh38.107.gtf -r 3.10.1
```
For the output data, SRR accession numbers were mapped back to cell line names using the SRARunTable metadata, and these cell line names were then mapped to Cellosaurus IDs.
The metadata file contains the Cellosaurus ID, the SRR accession numbers, the cell line names, metadata from SRA (BioProject, BioSample, Experiment), and metadata from Cellosaurus (cell line name, synonyms, diseases, cross references, BTO ID, CLO ID, sex, category, organism, comments).
PDAC mouse data (401 samples)
This data was generated at the MRI (University Hospital rechts der Isar, Munich). The data generation strategy is described in PMC6097607. The read_1 files contain all of the cDNA sequence, whereas the read_2 files contain only UMIs. Hence, only the read_1 files were used.
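Because only read_1 was used, each sample enters the pipeline as single-end data. The sketch below generates a matching samplesheet; it is not the authors' MRI_PDAC_samplesheet.csv. It assumes the nf-core/rnaseq samplesheet columns `sample,fastq_1,fastq_2,strandedness`, with `fastq_2` left empty for single-end samples. The read_1 file suffix `_R1.fastq.gz`, the function name and the `auto` strandedness default are hypothetical and should be adapted to the real file names and library type.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: build a single-end nf-core/rnaseq samplesheet that lists
# only the read_1 (cDNA) files and leaves fastq_2 empty, so the UMI-only read_2
# files are never passed to the pipeline.
make_read1_samplesheet() {
  local dir suffix strandedness f
  dir=$(cd "$1" && pwd)
  suffix="${2:-_R1.fastq.gz}"       # hypothetical naming of read_1 files
  strandedness="${3:-auto}"         # assumption: let the pipeline infer it
  echo "sample,fastq_1,fastq_2,strandedness"
  shopt -s nullglob
  for f in "$dir"/*"$suffix"; do
    echo "$(basename "$f" "$suffix"),$f,,$strandedness"
  done
  shopt -u nullglob
}
```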
The FASTQ files were processed with the nf-core/rnaseq pipeline using this command:
```bash
nextflow run nf-core/rnaseq --input MRI_PDAC_samplesheet.csv --outdir MRI_PDAC/nf_core/ --multiqc_title MRI_star_salmon -c MRI_PDAC_nextflow.config -profile singularity,slurm --fasta ensembl110_GRCm39/Mus_musculus.GRCm39.dna_sm.primary_assembly.fa.gz --gtf ensembl110_GRCm39/Mus_musculus.GRCm39.110.gtf.gz -r 3.10.1
```
The metadata file contains information about the experiments and the oncogenes, genotypes and morphology (epithelial/mesenchymal/fibroblast contamination).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout screens (files prefixed with Achilles), as well as mutation, copy number and gene expression data (files prefixed with CCLE) for cancer cell lines, as part of the Broad Institute's Cancer Dependency Map (DepMap) project. We have repackaged our fileset to include all quarterly-updated datasets produced by DepMap. The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 808 cell lines, the RNA-seq data includes 1376 cell lines, and the copy number data includes 1740 cell lines. Please see the README files for details on updates to the data processing pipelines.

As our screening efforts continue, we will release additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets, further analyses, and to subscribe to our mailing list, visit https://depmap.org.

Descriptions of the experimental methods and the CERES algorithm are published at http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk/cell_lines, EGA accession number: EGAD00001001039), reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions used for CCLE expression data can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.

v2: changed dataset name
This dataset contains summary data visualizations and clinical data from 67 samples from 67 patients, as part of an NCI-60 cell line project to compile NCI-60 cell line high-throughput and high-content data into CellMiner, a genomic and pharmacologic database created by the National Cancer Institute. The clinical data includes deidentified patient and sample IDs, mutation counts, detailed cancer type information, patient demographics, and past modality. The data set also includes copy-number segment data, downloadable as .seg files and viewable in the Integrative Genomics Viewer.
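Because .seg files are plain tab-delimited text, they can also be inspected without IGV. The sketch below lists segments whose value suggests a gain or a loss. It assumes the IGV SEG layout: a header row, then sample ID, chromosome, start and end in the first four columns, and the segment value (typically a log2 ratio such as seg.mean) in the last column. The function name is hypothetical. The default cutoff of 0.3 is an arbitrary example, not a calling method used by this dataset.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print segments whose absolute value exceeds a cutoff,
# as "<sample><TAB><chrom>:<start>-<end><TAB><value><TAB>gain|loss".
filter_segments() {
  local seg="$1" cutoff="${2:-0.3}"   # 0.3 is an arbitrary example cutoff
  awk -F'\t' -v thr="$cutoff" '
    BEGIN { thr += 0 }                    # force a numeric cutoff
    NR == 1 { next }                      # skip the header row
    {
      v = $NF + 0                         # segment value is the last column
      if (v > thr || v < -thr)
        print $1 "\t" $2 ":" $3 "-" $4 "\t" $NF "\t" (v > 0 ? "gain" : "loss")
    }
  ' "$seg"
}
```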
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of transcript sequencing for AtT-20FlpIn cells. mRNA was isolated from AtT-20FlpIn cells using standard procedures, and next-generation sequencing was performed by Macrogen (https://dna.macrogen.com/). A report outlining the workflow and data analysis methods is available from the authors on request.
The deposited data are in an Excel file that includes the gene symbol, the transcript ID from the reference mouse genome, the protein ID and the transcript abundance. The AtT-20FlpIn cells were generated by Dr Santiago and have been used as the 'wild type' cells for generating cell lines stably expressing GPCRs and ion channels for most of the molecular pharmacology projects in the Molecular Pharmacodynamics group.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Viral genome reference used for detection of viral sequences in CCLE RNA-Seq and WES datasets.
Gene-level copy number variation profiles for cancer cell lines
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
The Genomics of Drug Sensitivity in Cancer (GDSC) dataset is a valuable resource for therapeutic biomarker discovery in cancer research. This dataset combines drug response data with genomic profiles of cancer cell lines, allowing researchers to investigate the relationship between genetic features and drug sensitivity.
The primary task associated with this dataset is to predict drug sensitivity (measured as IC50 values) based on genomic features of cancer cell lines. This can involve regression tasks to predict exact IC50 values or classification tasks to categorize cell lines as sensitive or resistant to specific drugs. The dataset also allows for the identification of genomic markers that correlate with drug response.
The primary target variable in this dataset is LN_IC50, the natural log of the half-maximal inhibitory concentration (IC50). IC50 is the concentration of a drug that inhibits cell viability by 50%, and LN_IC50 expresses it on a logarithmic scale. Lower LN_IC50 values indicate higher drug sensitivity, making it a crucial metric for evaluating the effectiveness of anti-ca...
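The relationship between LN_IC50 and IC50, and the sensitive/resistant labelling described above, can be sketched as follows. The sketch assumes a comma-separated file with a header row that contains an `LN_IC50` column and no quoted commas. The function name is hypothetical. The default cutoff of 0 corresponds to IC50 = 1 in the dataset's concentration unit (micromolar in GDSC, to our knowledge). It is an arbitrary example, not a GDSC-defined threshold.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: append IC50 = exp(LN_IC50) and a sensitive/resistant
# label (LN_IC50 below the cutoff = sensitive) to every row of a CSV file.
label_sensitivity() {
  local csv="$1" cutoff="${2:-0}"     # arbitrary example cutoff on LN_IC50
  awk -F',' -v thr="$cutoff" '
    BEGIN { thr += 0 }
    { sub(/\r$/, "") }                  # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "LN_IC50") col = i
      if (!col) { print "no LN_IC50 column" > "/dev/stderr"; exit 1 }
      print $0 ",IC50,LABEL"
      next
    }
    {
      ic50 = exp($col)                  # undo the natural log
      printf "%s,%.4g,%s\n", $0, ic50, ($col + 0 < thr ? "sensitive" : "resistant")
    }
  ' "$csv"
}
```

Applying `exp()` recovers IC50 because LN_IC50 = ln(IC50). Thresholding on LN_IC50 rather than IC50 gives the same split, because the logarithm is monotonic.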