Database of all cell lines used in biomedical research, including immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell line databases/resources, and to databases that list cell lines as samples.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SUM human breast cancer cell lines have been used by many labs around the world to develop extensive data sets derived from comparative genomic hybridization analysis, gene expression profiling, whole exome sequencing, and reverse phase protein array analysis. In a previous study, the authors of this paper performed genome-scale shRNA essentiality screens on the entire SUM line panel, as well as on MCF10A cells, MCF-7 cells, and MCF-7LTED cells. In this study, the authors have developed the SUM Breast Cancer Cell Line Knowledge Base (SLKBase) to make all of these omics data sets available to users of the SUM lines, and to allow users to mine the data and analyse them with respect to biological pathways enriched by the data in each cell line.
Data access: All the datasets supporting the findings of this study are publicly available in the SLKBase platform here: https://sumlineknowledgebase.com/. RPPA data, drug sensitivity data, apelisib response data, and data on dose response are also part of this figshare data record (https://doi.org/10.6084/m9.figshare.12497630).
Study aims and methodology: This web-based knowledge base provides users with data and information on the derivation of each of the cell lines, provides narrative summaries of the genomics and cell biology of each breast cancer cell line, and provides protocols for the proper maintenance of the cells. The database includes a series of data mining tools that allow rapid identification of the functional oncogene signatures for each line, the enrichment of any KEGG pathway with screen hit and gene expression data for each of the lines, and a rapid analysis of protein and phospho-protein expression for the cell lines. A gene search tool is included that returns all of the functional genome and functional druggable data for any gene for the entire cell line panel. Additionally, the authors have expanded the database to include functional genomic data for an additional 29 commonly used breast cancer cell lines.
The three overarching goals in the original development of the SLKBase are: 1) to provide a rich source of information for anyone working with any of the SUM breast cancer cell lines, 2) to give researchers ready access to the large genomic data sets that have been developed with these cells, and 3) to allow researchers to perform orthogonal analyses of the various genomics data sets that we and others have obtained from the SUM lines. For more information on the development and contents of the database, please read the related article.
Datasets supporting the paper: The data mining tools accessed the following datasets to generate the figures and tables, and these datasets are downloadable from the Data Download centre on the SLKBase:
Exome sequencing data: SLKBase.exome_.seq_.sum_.xlsx
Gene amplification and expression data for the SUM cell lines: SUM44amplificationdata.xls, SUM52.xls, SUM149.xls, SUM159.xls, SUM185.xls, SUM190.xls, SUM225.xls, SUM229.xls, SUM1315.xls
Cellecta shRNA screen data for the SUM cell lines: SUM44Celectadata.csv, SUM52Cellectadata.csv, SUM102Cellectadata.csv, SUM149Cellectadata.csv, SUM159Cellectadata.csv, SUM185Cellectadata.csv, SUM190Cellectadata.csv, SUM225Cellectadata.csv, SUM229Cellectadata.csv, SUM1315hits.hit.csv, MCF10A.hits_.csv
Breast cancer cell line data included in this data record (these datasets were used to generate figures 1, 2 and 7 in the article):
Proteomics data from the Reverse Phase Protein Array (RPPA) assay analysis: Ethier.SUMline.RPPA.xlsx
Drug sensitivity data: NAVITOCLAX.drugsensitivity.Zscores.xlsx
Apelisib response data: Apelisib all lines (2).xlsx
Dose response data: 092614 Dose Response CP 52s.11.15.xlsx
All the files are in .xls, .xlsx, or .csv file format.
The Cell Line Data Base (CLDB) is a reference information source for human and animal cell lines. It provides the characteristics of the cell lines and their availability through distributors, allowing cell line requests to be made from collections and laboratories.
mRNA microarray expression profiles for cancer cell lines
A panel of 60 human cancer cell lines used for screening anticancer drugs.
Summary from GEO: "RNA-sequencing of a panel of urothelial cancer cells. The goal of the study is to examine the genome-wide expression profile in each of the 30 urothelial cancer cells tested in our laboratory."
"Overall design: Each of the 30 cell lines was DNA fingerprinted to confirm its real identity. Total RNA was obtained from each cell line and subjected to Illumina RNA sequencing."
The data come from a study on the comprehensive molecular characterization of muscle-invasive bladder cancer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, biological research involving human cell lines has been developing rapidly in China. However, some of the cell lines are not authenticated before use, so misidentified and/or cross-contaminated cell lines are unfortunately commonplace. In this study, we present a comprehensive investigation of cross-contamination and misidentification for a panel of 278 cell lines from 28 institutes in China using the short tandem repeat (STR) profiling method. By comparing the DNA profiles with the cell bank databases of ATCC and DSMZ, we uncovered cross-contamination or misidentification in 46.0% (128/278) of the cell lines, which came from 22 institutes. Notably, 73.2% (52 out of 71) of the cell lines established by Chinese researchers were misidentified, accounting for 40.6% of all misidentifications (52/128). Further, 67.3% (35/52) of the misidentified cell lines established in laboratories in China were HeLa cells or a possible hybrid of HeLa with another cell line. Furthermore, the bile duct cancer cell line HCCC-9810 and the degenerative lung cancer cell line Calu-6 exhibited an 88.9% match in the ATCC database (9 loci), suggesting that they were of the same origin. However, when we compared these two cell lines across 21 loci with the same algorithm, the percent match was only 48.2%, indicating that the two cell lines were different. The SNP profiles of HCCC-9810 and Calu-6 also revealed that they were different cell lines. The 150 cell lines with unique profiles demonstrated a wide range of in vitro phenotypes. This panel of 150 genomically validated cancer cell lines represents a valuable resource for the cancer research community and will advance our understanding of the disease by providing a standard reference for cell lines that can be used for biological as well as preclinical studies.
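The abstract does not name the matching algorithm behind the 88.9% and 48.2% figures. As a rough illustration only, the sketch below assumes a Tanabe-style percent match (twice the number of shared alleles divided by the total number of alleles in both profiles, over loci typed in both). The profiles are hypothetical toy data, not the real HCCC-9810 or Calu-6 profiles; they simply show how a pair that looks identical on a few loci can score lower once more loci are compared.

```python
def percent_match(query, reference):
    """Tanabe-style percent match between two STR profiles (assumed formula).

    Each profile maps a locus name to a set of allele calls. Only loci
    present in both profiles are compared.
    """
    shared_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in shared_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in shared_loci)
    if total == 0:
        raise ValueError("profiles have no loci in common")
    return 100.0 * 2 * shared / total


# Hypothetical toy profiles (not real cell line data).
line_a = {"TH01": {"6", "9.3"}, "TPOX": {"8"}, "D5S818": {"11", "12"}}
line_b = {"TH01": {"6", "9.3"}, "TPOX": {"8"}, "D5S818": {"11", "13"}}

# Compare on a small "core" subset of loci, then on all available loci.
core = {"TH01", "TPOX"}
a_core = {locus: alleles for locus, alleles in line_a.items() if locus in core}
b_core = {locus: alleles for locus, alleles in line_b.items() if locus in core}

score_core = percent_match(a_core, b_core)  # fewer loci compared
score_full = percent_match(line_a, line_b)  # more loci compared
```

In this toy case the two profiles agree on every core locus but differ at the additional locus, so the full comparison yields the lower score, mirroring the direction of the 9-loci versus 21-loci result reported above.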
Comprehensive database of short tandem repeat (STR) DNA profiles for all ATCC human cell lines. The data are collected by ATCC as part of continuing efforts to characterize and authenticate the cell lines in its Cell Biology collection.
An independent committee established to improve visibility of cell lines and promote awareness and authentication testing to combat false or misidentified cell lines. It maintains a database of cross-contaminated or otherwise misidentified cell lines, as well as resources to familiarize cell line users with the problem of misidentification. Its Terms of Reference define false or misidentified cell lines and other commonly used terms, and set out the committee's goals and ground rules.
hPSCreg is a global registry of human pluripotent stem cell (hPSC) lines containing manually validated information, including ethical provenance, procurement, derivation process, genetic and expression data, other biological and molecular characteristics, use, and quality of the line. Current status: 1123 hESC lines, 7670 hiPSC lines, 205 clinical studies, and 2402 certificates.
https://www.gnu.org/licenses/gpl-3.0.html
The Genomics of Drug Sensitivity in Cancer (GDSC) dataset is a valuable resource for therapeutic biomarker discovery in cancer research. This dataset combines drug response data with genomic profiles of cancer cell lines, allowing researchers to investigate the relationship between genetic features and drug sensitivity.
The primary task associated with this dataset is to predict drug sensitivity (measured as IC50 values) based on genomic features of cancer cell lines. This can involve regression tasks to predict exact IC50 values or classification tasks to categorize cell lines as sensitive or resistant to specific drugs. The dataset also allows for the identification of genomic markers that correlate with drug response.
The primary target variable in this dataset is LN_IC50 (Natural log of the half-maximal inhibitory concentration). This variable represents the concentration of a drug that inhibits cell viability by 50%, measured on a logarithmic scale. Lower LN_IC50 values indicate higher drug sensitivity, making it a crucial metric for evaluating the effectiveness of anti-ca...
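Because LN_IC50 is a natural logarithm, the IC50 itself can be recovered with an exponential, and a sensitive/resistant label can be derived by thresholding. The sketch below is illustrative only: the cell line names, the values, and the choice of the median as a cutoff are all hypothetical assumptions, not part of the dataset's documentation.

```python
import math
from statistics import median


def ic50_from_ln(ln_ic50):
    """Invert the natural-log transform to recover the IC50 concentration."""
    return math.exp(ln_ic50)


def label_sensitivity(ln_ic50, threshold_ln):
    """Label a cell line 'sensitive' if its LN_IC50 is below the threshold."""
    return "sensitive" if ln_ic50 < threshold_ln else "resistant"


# Hypothetical LN_IC50 values for one drug across three cell lines.
ln_values = {"line_A": -1.2, "line_B": 0.4, "line_C": 2.7}

# Hypothetical cutoff: the median LN_IC50 across these lines.
threshold = median(ln_values.values())

labels = {name: label_sensitivity(v, threshold) for name, v in ln_values.items()}
ic50 = {name: ic50_from_ln(v) for name, v in ln_values.items()}
```

Lower LN_IC50 maps to a lower IC50, so the ordering of cell lines by sensitivity is the same on either scale; the log scale is mainly convenient for regression targets.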
This dataset contains summary data visualizations and clinical data from 67 samples from 67 patients as part of an NCI-60 cell line project to compile NCI-60 cell line high-throughput and high-content data into CellMiner, a genomic and pharmacologic database created by the National Cancer Institute. The clinical data includes deidentified patient and sample IDs, mutation counts, detailed cancer type information, patient demographics, and past modality. The data set also includes copy-number segment data downloadable as .seg files and viewable via the Integrative Genomics Viewer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reverse Transcription - quantitative Polymerase Chain Reaction (RT-qPCR) is a standard technique in most laboratories. The selection of suitable reference genes is essential for data normalization and remains critical. Our aim was to 1) review the literature since implementation of the MIQE guidelines in order to identify the degree of acceptance; 2) compare various algorithms for assessing expression stability; 3) identify a set of suitable and most reliable reference genes for a variety of human cancer cell lines. A PubMed database review was performed and publications since 2009 were selected. Twelve putative reference genes were profiled in normal and various cancer cell lines (n = 25) using 2-step RT-qPCR. Investigated reference genes were ranked according to their expression stability by five algorithms (geNorm, Normfinder, BestKeeper, comparative ΔCt, and RefFinder). Our review revealed 37 publications, with two thirds using patient samples and one third using cell lines. qPCR efficiency was given in 68.4% of all publications, but only 28.9% of all studies provided RNA/cDNA amount and standard curves. The geNorm and Normfinder algorithms were used in combination in 60.5% of studies. In our selection of 25 cancer cell lines, we identified HSPCB, RRN18S, and RPS13 as the most stably expressed reference genes. In the subset of ovarian cancer cell lines, the reference genes were PPIA, RPS13 and SDHA, clearly demonstrating the necessity to select genes depending on the research focus. Moreover, a cohort of at least three suitable reference genes needs to be established in advance of the experiments, according to the guidelines. For establishing a set of reference genes for gene normalization, we recommend the use of ideally three reference genes selected by at least three stability algorithms. The unfortunate lack of compliance with the MIQE guidelines shows that these need to be further established in the research community.
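To make the idea of ranking reference genes by expression stability concrete, here is a simplified sketch of the comparative ΔCt approach, one of the five algorithms named above. It assumes the usual formulation: for each gene, take the ΔCt against every other candidate in each sample, compute the standard deviation of each pairwise ΔCt across samples, and average those standard deviations (lower means more stable). The gene names and Ct values are hypothetical toy data, and this is not the study's actual pipeline.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Rank genes by the mean SD of their pairwise ΔCt across samples.

    ct maps a gene name to a list of Ct values, one per sample, with the
    samples in the same order for every gene. Returns (gene, mean_sd)
    pairs sorted from most stable (lowest mean SD) to least stable.
    """
    genes = list(ct)
    scores = {}
    for gene in genes:
        pairwise_sds = []
        for other in genes:
            if other == gene:
                continue
            deltas = [a - b for a, b in zip(ct[gene], ct[other])]
            pairwise_sds.append(stdev(deltas))
        scores[gene] = mean(pairwise_sds)
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical Ct values for three candidate genes in four samples.
toy_ct = {
    "geneA": [20.0, 21.0, 20.0, 21.0],
    "geneB": [22.0, 23.0, 22.0, 23.0],
    "geneC": [18.0, 18.0, 24.0, 24.0],
}

ranking = delta_ct_stability(toy_ct)
```

In the toy data, geneA and geneB move together across samples while geneC does not, so geneC ends up last in the ranking.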
A virtual database currently indexing available cell lines from: Coriell Cell Repositories, International Mouse Strain Resource (IMSR), ATCC, NIH Human Pluripotent Stem Cell Registry, NIGMS Human Genetic Cell Repository, and Developmental Therapeutics Program.
A collaborative project between the Broad Institute and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation, with the goal of conducting a detailed genetic and pharmacologic characterization of a large panel of human cancer models. The CCLE also works to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines.
https://www.gnu.org/licenses/lgpl-3.0-standalone.html
If you use this data, please cite: Luna A, Elloumi F, Varma S et al. NAR. 2021. PMID: 33196823
Cell line pharmacogenomics datasets for cancer biology and machine learning studies. The datasets are compatible with rcellminer and CellMinerCDB (see publications for details) and data can be extracted for use with Python-based projects.
An example for extracting data from the rcellminer and CellMinerCDB compatible packages:
# INSTALL ----
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("rcellminer")
# Replace path_to_file with the data package filename
install.packages(path_to_file, repos = NULL, type = "source")
# GET DATA ----
library(Biobase)     # provides exprs()
library(rcellminer)  # provides getAct(), getFeatureAnnot(), getAllFeatureData(), getSampleData()
## Replace nciSarcomaData with the name of your dataset throughout the code
library(nciSarcomaData)
## DRUG DATA ----
### Drug activity matrix and drug annotations
drugAct <- exprs(getAct(nciSarcomaData::drugData))
drugAnnot <- getFeatureAnnot(nciSarcomaData::drugData)[["drug"]]
## MOLECULAR DATA ----
### List available datasets
names(getAllFeatureData(nciSarcomaData::molData))
### Extract data and annotations (gene expression and microRNA)
expData <- exprs(nciSarcomaData::molData[["exp"]])
mirData <- exprs(nciSarcomaData::molData[["mir"]])
expAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["exp"]]
mirAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["mir"]]
## SAMPLE DATA ----
sampleAnnot <- getSampleData(nciSarcomaData::molData)
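For the Python-based use mentioned above, one simple route (an assumption on our part, not a workflow documented by the packages) is to export a matrix from R, for example with write.csv(drugAct, "drugAct.csv"), and parse it in Python. The sketch below reads that layout using only the standard library; the file name, the drug and sample identifiers, and the values are hypothetical.

```python
import csv
import io
import math


def read_matrix_csv(handle):
    """Read an R write.csv() matrix export into (row_ids, col_ids, values).

    The first column holds the row names, the header holds the column
    names, and "NA" entries are converted to NaN.
    """
    reader = csv.reader(handle)
    header = next(reader)
    col_ids = header[1:]  # first header cell is the empty row-name column
    row_ids, values = [], []
    for row in reader:
        row_ids.append(row[0])
        values.append([math.nan if x == "NA" else float(x) for x in row[1:]])
    return row_ids, col_ids, values


# Hypothetical export contents; in practice use open("drugAct.csv", newline="").
example = io.StringIO(
    '"","sample1","sample2"\n'
    '"drug1",1.5,NA\n'
    '"drug2",-0.25,0.75\n'
)
rows, cols, vals = read_matrix_csv(example)
```

The returned lists can then be handed to whichever array or data frame library the Python project already uses.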
https://www.proteinatlas.org/about/licence
The Cell Atlas provides high-resolution insights into the expression and spatio-temporal distribution of proteins within human cells. Using a panel of 64 cell lines to represent various cell populations in different organs and tissues of the human body, the mRNA expression of all human genes is characterized by deep RNA-sequencing. The subcellular distribution of each protein is investigated in a subset of cell lines selected based on corresponding gene expression. The protein localization data is derived from antibody-based profiling by immunofluorescence confocal microscopy, and classified into 32 different organelles and fine subcellular structures. The Cell Atlas currently covers 12390 genes (63%) for which there are available antibodies. It offers a database for exploring details of individual genes and proteins of interest, as well as systematically analyzing transcriptomes and proteomes in broader contexts, in order to increase our understanding of human cells.
https://www.proteinatlas.org/about/licence
Subcellular methods
The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13603 genes (67% of the human protein-coding genes), as well as predictions for an additional 3459 secreted- or membrane proteins, covering a total of 17062 genes (85% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 42 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines, induced pluripotent stem cells (iPSCs) and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations.
The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary. Learn about:
The subcellular distribution of proteins in standard human cell lines, including ciliated cells and iPSCs.
The subcellular distribution of proteins in human sperm.
The proteomes of different organelles and subcellular structures.
Single-cell variability in the expression levels and/or localizations of proteins.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full tables of reanalyzed gene expression data for primary neutrophils and HL-60 cells from previously published studies. This Excel file contains four sheets. The first sheet contains FPKM gene expression values generated by Cufflinks for all primary human neutrophil and HL-60 samples reanalyzed in this study. The second sheet contains the corresponding log10-transformed normalized expression values. The third sheet contains FPKM gene expression values for all primary mouse neutrophil samples reanalyzed in this study, and the fourth sheet contains the corresponding log10-transformed normalized values. (XLSX 21707 kb)
A comprehensive database of gene expression profiles, which enables comparison of the transcriptomes of various tissues, organs and experiments. mRNA expression levels of thousands of genes are measured with the oligonucleotide DNA microarray "GeneChip". All gene expression data in this database is produced by LSBM (Laboratory for Systems Biology and Medicine) and its collaborators. SBM DB provides two different databases: a reference database for expression analysis (RefEXA) and LSMB GeNet, a database of various organisms, tissues, and experiments. RefEXA provides a comprehensive gene expression database of human normal tissues, normal cultured cells and cancer cell lines measured with GeneChip HG-U133A, which can help the investigation of human disease. LSMB provides