The Cell Line Data Base (CLDB) is a reference information source for human and animal cell lines. It provides the characteristics of the cell lines and their availability through distributors, allowing cell line requests to be made from collections and laboratories.
https://www.proteinatlas.org/about/licence
The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13534 genes (67% of the human protein-coding genes), as well as predictions for an additional 3491 secreted or membrane proteins, covering a total of 17025 genes (84% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 41 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations. The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary. Learn about:
The subcellular distribution of proteins in human cell lines. The subcellular distribution of proteins in human sperm. The proteomes of different organelles and subcellular structures. Single-cell variability in the expression levels and/or localizations of proteins.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Human cell lines are perhaps the go-to tool for initiating studies in many biological and biomedical research fields. While there are many papers to guide neophyte researchers along the journey, many aspects of cell culture and maintenance remain experiential and require months of training in order to obtain reproducible results. What is perhaps lacking is some foundational understanding of the cell lines used in research. This dataset is a cursory effort in this direction that attempts to summarize the key features of common human cell lines in biological and biomedical research. Features documented include organ of origin, disease model, cell type, and purpose of use. It is hoped that this small dataset will help provide the architecture for researchers interested in biomedical problems to also use a similar concept in building their own database on which more discoveries can be made.
https://www.proteinatlas.org/about/licence
The Cell Atlas provides high-resolution insights into the expression and spatio-temporal distribution of RNA and proteins in human cell lines. Genome-wide mRNA expression is determined by deep RNA-sequencing in a panel of 69 cell lines, representing various cell populations in different organs and tissues of the human body. The subcellular distributions of proteins encoded by 12813 genes (65% of the human protein-coding genes) are investigated in a subset of cell lines selected based on corresponding RNA expression. Protein localization data is derived from antibody-based profiling, using immunofluorescence (ICC-IF) and confocal microscopy, and classified into 35 different organelles and fine subcellular structures. The Cell Atlas offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of transcriptomes and proteomes in broader contexts, in order to increase the understanding of human biology at the cellular and subcellular levels.
Journal article published in PLOS One, Vol 20, Issue 5, e0320862, 2025; DOI: https://doi.org/10.1371/journal.pone.0320862; PMC12064016. The datasets generated and analyzed during the current study are provided in Supplemental S1 File. The RNA-seq data is Protein Atlas Version 23 from the Human Protein Atlas website (https://www.proteinatlas.org/about/download, “RNA HPA cell line gene data” released 2023.06.19). All FASTQ files and aligned counts for the U.S. EPA TempO-seq data have been deposited into NCBI Gene Expression Omnibus under the accession number GSE288929 and are publicly available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288929. The R code is available through FigShare at: https://doi.org/10.23645/epacomptox.27341970.v1. This dataset is associated with the following publication: Word, L., C. Willis, R. Judson, L. Everett, S. Davidson-Fritz, D. Haggard, B. Chambers, J. Rogers, J. Bundy, I. Shah, N. Sipes, and J. Harrill. TempO-seq and RNA-seq Gene Expression Levels are Highly Correlated for Most Genes: A Comparison Using 39 Human Cell Lines. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 20(5): e0320862, (2025).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Human cancer cell lines grown in vitro are frequently used to decipher basic cell biological phenomena and to also specifically study different forms of cancer. Here we present the first large-scale study of protein expression patterns in cell lines using an antibody-based proteomics approach. We analyzed the expression pattern of 5436 proteins in 45 different cell lines using hierarchical clustering, principal component analysis, and two-group comparisons for the identification of differentially expressed proteins. Our results show that immunohistochemically determined protein profiles can categorize cell lines into groups that overall reflect the tumor tissue of origin and that hematological cell lines appear to retain their protein profiles to a higher degree than cell lines established from solid tumors. The two-group comparisons reveal well-characterized proteins as well as previously unstudied proteins that could be of potential interest for further investigations. Moreover, multiple myeloma cells and cells of myeloid origin were found to share a protein profile, relative to the protein profile of lymphoid leukemia and lymphoma cells, possibly reflecting their common dependency on the bone marrow microenvironment. This work also provides an extensive list of antibodies, for which high-resolution images as well as validation data are available on the Human Protein Atlas (www.proteinatlas.org), that are of potential use in cell line studies.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SUM human breast cancer cell lines have been used by many labs around the world to develop extensive data sets derived from comparative genomic hybridization analysis, gene expression profiling, whole exome sequencing, and reverse phase protein array analysis. In a previous study, the authors of this paper performed genome-scale shRNA essentiality screens on the entire SUM line panel, as well as on MCF10A cells, MCF-7 cells, and MCF-7LTED cells. In this study, the authors have developed the SUM Breast Cancer Cell Line Knowledge Base, to make all of these omics data sets available to users of the SUM lines, and to allow users to mine the data and analyse them with respect to biological pathways enriched by the data in each cell line.
Data access: All the datasets supporting the findings of this study are publicly available in the SLKBase platform here: https://sumlineknowledgebase.com/. RPPA data, drug sensitivity data, apelisib response data, and data on dose response, are also part of this figshare data record (https://doi.org/10.6084/m9.figshare.12497630).
Study aims and methodology: This web-based knowledge base provides users with data and information on the derivation of each of the cell lines, provides narrative summaries of the genomics and cell biology of each breast cancer cell line, and provides protocols for the proper maintenance of the cells. The database includes a series of data mining tools that allow rapid identification of the functional oncogene signatures for each line, the enrichment of any KEGG pathway with screen hit and gene expression data for each of the lines, and a rapid analysis of protein and phospho-protein expression for the cell lines. A gene search tool that returns all of the functional genome and functional druggable data for any gene for the entire cell line panel is included. Additionally, the authors have expanded the database to include functional genomic data for an additional 29 commonly used breast cancer cell lines.
The three overarching goals in the original development of the SLKBase are: 1) to provide a rich source of information for anyone working with any of the SUM breast cancer cell lines, 2) to give researchers ready access to the large genomic data sets that have been developed with these cells, and 3) to allow researchers to perform orthogonal analyses of the various genomics data sets that we and others have obtained from the SUM lines. For more information on the development and contents of the database, please read the related article.
Datasets supporting the paper: The data mining tools accessed the following datasets to generate the figures and tables, and these datasets are downloadable from the Data Download centre on the SLKBase:
Exome sequencing data: SLKBase.exome_.seq_.sum_.xlsx
Gene amplification and expression data for the SUM cell lines: SUM44amplificationdata.xls, SUM52.xls, SUM149.xls, SUM159.xls, SUM185.xls, SUM190.xls, SUM225.xls, SUM229.xls, SUM1315.xls
Cellecta shRNA screen data for the SUM cell lines: SUM44Celectadata.csv, SUM52Cellectadata.csv, SUM102Cellectadata.csv, SUM149Cellectadata.csv, SUM159Cellectadata.csv, SUM185Cellectadata.csv, SUM190Cellectadata.csv, SUM225Cellectadata.csv, SUM229Cellectadata.csv, SUM1315hits.hit.csv, MCF10A.hits_.csv
Breast cancer cell line data included in this data record (these datasets were used to generate figures 1, 2 and 7 in the article):
Proteomics data from the Reverse Phase Protein Array (RPPA) assay analysis: Ethier.SUMline.RPPA.xlsx
Drug sensitivity data: NAVITOCLAX.drugsensitivity.Zscores.xlsx
Apelisib response data: Apelisib all lines (2).xlsx
Dose response data: 092614 Dose Response CP 52s.11.15.xlsx
All the files are either in .xlsx or .csv file format.
https://ega-archive.org/dacs/EGAC00001000649
Whole genome sequencing of commercial LoVo, GP5D, COLO320DM, CaCo-2 and RPE1 cell lines and three RPE1-TP53 knock-out cell lines separated by 6 months of culture from their most recent common ancestor.
This data package contains expression profiles for proteins in normal and cancer tissues. It also contains data on sequencing-based RNA levels in human tissues and cell lines.
The generation of mathematical models of biological processes, the simulation of these processes under different conditions, and the comparison and integration of multiple data sets are explicit goals of systems biology that require the knowledge of the absolute quantity of the system's components. To date, systematic estimates of cellular protein concentrations have been exceptionally scarce. Here, we provide a quantitative description of the proteome of a commonly used human cell line in two functional states, interphase and mitosis. We show that these human cultured cells express at least ∼10 000 proteins and that the quantified proteins span a concentration range of seven orders of magnitude up to 20 000 000 copies per cell. We discuss how protein abundance is linked to function and evolution.
A vast assortment of human cell lines is available for cell culture model-based studies, and as such the potential exists for discrepancies in findings due to cell line selection. To investigate this concept, we determined the relative protein abundance profiles of a panel of eight diverse, but commonly studied, human cell lines. This panel includes: HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH-SY5Y, and SVGp12. We use a mass spectrometry-based proteomics workflow designed to enhance quantitative accuracy while maintaining analytical depth. To this end, our strategy leverages TMTpro16-based sample multiplexing, high-Field Asymmetric Ion Mobility Spectrometry (FAIMS), and real-time database searching (RTS). The data show that cell line diversity was reflective of differences in the relative protein abundance profiles. We also determined that several hundred proteins were highly enriched for a given cell line and performed gene ontology and pathway analysis on these cell line-enriched proteins. We provide an R Shiny application to query protein abundance profiles and retrieve proteins with similar patterns. The workflows used herein can be applied to additional cell lines to aid cell line selection in addressing a given scientific inquiry or in improving an experimental design.
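The idea of retrieving proteins with similar abundance patterns can be sketched in a few lines. The following is only an illustration, not the authors' R Shiny application: it assumes Pearson correlation as the similarity measure, and the protein names (`PROT_A` to `PROT_D`) and abundance values are made up for the example. Only the eight cell line names come from the description above.

```python
from math import sqrt

# Cell line panel named in the dataset description.
CELL_LINES = ["HAP1", "HEK293T", "HeLa", "HepG2",
              "Jurkat", "Panc1", "SH-SY5Y", "SVGp12"]

# Hypothetical relative abundances, one value per cell line (illustrative only).
PROFILES = {
    "PROT_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "PROT_B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],  # scaled copy of PROT_A
    "PROT_C": [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],      # PROT_A reversed
    "PROT_D": [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 8.0],      # roughly follows PROT_A
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def most_similar(query, profiles, top_n=3):
    """Rank all other proteins by correlation with the query profile."""
    target = profiles[query]
    scores = [(name, pearson(target, values))
              for name, values in profiles.items() if name != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for name, r in most_similar("PROT_A", PROFILES):
        print(f"{name}\t{r:.3f}")
```

Correlation ignores absolute scale, so a protein that is uniformly twice as abundant still counts as a perfect match. If magnitude matters for cell line selection, a distance-based measure may be a better choice.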
A virtual database currently indexing available cell lines from: Coriell Cell Repositories, International Mouse Strain Resource (IMSR), ATCC, NIH Human Pluripotent Stem Cell Registry, NIGMS Human Genetic Cell Repository, and Developmental Therapeutics Program.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The predicted phenotypes are listed for all NCI-60 cell lines in the data set. None of the skin cancer cell lines in the data set was an NCI-60 cell line. CNS = Central nervous system.
hPSCreg is a global registry of human pluripotent stem cell (hPSC) lines containing manually validated information, including ethical provenance, procurement, derivation process, genetic and expression data, other biological and molecular characteristics, use, and quality of the line. Current status: 1092 hESC lines, 7212 hiPSC lines, 182 clinical studies, and 2394 certificates.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
A collaborative project between the Broad Institute and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation, with the goal of conducting a detailed genetic and pharmacologic characterization of a large panel of human cancer models. The CCLE also works to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines.
Although genetic and epigenetic abnormalities in breast cancer have been extensively studied, it remains difficult to identify those patients who will respond to particular therapies. This is due in part to our lack of understanding of how the variability of cellular signaling affects drug sensitivity. Here, we used mass cytometry to characterize the single-cell signaling landscapes of 62 breast cancer cell lines and five lines from healthy tissue. We quantified 34 markers in each cell line upon stimulation by the growth factor EGF in the presence or absence of five kinase inhibitors. These data – on more than 80 million single cells from 4,000 conditions – were used to fit mechanistic signaling network models that provide unprecedented insights into the biological principles of how cancer cells process information. Our dynamic single-cell-based models more accurately predicted drug sensitivity than static bulk measurements for drugs targeting the PI3K-MTOR signaling pathway. Finally, we identified genomic features associated with drug sensitivity by using signaling phenotypes as proxies, including a missense mutation in DDIT3 predictive of PI3K-inhibition sensitivity. This provides proof of principle that single-cell measurements and modeling could inform matching of patients with appropriate treatments in the future.
This dataset covers RNA (ribonucleic acid) levels in 56 cell lines and 37 tissues, measured by RNA sequencing (RNA-seq). The data is based on The Human Protein Atlas version 21.0 and Ensembl version 103.38.
A comprehensive database of short tandem repeat (STR) DNA profiles for all ATCC human cell lines. ATCC collects these data as part of its continuing efforts to characterize and authenticate the cell lines in its Cell Biology collection.
Database of all cell lines used in biomedical research which include immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell lines databases/resources, and to databases that list cell lines as samples.