Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cellosaurus is a knowledge resource on cell lines.
Database of all cell lines used in biomedical research which include immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell lines databases/resources, and to databases that list cell lines as samples.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of genome-scale CRISPR-Cas9 knockout screens using the Avana library (files prefixed with Achilles), as well as mutation, copy number, and gene expression data (files prefixed with CCLE) for cancer cell lines, as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap. The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 563 cell lines, the RNAseq data includes 1,200 cell lines, and the copy number data includes 1,626 cell lines. Please see the README files for details regarding updates to the data processing pipeline procedures. As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org. Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk/cell_lines, EGA accession number: EGAD00001001039), reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.
A panel of 60 human cancer cell lines used for screening anticancer drugs.
https://www.gnu.org/licenses/lgpl-3.0-standalone.html
Cell line pharmacogenomics datasets for cancer biology and machine learning studies. The datasets are compatible with rcellminer and CellMinerCDB (see publications for details) and data can be extracted for use with Python-based projects.
An example for extracting data from the rcellminer and CellMinerCDB compatible packages:
# INSTALL ----
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rcellminer")
# Replace path_to_file with the data package filename
install.packages(path_to_file, repos = NULL, type="source")
# GET DATA ----
## Replace nciSarcomaData with the name of your dataset throughout the code
library(rcellminer)  # accessor functions such as getAct() and getFeatureAnnot()
library(nciSarcomaData)
## DRUG DATA ----
drugAct <- exprs(getAct(nciSarcomaData::drugData))
drugAnnot <- getFeatureAnnot(nciSarcomaData::drugData)[["drug"]]
## MOLECULAR DATA ----
### List available datasets
names(getAllFeatureData(nciSarcomaData::molData))
### Extract data and annotations
expData <- exprs(nciSarcomaData::molData[["exp"]])
mirData <- exprs(nciSarcomaData::molData[["mir"]])
expAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["exp"]]
mirAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["mir"]]
## SAMPLE DATA ----
sampleAnnot <- getSampleData(nciSarcomaData::molData)
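The description notes that the data can be used from Python-based projects but does not say how. One possible route, offered here as an assumption rather than a documented workflow, is to write a matrix such as drugAct to CSV from R (for example with write.csv(drugAct, "drugAct.csv")) and read it with Python's standard csv module. The file name, the read_matrix helper, and the toy values below are all placeholders for illustration.

```python
import csv
import io


def read_matrix(handle):
    """Read a CSV as written by R's write.csv(): the first column holds row names."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # the first header cell is the unnamed row-name column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, values = row[0], row[1:]
        # R writes missing values as "NA"; represent them as None
        matrix[name] = [None if v in ("NA", "") else float(v) for v in values]
    return samples, matrix


# Placeholder content standing in for a real export; the values are made up.
example = io.StringIO(
    '"","cellA","cellB"\n'
    '"drug1",0.5,NA\n'
    '"drug2",-1.25,2\n'
)
samples, matrix = read_matrix(example)
print(samples)
print(matrix)
```

With a real export, the same helper could be called on a file opened with open("drugAct.csv", newline="").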
CellMinerCDB is a resource that simplifies access to and exploration of cancer cell line pharmacogenomic data across different sources.
Comprehensive database of short tandem repeat (STR) DNA profiles for all ATCC human cell lines. ATCC collects these data as part of its continuing efforts to characterize and authenticate the cell lines in its Cell Biology collection.
Database characterizing and comparing pluripotent human stem cells. The growth and culture conditions of all 21 human embryonic stem cell lines approved under the August 2001 Presidential Executive Order have been analyzed. Available to the scientific community are the results of our rigorous characterization of these cell lines at a more advanced level.
This dataset consists of a collection of cell lines held by the DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen). The collection currently comprises more than 800 different immortalized cell cultures of primate, rodent, amphibian, fish, and insect origin, isolated from numerous tissues, as well as hybridomas.
SclcCellMinerCDB is a resource that simplifies access to and exploration of cancer cell line pharmacogenomic data across different sources.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA half-life estimates from uniformly reprocessed/reanalyzed, published, high quality nucleotide recoding RNA-seq (NR-seq; namely SLAM-seq and TimeLapse-seq) datasets. 12 human cell lines are represented. Data can be browsed at this website.
A dataset linking genetic and molecular features of cancer cell lines to drug sensitivity.
https://www.proteinatlas.org/about/licence
The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13534 genes (67% of the human protein-coding genes), as well as predictions for an additional 3491 secreted or membrane proteins, covering a total of 17025 genes (84% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 41 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations. The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary.
Learn about: the subcellular distribution of proteins in human cell lines; the subcellular distribution of proteins in human sperm; the proteomes of different organelles and subcellular structures; and single-cell variability in the expression levels and/or localizations of proteins.
hPSCreg is a global registry of human pluripotent stem cell (hPSC) lines containing manually validated information, including ethical provenance, procurement, derivation process, genetic and expression data, other biological and molecular characteristics, use, and quality of the line. Current status: 1103 hESC lines, 7333 hiPSC lines, 182 clinical studies, and 2395 certificates.
List of cell lines used in this study and summary of results for each cell line.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, biological research involving human cell lines has been developing rapidly in China. However, some of the cell lines are not authenticated before use, so misidentified and/or cross-contaminated cell lines are unfortunately commonplace. In this study, we present a comprehensive investigation of cross-contamination and misidentification for a panel of 278 cell lines from 28 institutes in China using short tandem repeat (STR) profiling. By comparing the DNA profiles with the cell bank databases of ATCC and DSMZ, cross-contamination or misidentification was uncovered in 46.0% (128/278) of the cell lines, which came from 22 institutes. Notably, 73.2% (52 out of 71) of the cell lines established by Chinese researchers were misidentified, accounting for 40.6% of all misidentifications (52/128). Further, 67.3% (35/52) of the misidentified cell lines established in laboratories in China were HeLa cells or a possible hybrid of HeLa with another cell line. Furthermore, the bile duct cancer cell line HCCC-9810 and the degenerative lung cancer cell line Calu-6 exhibited an 88.9% match in the ATCC database (9 loci), suggesting that they were from the same origin. However, when we compared these two cell lines at 21 loci with the same algorithm, the percent match was only 48.2%, indicating that the two cell lines were different. The SNP profiles of HCCC-9810 and Calu-6 also revealed that they were different cell lines. The 150 cell lines with unique profiles demonstrated a wide range of in vitro phenotypes. This panel of 150 genomically validated cancer cell lines represents a valuable resource for the cancer research community and will advance our understanding of the disease by providing a standard reference for cell lines that can be used for biological as well as preclinical studies.
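The abstract does not name the algorithm behind its percent match values. As an illustration only, the sketch below uses one commonly cited score, the Tanabe percent match (twice the number of shared alleles divided by the total number of alleles in both profiles). The function name, locus selection, and allele calls are invented placeholders, not the published profiles of HCCC-9810 or Calu-6.

```python
def tanabe_percent_match(query, reference):
    """Tanabe-style STR score: 2 * shared alleles / (query alleles + reference alleles) * 100."""
    loci = query.keys() & reference.keys()  # compare only loci typed in both profiles
    shared = sum(len(set(query[l]) & set(reference[l])) for l in loci)
    total = sum(len(set(query[l])) + len(set(reference[l])) for l in loci)
    if total == 0:
        raise ValueError("no alleles to compare")
    return 100.0 * 2 * shared / total


# Hypothetical profiles: alleles are kept as strings so microvariants like "9.3" stay exact.
profile_a = {"D5S818": {"11", "12"}, "TH01": {"6", "9.3"}, "TPOX": {"8"}}
profile_b = {"D5S818": {"11", "12"}, "TH01": {"6", "7"}, "TPOX": {"8"}}

score = tanabe_percent_match(profile_a, profile_b)
print(f"{score:.1f}% match")
```

Under a score of this kind, typing additional loci at which two lines disagree enlarges the denominator without adding shared alleles, which is one way a match that looks high at 9 loci can fall sharply at 21 loci.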
A dataset containing drug response profiles for over 600 compounds across multiple cancer cell lines.
This data package contains expression profiles for proteins in normal and cancer tissues. It also contains data on sequence-based RNA levels in human tissues and cell lines.
Datasets contain protein identification and raw data from MALDI protein identifications. Identified proteins resulted from a comparison of GLC1 and GLC1 SCLC cell lines using 2D DIGE. MALDI-TOF-MS analyses were performed on an UltraFlex™ II (Bruker Daltonics) instrument according to the instructions of the manufacturer. The instrument was equipped with a scout™ MTP MALDI target. The spectra were acquired in the positive ion mode according to the settings given by the manufacturer. For external calibration, a peptide standard (m/z 757.399, 1296.684, 1619.822, 2093.086 and 3147.471) was used. The MALDI-PMF spectra were processed using the FlexAnalysis™ 2.4 software (Bruker Daltonics) and converted to the .xml format. For peak detection, the spectra were subjected to an internal recalibration using 13 different monoisotopic masses from autolysis products of trypsin and fragments of keratins ranging from m/z 842.509 to 2825.406. The following parameters were applied: snap peak detection algorithm, signal-to-noise threshold of 6, maximal number of peaks 100, quality factor threshold 50, and baseline subtraction TopHat. The generated mass lists were subsequently sent to ProteinScape™ 1.3 (Bruker Daltonics, Bremen, Germany), triggering database searches using ProFound (Version 2002.03.01, Proteometrics LLC) and MASCOT (Version 2.3.02, Matrix Science, London, UK). The following search parameters were selected: fixed cysteine modification with propionamide, variable modification due to methionine oxidation, a maximum of one missed cleavage site in case of incomplete trypsin hydrolysis, and no details about 2-DE derived protein mass and pI. Using the Score booster function of ProteinScape™, the mass lists were recalibrated and background masses were removed using a list containing 44 masses occurring in a minimum of 10% of generated peak lists.
The database searches were run with a mass tolerance of 40 ppm using UniProt’s human complete proteome set (downloaded on 26 October 2012) containing 68,109 protein entries. The database used is a composite database consisting of the UniProtKB entries and a duplicate of the same database in which the amino acid sequence of each protein entry was randomly shuffled. Proteins reaching a ProFound score > 1.5 or a Mascot score > 64 were considered identified. Using these criteria, one decoy database entry was found by the search engines, indicating high confidence in the protein identifications. If several database entries of homologous proteins matched these criteria, only the entry with the highest score was reported.
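Two ideas in this description lend themselves to a small illustration: a mass tolerance expressed in ppm, and a composite target-decoy database built by shuffling each sequence. The sketch below is not the ProteinScape, ProFound, or MASCOT implementation. The function names, the DECOY_ prefix, the accession, and the example sequence are hypothetical; only the 40 ppm tolerance and the calibrant mass 757.399 come from the text.

```python
import random


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=40.0):
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def composite_database(entries, seed=0):
    """Return the target entries plus one randomly shuffled decoy per entry."""
    rng = random.Random(seed)  # fixed seed so the decoys are reproducible
    database = dict(entries)
    for accession, sequence in entries.items():
        residues = list(sequence)
        rng.shuffle(residues)  # same amino acid composition, scrambled order
        database["DECOY_" + accession] = "".join(residues)
    return database


# At m/z 757.399, a 40 ppm window is roughly +/- 0.030 Da.
print(within_tolerance(757.429, 757.399))  # about 39.6 ppm off
print(within_tolerance(757.439, 757.399))  # about 52.8 ppm off

targets = {"P_EXAMPLE": "MKTAYIAKQR"}  # placeholder accession and sequence
database = composite_database(targets)
```

Hits against the shuffled entries give a rough estimate of how often the scoring thresholds admit a random match, which is why finding a single decoy hit is read as a sign of high confidence.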
The COLT-Cancer database is a collection of shRNA dropout signature profiles covering ~16000 human genes, derived from more than 70 pancreatic, ovarian, and breast human cancer cell lines using the microarray detection platform developed in the COLT (CCBR-OICR Lentiviral Technology) facility at the Moffat Lab. All shRNA dropout profiles are freely available through download or queries via this website.