100+ datasets found
  1. Cellosaurus

    • cellosaurus.org
    obo, text/markdown +1
    Updated Nov 23, 2018
    Cite
    Bairoch Amos (2018). Cellosaurus [Dataset]. http://identifiers.org/MIR:00000598
    Explore at:
    Available download formats: obo, xml, text/markdown
    Dataset updated
    Nov 23, 2018
    Dataset provided by
    SIB Swiss Institute of Bioinformatics
    Authors
    Bairoch Amos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1943 - Present
    Description

    Cellosaurus is a knowledge resource on cell lines.

  2. Cellosaurus

    • rrid.site
    • scicrunch.org
    • +2more
    Updated Sep 15, 2025
    Cite
    (2025). Cellosaurus [Dataset]. http://identifiers.org/RRID:SCR_013869
    Explore at:
    Dataset updated
    Sep 15, 2025
    Description

    Database of all cell lines used in biomedical research which include immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell lines databases/resources, and to databases that list cell lines as samples.

  3. DepMap 19Q2 Public

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Broad DepMap (2023). DepMap 19Q2 Public [Dataset]. http://doi.org/10.6084/m9.figshare.8061398.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Broad DepMap
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the results of Avana library CRISPR-Cas9 genome-scale knockout (prefixed with Achilles) as well as mutation, copy number and gene expression data (prefixed with CCLE) for cancer cell lines as part of the Broad Institute’s Cancer Dependency Map project. We have repackaged our fileset to include all quarterly-updating datasets produced by DepMap. The Avana CRISPR-Cas9 genome-scale knockout data has expanded to include 563 cell lines, the RNAseq data includes 1,200 cell lines, and the copy number data includes 1,626 cell lines. Please see the README files for details regarding updates to the data processing pipeline procedures. As our screening efforts continue, we will be releasing additional cancer dependency data on a quarterly basis for unrestricted use. For the latest datasets available, further analyses, and to subscribe to our mailing list, visit https://depmap.org. Descriptions of the experimental methods and the CERES algorithm are published in http://dx.doi.org/10.1038/ng.3984. Some cell lines were processed using copy number data based on the Sanger Institute whole exome sequencing data (COSMIC: http://cancer.sanger.ac.uk.cell_lines, EGA accession number: EGAD00001001039) reprocessed using CCLE pipelines. A detailed description of the pipelines and tool versions for CCLE expression can be found here: https://github.com/broadinstitute/gtex-pipeline/blob/v9/TOPMed_RNAseq_pipeline.md.

  4. NCI-60 Cancer Cell Lines

    • bigomics.ch
    Updated Nov 8, 2024
    Cite
    National Cancer Institute (NCI) (2024). NCI-60 Cancer Cell Lines [Dataset]. https://bigomics.ch/blog/top-databases-for-drug-discovery/
    Explore at:
    Dataset updated
    Nov 8, 2024
    Dataset provided by
    National Cancer Institute (http://www.cancer.gov/)
    Authors
    National Cancer Institute (NCI)
    Description

    A panel of 60 human cancer cell lines used for screening anticancer drugs.

  5. Pharmacogenomics Datasets for Cancer Cell Lines from CellMiner...

    • zenodo.org
    application/gzip
    Updated May 20, 2025
    Cite
    Augustin Luna; Fathi Elloumi; Vinodh Rajapakse (2025). Pharmacogenomics Datasets for Cancer Cell Lines from CellMiner Cross-Database (CellMinerCDB) [Dataset]. http://doi.org/10.5281/zenodo.15122311
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Augustin Luna; Fathi Elloumi; Vinodh Rajapakse
    License

    https://www.gnu.org/licenses/lgpl-3.0-standalone.html

    Description

    Cell line pharmacogenomics datasets for cancer biology and machine learning studies. The datasets are compatible with rcellminer and CellMinerCDB (see publications for details) and data can be extracted for use with Python-based projects.

    An example for extracting data from the rcellminer and CellMinerCDB compatible packages:

    # INSTALL ----
    if (!require("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    
    BiocManager::install("rcellminer")
    
    # Replace path_to_file with the data package filename
    install.packages(path_to_file, repos = NULL, type="source")
    
    # GET DATA ----
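    ## Replace nciSarcomaData with the name of your dataset package throughout the code
    library(rcellminer)  # provides getAct(), getFeatureAnnot(), getAllFeatureData(), getSampleData()
    library(Biobase)     # provides exprs()
    library(nciSarcomaData)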
    
    ## DRUG DATA ----
    drugAct <- exprs(getAct(nciSarcomaData::drugData))
    drugAnnot <- getFeatureAnnot(nciSarcomaData::drugData)[["drug"]]
    
    ## MOLECULAR DATA ----
    ### List available datasets
    names(getAllFeatureData(nciSarcomaData::molData))
    
    ### Extract data and annotations
    expData <- exprs(nciSarcomaData::molData[["exp"]])
    mirData <- exprs(nciSarcomaData::molData[["mir"]])
    
    expAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["exp"]]
    mirAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["mir"]]
    
    ## SAMPLE DATA ----
    sampleAnnot <- getSampleData(nciSarcomaData::molData)

  6. NCI60 and other cancer cell line datasets

    • discover.nci.nih.gov
    Updated Jul 15, 2025
    Cite
    GPF/DTB/CCR/NCI/NIH (2025). NCI60 and other cancer cell line datasets [Dataset]. https://discover.nci.nih.gov/cellminercdb/
    Explore at:
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    National Cancer Institute (http://www.cancer.gov/)
    Authors
    GPF/DTB/CCR/NCI/NIH
    Description

    CellMinerCDB is a resource that simplifies access and exploration of cancer cell line pharmacogenomic data across different sources

  7. ATCC STR database

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Apr 28, 2021
    Cite
    (2021). ATCC STR database [Dataset]. http://identifiers.org/RRID:SCR_019203
    Explore at:
    Dataset updated
    Apr 28, 2021
    Description

    Comprehensive database of Short Tandem Repeat (STR) DNA profiles for all ATCC human cell lines. ATCC collects these data as part of its continuing efforts to characterize and authenticate the cell lines in its Cell Biology collection.

  8. StemCellDB

    • rrid.site
    • neuinfo.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). StemCellDB [Dataset]. http://identifiers.org/RRID:SCR_006305
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database characterizing and comparing pluripotent human stem cells. The growth and culture conditions of all 21 human embryonic stem cell lines approved under the August 2001 Presidential Executive Order have been analyzed. Available to the scientific community are the results of our rigorous characterization of these cell lines at a more advanced level.

  9. Human and Animal Cell Lines

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Human and Animal Cell Lines [Dataset]. https://www.johnsnowlabs.com/marketplace/human-and-animal-cell-lines/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    N/A
    Description

    This dataset consists of a collection of cell lines from DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen Institute). This collection currently comprises more than 800 different immortalized cell cultures of primate, rodent, amphibian, fish, and insect origin, isolated from numerous tissues and hybridomas.

  10. SCLC NCI and other SCLC cancer cell line datasets

    • discover.nci.nih.gov
    Updated Dec 15, 2022
    Cite
    GPF/DTB/CCR/NCI/NIH (2022). SCLC NCI and other SCLC cancer cell line datasets [Dataset]. https://discover.nci.nih.gov/SclcCellMinerCDB/
    Explore at:
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    National Cancer Institute (http://www.cancer.gov/)
    Authors
    GPF/DTB/CCR/NCI/NIH
    Description

    SclcCellMinerCDB is a resource that simplifies access and exploration of cancer cell line pharmacogenomic data across different sources

  11. RNAdecayCafe: a uniformly reprocessed atlas of human RNA half-lives across...

    • zenodo.org
    application/gzip, bin +1
    Updated Jul 2, 2025
    Cite
    Isaac Vock (2025). RNAdecayCafe: a uniformly reprocessed atlas of human RNA half-lives across 12 cell lines [Dataset]. http://doi.org/10.5281/zenodo.15785218
    Explore at:
    Available download formats: csv, bin, application/gzip
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Isaac Vock
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA half-life estimates from uniformly reprocessed/reanalyzed, published, high quality nucleotide recoding RNA-seq (NR-seq; namely SLAM-seq and TimeLapse-seq) datasets. 12 human cell lines are represented. Data can be browsed at this website.

    Analysis notes:

    • Data was processed using fastq2EZbakR. All config files used are provided in fastq2EZbakR_config.tar.gz (as well as some for data not included in the final RNAdecayCafe due to QC issues). Some general notes:
      • Multi-mapping reads were filtered out completely. It is difficult to do anything accurate/intelligent with such reads (see discussion here for instance), so it is better to remove them completely. This does mean that some classes of features rich in repetitive sequences will be underrepresented in this database.
      • Adapters were trimmed. For 3'-end data, 12 additional nucleotides were trimmed from the 5' end of the reads, as suggested by the developers of the popular Quant-seq kit, and as is done by default in SLAMDUNK. Quality score end trimming and polyX trimming are also done for all samples.
    • Data was analyzed using EZbakR. Some general notes:
      • If -s4U data was included, this data was used to infer a global pold to stabilize pnew estimation (see Methods here for brief discussion). Dropout was also corrected using a previously developed strategy now implemented in EZbakR's CorrectDropout() function.
      • Half-lives are estimated on a gene-level. That is, all reads that map to exonic regions of a gene (i.e., regions that are exonic in at least one annotated isoform) are combined and used to estimate a half-life for that gene. Thus, you should think of these half-life estimates as a weighted average over all isoforms expressed from that gene, weighted by the relative abundances of those isoforms. Future releases may include isoform-resolution estimates as well, given that EZbakR can now perform this type of analysis.
    • A unique feature of RNAdecayCafe is that it includes what I am referring to as "dropout normalized" half-life and kdeg estimates. As not all datasets analyzed include -s4U data, dropout correction is not possible for all samples. This can lead to global biases in the average time scale of half-lives that are unlikely to represent real biology (for example, two different K562 datasets may have median half-life estimates of 4 hours and 15 hours). To address this problem and facilitate comparison across datasets, I developed a strategy (implemented in EZbakR's NormalizeForDropout() function) that uses a model of dropout to normalize estimates with respect to a low dropout sample.
      • These "donorm" estimates will often be a more accurate reflection of rate constants and half-lives in a given cell line.
      • This strategy can normalize out real global differences in turnover kinetics though, so interpret these values with care.
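The gene-level interpretation in the notes above (a half-life that behaves like an abundance-weighted average over isoforms) can be made concrete with a toy calculation. This is only a sketch of that stated intuition, not EZbakR's estimator; the function name and the isoform values are hypothetical.

```python
def abundance_weighted_halflife(halflives_h, abundances):
    """Abundance-weighted mean of isoform half-lives (hours).

    Illustrates the intuition only: each isoform contributes in
    proportion to its relative abundance.
    """
    if len(halflives_h) != len(abundances) or not halflives_h:
        raise ValueError("need one abundance per isoform half-life")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return sum(h * a for h, a in zip(halflives_h, abundances)) / total


# Hypothetical gene: a stable minor isoform (8 h, 25% of transcripts)
# and an unstable major isoform (2 h, 75% of transcripts).
gene_halflife = abundance_weighted_halflife([8.0, 2.0], [25.0, 75.0])
print(gene_halflife)
```

Because the dominant isoform carries most of the weight, a shift in isoform usage between cell lines can move the gene-level value even when no individual isoform changes stability.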

    Relevant data provided in this repository are as follows:

    1. hg38_Strict.gtf: annotation used for analysis. Filtered similarly to how is described here.
    2. AvgKdegs_genes_v1.csv: Table of cell-line average half-lives and degradation rate constants (kdegs). Average log(kdeg)'s are calculated for all samples from a given cell line, weighting by the uncertainty in the log(kdeg) estimate. Columns in this table are as follows:
      1. feature_ID: Gene ID (symbol) from hg38_Strict.gtf
      2. cell_line: Human cell line for which averages are calculated
      3. avg_log_kdeg: Weighted log(kdeg) average
      4. avg_donorm_log_kdeg: Weighted dropout normalized log(kdeg) average.
      5. avg_log_RPKM_total: Average log(RPKM) value from total RNA data. A value of exactly 0 means that there was no total RNA data for this cell line (i.e., all data was 3'-end data).
      6. avg_log_RPKM_3pend: Average log(RPKM) value from 3'-end data. Technically no length normalization is performed as this is 3'-end data, so it is really a log(RPM). A value of exactly 0 means that there was no 3'-end data for this cell line (i.e., all data was for total RNA).
      7. avg_kdeg: e^avg_log_kdeg
      8. avg_donorm_kdeg: e^avg_donorm_log_kdeg
      9. avg_halflife: log(2)/avg_kdeg; the time for half of the RNA to decay (the mean lifetime of the RNA is 1/avg_kdeg).
      10. avg_donorm_halflife: log(2)/avg_donorm_kdeg
      11. avg_RPKM_total: e^avg_log_RPKM_total
      12. avg_RPKM_3pend: e^avg_log_RPKM_3pend.
    3. FeatureDetails_gene_v1.csv: Table of details about each gene measured; information comes from hg38_Strict.gtf and the corresponding hg38 genome FASTA file.
      1. seqnames: chromosome name
      2. strand: strand on which gene is transcribed
      3. start: genomic start position for gene (most 5'-end coordinate; will be location of TES for - strand genes).
      4. end: end position for gene
      5. type: all "gene" for now, as all analyses are currently gene-level average half-life calculations
      6. exon_length: length of union of exons for a given gene. A read is considered exonic, and thus used for half-life estimation, if it exclusively overlaps with the region defined by the union of all annotated exons for that gene.
      7. exon_GC_fraction: fraction of nucleotides in union of exons that are Gs or Cs.
      8. end_GC_fraction: fraction of nucleotides in last 1000 nts of 3'end of transcript that are Gs or Cs. Useful for assessing GC biases in 3'-end data.
      9. feature_ID: Gene ID (symbol) from hg38_Strict.gtf
    4. SampleDetails_v1.csv: Table of details about all samples represented in RNAdecayCafe
      1. sample: SRA accession ID for sample
      2. dataset: Citation-esque summary of the dataset of origin
      3. pnew: EZbakR estimated T-to-C mutation rate in reads from new (labeled) RNA. You can see these blogs (here and here) for some intuition as to how to interpret these. More technical explanations of the models involved can be found here and here.
      4. pold: EZbakR estimated T-to-C mutation rate in reads from old (unlabeled) RNA. Same citations for pnew apply. Best samples are those with the largest gap between the pold and pnew; can think of this like a signal-to-noise ratio
      5. label_time: How long (in hours) were cells labeled with s4U for?
      6. cell_line: Cell line used for that sample.
      7. threePseq: TRUE or FALSE; TRUE if 3'-end sequencing was used.
      8. total_reads: Total number of aligned, exonic reads in the sample.
      9. median_halflife: Median, unnormalized half-life estimate. Differences between cell lines could represent real biology, but could also be evidence of dropout (see here and here for discussion of this phenomenon).
    5. RateConstants_gene_v1.csv
      1. sample: SRA accession ID for sample.
      2. kdeg: e ^ log_kdeg
      3. halflife: log(2) / kdeg. The time for half of the RNA to decay (the mean lifetime of the RNA is 1 / kdeg).
      4. donorm_kdeg: e ^ donorm_log_kdeg
      5. donorm_halflife: log(2) / donorm_kdeg
      6. log_kdeg: log degradation rate constant estimated by EZbakR
      7. donorm_log_kdeg: dropout normalized log degradation rate constant
      8. reads: number of reads that contributed to estimates
      9. donorm_reads: dropout normalization corrected read count
      10. feature_ID: Gene ID (symbol) from hg38_Strict.gtf
    6. RNAdecayCafe_database_v1.rds: compressed RDS file that stores a list containing the above 4 tables in the following entries:
      1. kdegs = RateConstants_gene_v1.csv
      2. sample_metadata = SampleDetails_v1.csv
      3. feature_metadata = FeatureDetails_v1.csv
      4. average_kdegs = AvgKdegs_gene_v1.csv
    7. RNAdecayCafe_v1_onetable.csv: Inner joining of all but the averages table in this database. Thus, is one mega table containing all sample-specific estimates, sample metadata, and feature information.
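The column definitions above reduce to a few formulas, sketched here with the Python standard library. Three things are assumptions rather than facts from the description: that "weighting by the uncertainty" means inverse-variance weights, that the tables are joined on their shared `sample` column, and that rate constants are per hour (suggested by `label_time` being in hours). Function names and toy rows are hypothetical.

```python
import math


def kdeg_from_log(log_kdeg):
    """kdeg = e^log_kdeg, as in the kdeg / avg_kdeg columns."""
    return math.exp(log_kdeg)


def halflife_from_kdeg(kdeg):
    """halflife = log(2) / kdeg (natural log)."""
    return math.log(2) / kdeg


def weighted_mean_log_kdeg(log_kdegs, std_errors):
    """Average log(kdeg) across samples.

    ASSUMPTION: inverse-variance weights (1 / se^2); the description
    only says estimates are weighted by their uncertainty.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * x for w, x in zip(weights, log_kdegs)) / sum(weights)


def inner_join(left_rows, right_rows, key):
    """Inner join of two lists of dicts on a shared column."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            joined.append({**right, **left})  # left values win on clashes
    return joined


# Hypothetical rows; real sample IDs are SRA accessions.
kdegs = [
    {"sample": "S1", "feature_ID": "GENE_A", "log_kdeg": math.log(0.1)},
    {"sample": "S2", "feature_ID": "GENE_A", "log_kdeg": math.log(0.2)},
]
sample_metadata = [{"sample": "S1", "cell_line": "K562"}]

one_table = inner_join(kdegs, sample_metadata, "sample")
for row in one_table:
    row["halflife"] = halflife_from_kdeg(kdeg_from_log(row["log_kdeg"]))
print(one_table)
```

An inner join drops rows with no match in the other table, so the toy sample `S2` (which has no metadata row) does not appear in `one_table`.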

    Datasets included:

    1. Finkel et al. 2021 (Calu3 cells; PMID: 35313595)
    2. Harada et al. 2022 (MV411 cells; PMID: 35301220)
    3. Ietswaart et al. 2024 (K562 cells; PMID: 38964322); whole-cell data used
    4. Luo et al. 2020 (HEK293T cells; PMID: 33357462)
    5. Mabin et al. 2025 (HEK293T cells; PMID: 40161772); only dataset for which data is not yet publicly

  12. Cancer Therapeutics Response Portal (CTRP v2)

    • bigomics.ch
    Updated Nov 8, 2024
    Cite
    Broad Institute of MIT and Harvard (2024). Cancer Therapeutics Response Portal (CTRP v2) [Dataset]. https://bigomics.ch/blog/top-databases-for-drug-discovery/
    Explore at:
    Dataset updated
    Nov 8, 2024
    Dataset authored and provided by
    Broad Institute of MIT and Harvard
    Description

    A dataset linking genetic and molecular features of cancer cell lines to drug sensitivity.

  13. Human Protein Atlas - Subcellular

    • proteinatlas.org
    Updated Nov 19, 2021
    Cite
    Human Protein Atlas (2021). Human Protein Atlas - Subcellular [Dataset]. https://www.proteinatlas.org/humanproteome/subcellular
    Explore at:
    Dataset updated
    Nov 19, 2021
    Dataset authored and provided by
    Human Protein Atlas
    License

    https://www.proteinatlas.org/about/licence

    Description

    The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13534 genes (67% of the human protein-coding genes), as well as predictions for an additional 3491 secreted or membrane proteins, covering a total of 17025 genes (84% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 41 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations. The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary. Learn about:

    • The subcellular distribution of proteins in human cell lines.
    • The subcellular distribution of proteins in human sperm.
    • The proteomes of different organelles and subcellular structures.
    • Single-cell variability in the expression levels and/or localizations of proteins.

  14. hPSCreg dataset, continuously updated

    • hpscreg.eu
    Updated Aug 1, 2016
    Cite
    (2016). hPSCreg dataset, continuously updated [Dataset]. https://hpscreg.eu/
    Explore at:
    Dataset updated
    Aug 1, 2016
    Variables measured
    usage, ethics, derivation, genotyping, characterisation, donor information, culture conditions, general information, genetic modification
    Description

    hPSCreg is a global registry of human pluripotent stem cell (hPSC) lines containing manually validated information, including ethical provenance, procurement, derivation process, genetic and expression data, other biological and molecular characteristics, use, and quality of the line. Current status: 1103 hESC lines, 7333 hiPSC lines, 182 clinical studies, and 2395 certificates.

  15. List of cell lines used in this study and summary of results for each cell...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 1, 2018
    Cite
    Tachedjian, Mary; Smith, Ina; Wang, Lin-Fa; Field, Hume; Boyd, Victoria; Todd, Shawn; Kurth, Andreas; Marsh, Glenn A.; Kohl, Claudia; Monaghan, Paul; Crameri, Gary (2018). List of cell lines used in this study and summary of results for each cell line. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000703920
    Explore at:
    Dataset updated
    Feb 1, 2018
    Authors
    Tachedjian, Mary; Smith, Ina; Wang, Lin-Fa; Field, Hume; Boyd, Victoria; Todd, Shawn; Kurth, Andreas; Marsh, Glenn A.; Kohl, Claudia; Monaghan, Paul; Crameri, Gary
    Description

    List of cell lines used in this study and summary of results for each cell line.

  16. Investigation of Cross-Contamination and Misidentification of 278 Widely...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated May 31, 2023
    Cite
    Yaqing Huang; Yuehong Liu; Congyi Zheng; Chao Shen (2023). Investigation of Cross-Contamination and Misidentification of 278 Widely Used Tumor Cell Lines [Dataset]. http://doi.org/10.1371/journal.pone.0170384
    Explore at:
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yaqing Huang; Yuehong Liu; Congyi Zheng; Chao Shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, biological research involving human cell lines has been rapidly developing in China. However, some of the cell lines are not authenticated before use. Therefore, misidentified and/or cross-contaminated cell lines are unfortunately commonplace. In this study, we present a comprehensive investigation of cross-contamination and misidentification for a panel of 278 cell lines from 28 institutes in China by using short tandem repeat profiling method. By comparing the DNA profiles with the cell bank databases of ATCC and DSMZ, a total of 46.0% (128/278) cases with cross-contamination/misidentification were uncovered coming from 22 institutes. Notably, 73.2% (52 out of 71) of the cell lines established by the Chinese researchers were misidentified and accounted for 40.6% of total misidentification (52/128). Further, 67.3% (35/52) of the misidentified cell lines established in laboratories of China were HeLa cells or a possible hybrid of HeLa with another kind of cell line. Furthermore, the bile duct cancer cell line HCCC-9810 and degenerative lung cancer Calu-6 exhibited 88.9% match in the ATCC database (9-loci), indicating that they were from the same origin. However, when we used 21-loci to compare these two cell lines with the same algorithm, the percent match was only 48.2%, indicating that these two cell lines were different. The SNP profiles of HCCC-9810 and Calu-6 also revealed that they were different cell lines. 150 cell lines with unique profiles demonstrated a wide range of in vitro phenotypes. This panel of 150 genomically validated cancer cell lines represents a valuable resource for the cancer research community and will advance our understanding of the disease by providing a standard reference for cell lines that can be used for biological as well as preclinical studies.
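The percentages quoted in the description follow directly from the stated counts, and the "percent match" between two STR profiles can be sketched in a few lines. The description does not name the matching algorithm, so the Tanabe formula used below (twice the shared alleles divided by the total alleles in both profiles) is an assumption, and the three-locus profiles are hypothetical toy data.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def tanabe_match(query, reference):
    """Percent match between two STR profiles.

    ASSUMPTION: Tanabe formula, computed over loci typed in both profiles:
        100 * 2 * shared_alleles / (query_alleles + reference_alleles)
    Each profile maps a locus name to a set of allele calls.
    """
    loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in loci)
    if total == 0:
        raise ValueError("profiles share no typed loci")
    return 100.0 * 2 * shared / total


# Counts taken from the description above.
print(percent(128, 278))  # misidentified / all cell lines tested
print(percent(52, 71))    # misidentified among lines established in China
print(percent(52, 128))   # their share of all misidentifications
print(percent(35, 52))    # HeLa or possible HeLa hybrids among those

# Hypothetical toy profiles (alleles as strings so "9.3" stays exact).
query = {"TH01": {"7"}, "D5S818": {"11", "12"}, "TPOX": {"8", "12"}}
reference = {"TH01": {"7", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}}
print(tanabe_match(query, reference))
```

Adding loci adds alleles to the denominator whether or not they match, which is consistent with the HCCC-9810 and Calu-6 comparison falling from 88.9% at 9 loci to 48.2% at 21 loci.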

  17. Genomics of Drug Sensitivity in Cancer (GDSC)

    • bigomics.ch
    Updated Nov 8, 2024
    Cite
    Wellcome Sanger Institute (2024). Genomics of Drug Sensitivity in Cancer (GDSC) [Dataset]. https://bigomics.ch/blog/top-databases-for-drug-discovery/
    Explore at:
    Dataset updated
    Nov 8, 2024
    Dataset authored and provided by
    Wellcome Sanger Institute
    Description

    A dataset containing drug response profiles for over 600 compounds across multiple cancer cell lines.

  18. Human Gene Expression Database Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Human Gene Expression Database Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/human-gene-expression-database-data-package/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package contains expression profiles for proteins in normal and cancer tissues. It also contains data on sequence-based RNA levels in human tissues and cell lines.

  19. Analysis of lung cancer cell lines GLC1 and GLC1 M13

    • ebi.ac.uk
    • data.niaid.nih.gov
    Updated Jul 26, 2013
    Cite
    Gereon Poschmann (2013). Analysis of lung cancer cell lines GLC1 and GLC1 M13 [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD000132
    Explore at:
    Dataset updated
    Jul 26, 2013
    Authors
    Gereon Poschmann
    Variables measured
    Proteomics
    Description

    Datasets contain protein identifications and raw data from MALDI protein identifications. Identified proteins resulted from a comparison of the GLC1 and GLC1 M13 SCLC cell lines using 2D DIGE. MALDI-TOF-MS analyses were performed on an UltraFlex™ II (Bruker Daltonics) instrument according to the manufacturer's instructions. The instrument was equipped with a scout™ MTP MALDI target. The spectra were acquired in positive ion mode according to the settings given by the manufacturer. For external calibration, a peptide standard (m/z 757.399, 1296.684, 1619.822, 2093.086 and 3147.471) was used. The MALDI-PMF spectra were processed using the FlexAnalysis™ 2.4 software (Bruker Daltonics) and converted to .xml format. For peak detection, the spectra were subjected to an internal recalibration using 13 different monoisotopic masses from autolysis products of trypsin and fragments of keratins ranging from m/z 842.509 to 2825.406. The following parameters were applied: snap peak detection algorithm, signal-to-noise threshold of 6, maximal number of peaks 100, quality factor threshold 50 and baseline subtraction TopHat. The generated mass lists were subsequently sent to ProteinScape™ 1.3 (Bruker Daltonics, Bremen, Germany), triggering database searches using ProFound (Version 2002.03.01, Proteometrics LLC) and MASCOT (Version 2.3.02, Matrix Science, London, UK). The following search parameters were selected: fixed cysteine modification with propionamide, variable modification due to methionine oxidation, a maximum of one missed cleavage site in case of incomplete trypsin hydrolysis and no details about 2-DE derived protein mass and pI. Using the Score booster function of ProteinScape™, the mass lists were recalibrated and background masses removed using a list containing 44 masses occurring in a minimum of 10% of generated peak lists.

    The database searches were run with a mass tolerance of 40 ppm using UniProt’s human complete proteome set (downloaded on 26.10.2012) containing 68,109 protein entries. The database used is a composite consisting of the UniProtKB entries and a duplicate of the same database in which the amino acid sequence of each protein entry was randomly shuffled. Proteins reaching a ProFound score > 1.5 or a Mascot score > 64 were considered identified. Using these criteria, one decoy database entry was found by the search engines, indicating high confidence of protein identifications. If several database entries of homologous proteins matched these criteria, only the entry with the highest score was reported.
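Three of the search settings described above are simple enough to sketch: the 40 ppm mass tolerance, the score cut-offs for calling a protein identified, and the shuffled decoy sequences. This is a minimal illustration under stated assumptions, not the ProteinScape, ProFound or MASCOT implementation; all function names and the example peptide sequence are hypothetical.

```python
import random


def ppm_window(mz, tolerance_ppm=40.0):
    """Return the (low, high) m/z window for a tolerance given in ppm."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, tolerance_ppm=40.0):
    """True if the observed mass deviates by at most tolerance_ppm."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


def is_identified(profound_score=None, mascot_score=None):
    """Acceptance rule from the description: ProFound > 1.5 or Mascot > 64."""
    profound_ok = profound_score is not None and profound_score > 1.5
    mascot_ok = mascot_score is not None and mascot_score > 64
    return profound_ok or mascot_ok


def shuffled_decoy(sequence, rng):
    """Decoy entry: same residues as the target, in random order."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


# One of the calibration masses listed above, with its 40 ppm window.
low, high = ppm_window(1296.684)
print(low, high)

# Hypothetical peptide sequence; seeded generator for repeatability.
target = "MKTAYIAKQR"
decoy = shuffled_decoy(target, random.Random(0))
print(decoy)
```

Shuffling preserves amino acid composition and length, so a decoy hit that passes the same score thresholds gives a rough estimate of how often a random match would be accepted.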

  20. COLT-Cancer

    • dknet.org
    • neuinfo.org
    Updated Aug 18, 2025
    Cite
    (2025). COLT-Cancer [Dataset]. http://identifiers.org/RRID:SCR_006485
    Explore at:
    Dataset updated
    Aug 18, 2025
    Description

    The COLT-Cancer database is a collection of shRNA dropout signatures profiles, covering ~16000 human genes, and derived from more than 70 Pancreatic, Ovarian and Breast human cancer cell-lines using the microarray detection platform developed in the COLT (CCBR-OICR Lentiviral Technology) facility at the Moffat Lab. All shRNA dropout profiles are freely available through download or queries via this website.

Cite
Bairoch Amos (2018). Cellosaurus [Dataset]. http://identifiers.org/MIR:00000598

Cellosaurus

Related Article
Explore at:
445 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: obo, xml, text/markdown
Dataset updated
Nov 23, 2018
Dataset provided by
SIB Swiss Institute of Bioinformatics
Authors
Bairoch Amos
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered
1943 - Present
Description

Cellosaurus is a knowledge resource on cell lines.
