100+ datasets found

Cellosaurus
scicrunch.org
neuinfo.org
Updated May 6, 2015
(2015). Cellosaurus [Dataset]. http://identifiers.org/RRID:SCR_013869
Unique identifier
https://identifiers.org/RRID:SCR_013869
Dataset updated
May 6, 2015
Description
Database of all cell lines used in biomedical research which include immortalized cell lines, naturally immortal cell lines (stem cells), widely used and distributed finite life cell lines, vertebrate cell lines (majority being human, mouse, and rat), and invertebrate (insects and ticks) cell lines, as well as cell line synonyms. Each cell line is provided with the following information: the recommended name (the name which appears in the original publication), a list of synonyms, a unique accession number, comments on a number of topics including misspellings and gene transfection, information on the tissue/organ origin with the UBERON code, the NCI Thesaurus or Orphanet ORDO code for the disease(s) the individual suffered from (for cancer and human genetic disease lines only), the species of origin, the parent cell line, cross-references of sister cell lines, the sex of the individual, the category in which the cell line belongs (Adult stem cell; Cancer cell line; Embryonic stem cell; Factor-dependent cell line; Finite cell line; Hybrid cell line; Hybridoma; Induced pluripotent stem cell; Spontaneously immortalized cell line; Stromal cell line; Telomerase immortalized cell line; Transformed cell line; Undefined cell line type), web links, publication references, and/or cross-references to cell line catalogs/collections, ontologies, cell line databases/resources, and to databases that list cell lines as samples.
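The fields listed above correspond, as far as we understand the published distribution, to two-letter line codes in the Cellosaurus flat file (cellosaurus.txt): for example ID (recommended name), AC (accession), SY (synonyms), OX (species), SX (sex), and CA (category), with `//` closing each record. A minimal Python sketch of a parser under that assumption; the function name `parse_cellosaurus` is hypothetical and the sample is a trimmed illustration, not a verbatim entry:

```python
from collections import defaultdict


def parse_cellosaurus(lines):
    """Parse Cellosaurus flat-file lines into a list of {line_code: [values]} dicts.

    Assumes each data line is a two-letter uppercase code, three spaces, then the
    value (e.g. "AC   CVCL_0030"), and that each record ends with a "//" line.
    Lines that do not match that shape (blank lines, file preamble) are skipped.
    """
    records = []
    current = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if current:
                records.append(dict(current))
            current = defaultdict(list)
        elif len(line) > 5 and line[:2].isupper() and line[2:5] == "   ":
            current[line[:2]].append(line[5:].strip())
    return records


# Trimmed, illustrative record (not a verbatim copy of the current entry).
sample = """ID   HeLa
AC   CVCL_0030
SY   HELA; Hela
OX   NCBI_TaxID=9606; ! Homo sapiens (Human)
SX   Female
CA   Cancer cell line
//
""".splitlines()

records = parse_cellosaurus(sample)
hela = records[0]
print(hela["AC"][0], hela["CA"][0])
print(hela["SY"][0].split("; "))
```

Repeated codes (several SY, DR, or CC lines) accumulate in the same list, so multi-line fields are kept in order rather than overwritten.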
Data and metadata supporting the published article: Development and...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 4, 2023
Stephen Ethier; Stephen T. Guest; Elizabeth Garrett-Mayer; Kent Armeson; Robert C. Wilson; Kathryn Duchinski; Daniel Couch; Joe W. Gray; Chistiana Kappler (2023). Data and metadata supporting the published article: Development and implementation of the SUM breast cancer cell line functional genomics knowledge base. [Dataset]. http://doi.org/10.6084/m9.figshare.12497630.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.12497630.v1
Dataset updated
Jun 4, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Stephen Ethier; Stephen T. Guest; Elizabeth Garrett-Mayer; Kent Armeson; Robert C. Wilson; Kathryn Duchinski; Daniel Couch; Joe W. Gray; Chistiana Kappler
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The SUM human breast cancer cell lines have been used by many labs around the world to develop extensive data sets from comparative genomic hybridization analysis, gene expression profiling, whole exome sequencing, and reverse phase protein array analysis. In a previous study, the authors of this paper performed genome-scale shRNA essentiality screens on the entire SUM line panel, as well as on MCF10A, MCF-7, and MCF-7LTED cells. In this study, the authors developed the SUM Breast Cancer Cell Line Knowledge Base (SLKBase) to make all of these omics data sets available to users of the SUM lines, and to allow users to mine the data and analyse them with respect to the biological pathways enriched by the data in each cell line.

Data access: All the datasets supporting the findings of this study are publicly available on the SLKBase platform: https://sumlineknowledgebase.com/. RPPA data, drug sensitivity data, alpelisib response data, and dose response data are also part of this figshare data record (https://doi.org/10.6084/m9.figshare.12497630).

Study aims and methodology: This web-based knowledge base provides users with data and information on the derivation of each cell line, narrative summaries of the genomics and cell biology of each breast cancer cell line, and protocols for the proper maintenance of the cells. The database includes a series of data mining tools that allow rapid identification of the functional oncogene signatures for each line, the enrichment of any KEGG pathway with screen hit and gene expression data for each line, and rapid analysis of protein and phospho-protein expression across the cell lines. It also includes a gene search tool that returns all of the functional genomic and functional druggable data for any gene across the entire cell line panel. Additionally, the authors have expanded the database to include functional genomic data for 29 additional commonly used breast cancer cell lines.
The three overarching goals in the original development of the SLKBase are: 1) to provide a rich source of information for anyone working with any of the SUM breast cancer cell lines, 2) to give researchers ready access to the large genomic data sets that have been developed with these cells, and 3) to allow researchers to perform orthogonal analyses of the various genomics data sets that we and others have obtained from the SUM lines. For more information on the development and contents of the database, please read the related article.

Datasets supporting the paper: The data mining tools accessed the following datasets to generate the figures and tables; these datasets can be downloaded from the Data Download centre on the SLKBase:

Exome sequencing data: SLKBase.exome_.seq_.sum_.xlsx

Gene amplification and expression data for the SUM cell lines: SUM44amplificationdata.xls, SUM52.xls, SUM149.xls, SUM159.xls, SUM185.xls, SUM190.xls, SUM225.xls, SUM229.xls, SUM1315.xls

Cellecta shRNA screen data for the SUM cell lines: SUM44Celectadata.csv, SUM52Cellectadata.csv, SUM102Cellectadata.csv, SUM149Cellectadata.csv, SUM159Cellectadata.csv, SUM185Cellectadata.csv, SUM190Cellectadata.csv, SUM225Cellectadata.csv, SUM229Cellectadata.csv, SUM1315hits.hit.csv, MCF10A.hits_.csv

Breast cancer cell line data included in this data record (used to generate Figures 1, 2 and 7 in the article):

Proteomics data from the Reverse Phase Protein Array (RPPA) assay analysis: Ethier.SUMline.RPPA.xlsx

Drug sensitivity data: NAVITOCLAX.drugsensitivity.Zscores.xlsx

Alpelisib response data: Apelisib all lines (2).xlsx

Dose response data: 092614 Dose Response CP 52s.11.15.xlsx

All the files are in .xlsx, .xls, or .csv format.
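The SLKBase tools report how strongly a KEGG pathway is enriched among a cell line's screen hits. The description does not say which statistic the knowledge base uses; a common choice is a one-sided hypergeometric test, sketched below in Python under that assumption. The function name and gene IDs are hypothetical placeholders, not SLKBase data:

```python
from math import comb


def hypergeom_enrichment_p(hits, pathway, universe):
    """One-sided hypergeometric P(X >= k): is `pathway` over-represented in `hits`?

    N = genes in the universe (e.g. all genes screened), K = pathway genes in the
    universe, n = hits in the universe, k = hits that are also pathway genes.
    """
    universe = set(universe)
    hits = set(hits) & universe
    pathway = set(pathway) & universe
    N, K, n = len(universe), len(pathway), len(hits)
    k = len(hits & pathway)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Placeholder gene IDs, not SLKBase data.
universe = [f"G{i}" for i in range(100)]
pathway = universe[:10]                 # a 10-gene "pathway"
hits = universe[:5] + universe[50:55]   # 10 hits, 5 of them in the pathway
print(hypergeom_enrichment_p(hits, pathway, universe))
```

Exact integer arithmetic from `math.comb` avoids underflow in the tail sum; when many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing.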
Cell Line Database
bioregistry.io
Updated Dec 28, 2021
(2021). Cell Line Database [Dataset]. https://bioregistry.io/cldb
Dataset updated
Dec 28, 2021
Description
The Cell Line Data Base (CLDB) is a reference information source for human and animal cell lines. It provides the characteristics of the cell lines and their availability through distributors, allowing cell line requests to be made from collections and laboratories.
CCLE Cell Line Gene Expression Profiles
maayanlab.cloud
gz
Updated Apr 6, 2015
Ma'ayan Laboratory of Computational Systems Biology (2015). CCLE Cell Line Gene Expression Profiles [Dataset]. https://maayanlab.cloud/Harmonizome/dataset/CCLE+Cell+Line+Gene+Expression+Profiles
Available download formats: gz
Dataset updated
Apr 6, 2015
Dataset provided by
Harmonizome
Ma'ayan Laboratory of Computational Systems Biology
Authors
Ma'ayan Laboratory of Computational Systems Biology
Description
mRNA microarray expression profiles for cancer cell lines
NCI-60 Cancer Cell Lines
bigomics.ch
Updated Nov 8, 2024
National Cancer Institute (NCI) (2024). NCI-60 Cancer Cell Lines [Dataset]. https://bigomics.ch/blog/top-databases-for-drug-discovery/
Dataset updated
Nov 8, 2024
Dataset authored and provided by
National Cancer Institute (NCI)
Description
A panel of 60 human cancer cell lines used for screening anticancer drugs.
RNA sequencing data for 30 bladder cancer cell lines
datacatalog.mskcc.org
Updated Nov 18, 2019
Lee, I-Ling; McConkey, David J.; Su, Xiaoping; Choi, Woonyoung (2019). RNA sequencing data for 30 bladder cancer cell lines [Dataset]. https://datacatalog.mskcc.org/dataset/10401
Dataset updated
Nov 18, 2019
Authors
Lee, I-Ling; McConkey, David J.; Su, Xiaoping; Choi, Woonyoung
Description
Summary from the GEO: "RNA-sequencing of a panel of urothelial cancer cells. The goal of the study is to examine the genome-wide expression profile in each of the 30 urothelial cancer cells tested in our laboratory."

"Overall design: Each of the 30 cell lines was DNA fingerprinted to confirm its real identity. Total RNA was obtained from each cell line and subjected to Illumina RNA sequencing."

The data come from a study on the comprehensive molecular characterization of muscle-invasive bladder cancer.
Investigation of Cross-Contamination and Misidentification of 278 Widely...
plos.figshare.com
datasetcatalog.nlm.nih.gov
tiff
Updated May 31, 2023
Yaqing Huang; Yuehong Liu; Congyi Zheng; Chao Shen (2023). Investigation of Cross-Contamination and Misidentification of 278 Widely Used Tumor Cell Lines [Dataset]. http://doi.org/10.1371/journal.pone.0170384
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0170384
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yaqing Huang; Yuehong Liu; Congyi Zheng; Chao Shen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In recent years, biological research involving human cell lines has been developing rapidly in China. However, some of the cell lines are not authenticated before use, so misidentified and/or cross-contaminated cell lines are unfortunately commonplace. In this study, we present a comprehensive investigation of cross-contamination and misidentification for a panel of 278 cell lines from 28 institutes in China, using short tandem repeat (STR) profiling. By comparing the DNA profiles with the cell bank databases of ATCC and DSMZ, we uncovered 128 cases of cross-contamination/misidentification (46.0%, 128/278) coming from 22 institutes. Notably, 73.2% (52 of 71) of the cell lines established by Chinese researchers were misidentified, accounting for 40.6% of all misidentifications (52/128). Further, 67.3% (35/52) of the misidentified cell lines established in laboratories in China were HeLa cells or a possible hybrid of HeLa with another cell line. Furthermore, the bile duct cancer cell line HCCC-9810 and the degenerative lung cancer cell line Calu-6 showed an 88.9% match in the ATCC database (9 loci), indicating that they were of the same origin. However, when we compared these two cell lines at 21 loci with the same algorithm, the percent match was only 48.2%, indicating that they were different. The SNP profiles of HCCC-9810 and Calu-6 also revealed that they were different cell lines. The 150 cell lines with unique profiles demonstrated a wide range of in vitro phenotypes. This panel of 150 genomically validated cancer cell lines represents a valuable resource for the cancer research community and will advance our understanding of the disease by providing a standard reference for cell lines that can be used for biological as well as preclinical studies.
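The 88.9% and 48.2% figures are percent matches between STR profiles. The abstract does not name the formula; one widely used choice is the Tanabe (Sørensen-Dice) score, 2 × shared alleles / (alleles in query + alleles in reference). A minimal Python sketch assuming that formula; the loci are real STR markers, but the allele calls and the function name are invented for illustration:

```python
def tanabe_percent_match(query, reference):
    """Percent match = 100 * 2 * shared alleles / (query alleles + reference alleles).

    Profiles map an STR locus name to the set of alleles called there; only loci
    typed in both profiles are scored. Alleles are strings so "9.3" stays exact.
    """
    loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in loci)
    return 100.0 * 2 * shared / total if total else 0.0


# Made-up three-locus profiles for illustration only.
query = {"TH01": {"7", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}}
reference = {"TH01": {"7", "9.3"}, "D5S818": {"11"}, "TPOX": {"8", "11"}}
print(tanabe_percent_match(query, reference))
```

Here a homozygous locus is stored as a one-element set and so counts one allele; conventions for homozygous loci differ between tools. Scores of 80% or more are commonly read as indicating related profiles, and scoring more loci gives the comparison more power to separate unrelated lines, consistent with the 9-locus versus 21-locus difference reported above.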
ATCC STR database
neuinfo.org
scicrunch.org
Updated Apr 28, 2021
(2021). ATCC STR database [Dataset]. http://identifiers.org/RRID:SCR_019203
Unique identifier
https://identifiers.org/RRID:SCR_019203
Dataset updated
Apr 28, 2021
Description
Comprehensive database of Short Tandem Repeat (STR) DNA profiles for all ATCC human cell lines, collected by ATCC as part of its continuing efforts to characterize and authenticate the cell lines in its Cell Biology collection.
International Cell Line Authentication Committee
rrid.site
Updated Jul 3, 2024
(2024). International Cell Line Authentication Committee [Dataset]. http://identifiers.org/RRID:SCR_014414
Unique identifier
https://identifiers.org/RRID:SCR_014414
Dataset updated
Jul 3, 2024
Description
An independent committee established to improve the visibility of misidentified cell lines and to promote awareness and authentication testing to combat false or misidentified cell lines. It maintains a database of cross-contaminated or otherwise misidentified cell lines, as well as resources to familiarize cell line users with the problem of misidentification. Its Terms of Reference define false or misidentified cell lines and other commonly used terms, and set out the committee's goals and ground rules.
hPSCreg dataset, continuously updated
hpscreg.eu
Updated Jul 15, 2015
(2015). hPSCreg dataset, continuously updated [Dataset]. https://hpscreg.eu/
Dataset updated
Jul 15, 2015
Variables measured
usage, ethics, derivation, genotyping, characterisation, donor information, culture conditions, general information, genetic modification
Description
hPSCreg is a global registry of human pluripotent stem cell (hPSC) lines containing manually validated information, including ethical provenance, procurement, derivation process, genetic and expression data, other biological and molecular characteristics, use, and quality of the line. Current status: 1123 hESC lines, 7670 hiPSC lines, 205 clinical studies, and 2402 certificates.
Genomics of Drug Sensitivity in Cancer (GDSC)
kaggle.com
zip
Updated Aug 13, 2024
Samira Alipour (2024). Genomics of Drug Sensitivity in Cancer (GDSC) [Dataset]. https://www.kaggle.com/datasets/samiraalipour/genomics-of-drug-sensitivity-in-cancer-gdsc/discussion
Available download formats: zip (15094344 bytes)
Dataset updated
Aug 13, 2024
Authors
Samira Alipour
License
GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html
Description
The Genomics of Drug Sensitivity in Cancer (GDSC) dataset is a valuable resource for therapeutic biomarker discovery in cancer research. This dataset combines drug response data with genomic profiles of cancer cell lines, allowing researchers to investigate the relationship between genetic features and drug sensitivity.

Task:

The primary task associated with this dataset is to predict drug sensitivity (measured as IC50 values) based on genomic features of cancer cell lines. This can involve regression tasks to predict exact IC50 values or classification tasks to categorize cell lines as sensitive or resistant to specific drugs. The dataset also allows for the identification of genomic markers that correlate with drug response.
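For the regression task, a useful reference point before fitting any genomic model is a baseline that ignores the cell line and predicts each drug's average LN_IC50. This Python sketch is an illustrative assumption, not part of the dataset; the function names and rows are hypothetical, using only the DRUG_ID and LN_IC50 columns:

```python
import math
from collections import defaultdict
from statistics import mean


def per_drug_mean_baseline(train, test):
    """Predict LN_IC50 as the training mean for the same DRUG_ID.

    Drugs unseen in training fall back to the overall training mean.
    """
    by_drug = defaultdict(list)
    for row in train:
        by_drug[row["DRUG_ID"]].append(row["LN_IC50"])
    overall = mean(row["LN_IC50"] for row in train)
    return [mean(by_drug[row["DRUG_ID"]]) if row["DRUG_ID"] in by_drug else overall
            for row in test]


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


# Placeholder rows; real rows would come from GDSC_DATASET.csv.
train = [{"DRUG_ID": 1, "LN_IC50": 1.0}, {"DRUG_ID": 1, "LN_IC50": 3.0},
         {"DRUG_ID": 2, "LN_IC50": 0.0}]
test = [{"DRUG_ID": 1, "LN_IC50": 2.5}, {"DRUG_ID": 3, "LN_IC50": 1.0}]
preds = per_drug_mean_baseline(train, test)
print(preds, rmse(preds, [row["LN_IC50"] for row in test]))
```

Because much of the variance in IC50 comes from the drug itself, a genomic model is only informative to the extent that it beats this baseline's RMSE on held-out cell lines.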

Files:

GDSC2-dataset.csv: Contains drug sensitivity data, including IC50 values, for various drugs tested against cancer cell lines. (Original source file)

Cell_Lines_Details.xlsx: Provides detailed information about the cancer cell lines, including genomic features such as mutations, copy number alterations, and gene expression. (Original source file)

Compounds-annotation.csv: Offers information about the drugs used in the screening, including their targets and pathways. (Original source file)

GDSC_DATASET.csv: This is the main dataset file for analysis. It's a merged file combining key information from the above three files, created to facilitate easier analysis. This consolidated dataset includes all necessary features for drug sensitivity prediction and is recommended for use in your analysis.

Detailed Column Descriptions:

1. GDSC2-dataset.csv:

DATASET: Identifier for the specific GDSC dataset version.

NLME_RESULT_ID: Unique identifier for the non-linear mixed effects model result.

NLME_CURVE_ID: Identifier for the dose-response curve fitted by NLME.

COSMIC_ID: Unique identifier for the cell line from the COSMIC database.

CELL_LINE_NAME: Name of the cancer cell line used in the experiment.

SANGER_MODEL_ID: Identifier used by the Sanger Institute for the cell line model.

TCGA_DESC: Description of the cancer type according to The Cancer Genome Atlas.

DRUG_ID: Unique identifier for the drug used in the experiment.

DRUG_NAME: Name of the drug used in the experiment.

PUTATIVE_TARGET: The presumed molecular target of the drug.

PATHWAY_NAME: The biological pathway affected by the drug.

COMPANY_ID: Identifier for the company that provided the drug.

WEBRELEASE: Date or version of web release for this data.

MIN_CONC: Minimum concentration of the drug used in the experiment.

MAX_CONC: Maximum concentration of the drug used in the experiment.

LN_IC50: Natural log of the half-maximal inhibitory concentration (IC50).

AUC: Area Under the Curve, a measure of drug effectiveness.

RMSE: Root Mean Square Error, indicating the fit quality of the dose-response curve.

Z_SCORE: Standardized score of the drug response, allowing comparison across different drugs and cell lines.

2. Cell_Lines_Details.xlsx:

Sample Name: Unique identifier for the cell line sample.

COSMIC identifier: Unique ID from the COSMIC database for the cell line.

Whole Exome Sequencing (WES): Genetic mutation data from whole exome sequencing.

Copy Number Alterations (CNA): Data on gene copy number changes in the cell line.

Gene Expression: Information on gene expression levels in the cell line.

Methylation: Data on DNA methylation patterns in the cell line.

Drug Response: Information on how the cell line responds to various drugs.

GDSC Tissue descriptor 1: Primary tissue type classification.

GDSC Tissue descriptor 2: Secondary tissue type classification.

Cancer Type (matching TCGA label): Cancer type according to TCGA classification.

Microsatellite instability Status (MSI): Indicates the cell line's MSI status.

Screen Medium: The growth medium used for culturing the cell line.

Growth Properties: Characteristics of how the cell line grows in culture.

3. Compounds-annotation.csv:

DRUG_ID: Unique identifier for the drug.

SCREENING_SITE: Location where the drug screening was performed.

DRUG_NAME: Name of the drug compound.

SYNONYMS: Alternative names for the drug.

TARGET: The molecular target(s) of the drug.

TARGET_PATHWAY: The biological pathway(s) targeted by the drug.
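GDSC_DATASET.csv is described as a merge of the three source files, but the merge script itself is not included. A minimal Python sketch of how such a merge could be done, assuming COSMIC_ID in GDSC2-dataset.csv joins to `COSMIC identifier` in Cell_Lines_Details.xlsx (exported to CSV first) and DRUG_ID joins the response and compound files; the function name and toy rows are hypothetical:

```python
import csv
import io


def merge_gdsc(response_csv, lines_csv, compounds_csv):
    """Left-join drug-response rows with cell-line details and compound annotations.

    Join keys (assumed): COSMIC_ID == "COSMIC identifier", and DRUG_ID in both files.
    Keys are compared as strings, so IDs must be formatted identically in each file.
    """
    lines = {r["COSMIC identifier"]: r for r in csv.DictReader(io.StringIO(lines_csv))}
    compounds = {r["DRUG_ID"]: r for r in csv.DictReader(io.StringIO(compounds_csv))}
    merged = []
    for row in csv.DictReader(io.StringIO(response_csv)):
        out = dict(row)
        for extra in (lines.get(row["COSMIC_ID"], {}), compounds.get(row["DRUG_ID"], {})):
            for key, value in extra.items():
                out.setdefault(key, value)  # on a name clash, keep the response-file value
        merged.append(out)
    return merged


# Placeholder file contents for illustration only.
response_text = "COSMIC_ID,DRUG_ID,DRUG_NAME,LN_IC50\n1001,1,DRUG_X,2.5\n1002,1,DRUG_X,-0.7\n"
details_text = "COSMIC identifier,Sample Name,Screen Medium\n1001,LINE_A,MEDIUM_A\n"
compounds_text = "DRUG_ID,DRUG_NAME,TARGET,TARGET_PATHWAY\n1,DRUG_X,TARGET_X,PATHWAY_X\n"
rows = merge_gdsc(response_text, details_text, compounds_text)
print(rows[0]["Sample Name"], rows[0]["TARGET"], rows[1].get("Sample Name"))
```

A left join keeps every drug-response row even when an annotation is missing (the second row has no Sample Name), so it is worth checking for such gaps before modelling.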

Target Variable:

The primary target variable in this dataset is LN_IC50 (Natural log of the half-maximal inhibitory concentration). This variable represents the concentration of a drug that inhibits cell viability by 50%, measured on a logarithmic scale. Lower LN_IC50 values indicate higher drug sensitivity, making it a crucial metric for evaluating the effectiveness of anti-ca...
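Because LN_IC50 is a natural logarithm, the IC50 itself is exp(LN_IC50), in whatever concentration unit MIN_CONC and MAX_CONC use (micromolar in GDSC releases, as far as we know; check the source). The Python sketch below shows that conversion and one way to standardize LN_IC50 within each drug so sensitivities can be compared across drugs. This per-drug z-score is an illustration and is not necessarily how the provided Z_SCORE column was computed; the function names and rows are placeholders:

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def ic50_from_ln(ln_ic50):
    """Invert the natural log: IC50 in the same units as MIN_CONC/MAX_CONC."""
    return math.exp(ln_ic50)


def per_drug_zscores(rows):
    """Standardize LN_IC50 within each DRUG_ID (negative = more sensitive than average)."""
    by_drug = defaultdict(list)
    for row in rows:
        by_drug[row["DRUG_ID"]].append(row["LN_IC50"])
    stats = {drug: (mean(v), pstdev(v)) for drug, v in by_drug.items()}
    scores = []
    for row in rows:
        mu, sd = stats[row["DRUG_ID"]]
        scores.append((row["LN_IC50"] - mu) / sd if sd > 0 else 0.0)
    return scores


# Placeholder values for two hypothetical drugs.
rows = [{"DRUG_ID": 1, "LN_IC50": 1.0}, {"DRUG_ID": 1, "LN_IC50": 3.0},
        {"DRUG_ID": 2, "LN_IC50": -2.0}, {"DRUG_ID": 2, "LN_IC50": 0.0}]
print([round(ic50_from_ln(r["LN_IC50"]), 3) for r in rows])
print(per_drug_zscores(rows))
```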
NCI-60 Cell Lines (NCI, Cancer Res 2012): Whole-exome sequencing of 67...
datacatalog.mskcc.org
Updated May 20, 2020
National Cancer Institute (U.S.) (2020). NCI-60 Cell Lines (NCI, Cancer Res 2012): Whole-exome sequencing of 67 samples by NCI-60 cell line project [Dataset]. https://datacatalog.mskcc.org/dataset/10453
Dataset updated
May 20, 2020
Dataset provided by
National Cancer Institute (http://www.cancer.gov/)
MSK Library
Description
This dataset contains summary data visualizations and clinical data for 67 samples from 67 patients, compiled as part of an NCI-60 cell line project to bring NCI-60 high-throughput and high-content data into CellMiner, a genomic and pharmacologic database created by the National Cancer Institute. The clinical data include deidentified patient and sample IDs, mutation counts, detailed cancer type information, patient demographics, and past modality. The dataset also includes copy-number segment data, downloadable as .seg files and viewable in the Integrative Genomics Viewer (IGV).
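The .seg files follow the tab-delimited segment format that IGV reads: a sample ID, chromosome, segment start and end, optionally a marker count, and a numeric segment value (usually a log2 copy-number ratio) in the last column. A minimal Python sketch that reads that layout and labels each segment; the ±0.3 cutoff, the function name, and the rows are illustrative assumptions, not values from this dataset:

```python
import csv
import io


def call_segments(seg_text, threshold=0.3):
    """Yield (sample, chrom, start, end, value, call) for each segment in .seg text.

    Assumes a header row, the layout ID / chrom / start / end / [markers] / value,
    and that the last column is a log2-ratio segment mean.
    """
    reader = csv.reader(io.StringIO(seg_text), delimiter="\t")
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue
        value = float(row[-1])
        if value >= threshold:
            call = "gain"
        elif value <= -threshold:
            call = "loss"
        else:
            call = "neutral"
        yield row[0], row[1], int(row[2]), int(row[3]), value, call


# Made-up segments for one hypothetical sample.
seg = (
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
    "S1\t1\t1000\t50000\t20\t0.45\n"
    "S1\t1\t50001\t90000\t15\t-0.05\n"
    "S1\t2\t100\t8000\t5\t-0.8\n"
)
for segment in call_segments(seg):
    print(segment)
```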
Careful Selection of Reference Genes Is Required for Reliable Performance of...
figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated Jan 18, 2016
Francis Jacob; Rea Guertler; Stephanie Naim; Sheri Nixdorf; André Fedier; Neville F. Hacker; Viola Heinzelmann-Schwarz (2016). Careful Selection of Reference Genes Is Required for Reliable Performance of RT-qPCR in Human Normal and Cancer Cell Lines [Dataset]. http://doi.org/10.1371/journal.pone.0059180
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0059180
Dataset updated
Jan 18, 2016
Dataset provided by
PLOS ONE
Authors
Francis Jacob; Rea Guertler; Stephanie Naim; Sheri Nixdorf; André Fedier; Neville F. Hacker; Viola Heinzelmann-Schwarz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a standard technique in most laboratories. Reference genes are essential for data normalization, and selecting suitable ones remains critical. Our aims were to 1) review the literature since the implementation of the MIQE guidelines to determine their degree of acceptance; 2) compare various algorithms for assessing expression stability; and 3) identify a set of suitable and most reliable reference genes for a variety of human cancer cell lines. A PubMed database review was performed and publications since 2009 were selected. Twelve putative reference genes were profiled in normal and various cancer cell lines (n = 25) using 2-step RT-qPCR. The investigated reference genes were ranked according to their expression stability by five algorithms (geNorm, NormFinder, BestKeeper, comparative ΔCt, and RefFinder). Our review identified 37 publications, two thirds using patient samples and one third using cell lines. qPCR efficiency was reported in 68.4% of all publications, but only 28.9% of all studies provided RNA/cDNA amounts and standard curves. The geNorm and NormFinder algorithms were used in combination in 60.5% of publications. In our selection of 25 cancer cell lines, we identified HSPCB, RRN18S, and RPS13 as the most stably expressed reference genes. In the subset of ovarian cancer cell lines, the most stable reference genes were PPIA, RPS13, and SDHA, clearly demonstrating the necessity of selecting genes according to the research focus. Moreover, according to the guidelines, a cohort of at least three suitable reference genes needs to be established in advance of the experiments. For establishing a set of reference genes for normalization, we recommend ideally three reference genes selected by at least three stability algorithms. The unfortunate lack of compliance with the MIQE guidelines shows that these guidelines need to be further established in the research community.
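geNorm, one of the five stability algorithms compared in this study, ranks candidate reference genes by a stability value M: for each gene, the average, over all other candidates, of the standard deviation across samples of their log2 expression ratio (lower M means more stable). The Python sketch below computes M directly from Ct values, assuming 100% amplification efficiency so that log2(Q_j / Q_k) = Ct_k - Ct_j. It performs a single pass without geNorm's stepwise exclusion of the least stable gene, and the gene names and Ct values are invented:

```python
from statistics import stdev


def genorm_m(ct):
    """Return {gene: M} from {gene: [Ct in sample 1, sample 2, ...]} (one geNorm pass).

    Assumes 100% PCR efficiency, so log2(Q_j / Q_k) = Ct_k - Ct_j in each sample.
    M_j is the mean, over the other genes k, of the sample standard deviation of
    that log2 ratio. Lower M means more stable expression.
    """
    genes = list(ct)
    m = {}
    for j in genes:
        sds = []
        for k in genes:
            if k != j:
                ratios = [ck - cj for cj, ck in zip(ct[j], ct[k])]
                sds.append(stdev(ratios))
        m[j] = sum(sds) / len(sds)
    return m


# Invented Ct values for three placeholder genes across three samples.
ct = {
    "GENE_A": [20.0, 21.0, 22.0],
    "GENE_B": [18.0, 19.0, 20.0],  # moves in step with GENE_A
    "GENE_C": [25.0, 24.0, 27.0],
}
for gene, value in sorted(genorm_m(ct).items(), key=lambda item: item[1]):
    print(gene, round(value, 3))
```

GENE_A and GENE_B keep a constant difference in every sample, so their pairwise ratio has zero spread and they receive the lowest M; GENE_C is flagged as least stable.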
Integrated Cell Lines
dknet.org
neuinfo.org
Updated Jan 29, 2022
(2022). Integrated Cell Lines [Dataset]. http://identifiers.org/RRID:SCR_008994
Unique identifier
https://identifiers.org/RRID:SCR_008994
Dataset updated
Jan 29, 2022
Description
A virtual database currently indexing available cell lines from: Coriell Cell Repositories, International Mouse Strain Resource (IMSR), ATCC, NIH Human Pluripotent Stem Cell Registry, NIGMS Human Genetic Cell Repository, and Developmental Therapeutics Program.
Cancer Cell Line Encyclopedia
rrid.site
Updated Aug 21, 2010
(2010). Cancer Cell Line Encyclopedia [Dataset]. http://identifiers.org/RRID:SCR_013836
Unique identifier
https://identifiers.org/RRID:SCR_013836
Dataset updated
Aug 21, 2010
Description
A collaborative project between the Broad Institute and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation, with the goal of conducting a detailed genetic and pharmacologic characterization of a large panel of human cancer models. The CCLE also works to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines.
Pharmacogenomics Datasets for Cancer Cell Lines from CellMiner...
zenodo.org
application/gzip
Updated Sep 11, 2025
Augustin Luna; Fathi Elloumi; Vinodh Rajapakse (2025). Pharmacogenomics Datasets for Cancer Cell Lines from CellMiner Cross-Database (CellMinerCDB) [Dataset]. http://doi.org/10.5281/zenodo.17088217
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.17088217
Dataset updated
Sep 11, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Augustin Luna; Fathi Elloumi; Vinodh Rajapakse
License
GNU LGPL 3.0: https://www.gnu.org/licenses/lgpl-3.0-standalone.html
Time period covered
Sep 2025
Description
If you use this data, please cite: Luna A, Elloumi F, Varma S et al. NAR. 2021. PMID: 33196823

Cell line pharmacogenomics datasets for cancer biology and machine learning studies. The datasets are compatible with rcellminer and CellMinerCDB (see publications for details) and data can be extracted for use with Python-based projects.

An example for extracting data from the rcellminer and CellMinerCDB compatible packages:

```r
# INSTALL ----
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rcellminer")

# Replace path_to_file with the data package filename
install.packages(path_to_file, repos = NULL, type = "source")

# GET DATA ----
library(Biobase)     # exprs()
library(rcellminer)  # getAct(), getFeatureAnnot(), getAllFeatureData(), getSampleData()
## Replace nciSarcomaData with name of dataset through code
library(nciSarcomaData)

## DRUG DATA ----
drugAct <- exprs(getAct(nciSarcomaData::drugData))
drugAnnot <- getFeatureAnnot(nciSarcomaData::drugData)[["drug"]]

## MOLECULAR DATA ----
### List available datasets
names(getAllFeatureData(nciSarcomaData::molData))

### Extract data and annotations
expData <- exprs(nciSarcomaData::molData[["exp"]])
mirData <- exprs(nciSarcomaData::molData[["mir"]])
expAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["exp"]]
mirAnnot <- getFeatureAnnot(nciSarcomaData::molData)[["mir"]]

## SAMPLE DATA ----
sampleAnnot <- getSampleData(nciSarcomaData::molData)
```
Human Protein Atlas - Cell Atlas
v19.proteinatlas.org
Updated Sep 5, 2019
Human Protein Atlas (2019). Human Protein Atlas - Cell Atlas [Dataset]. https://v19.proteinatlas.org/humanproteome/cell
Dataset updated
Sep 5, 2019
Dataset provided by
Human Protein Atlas
License
https://www.proteinatlas.org/about/licence
Description
The Cell Atlas provides high-resolution insights into the expression and spatio-temporal distribution of proteins within human cells. Using a panel of 64 cell lines to represent various cell populations in different organs and tissues of the human body, the mRNA expression of all human genes is characterized by deep RNA-sequencing. The subcellular distribution of each protein is investigated in a subset of cell lines selected based on corresponding gene expression. The protein localization data are derived from antibody-based profiling by immunofluorescence confocal microscopy, and classified into 32 different organelles and fine subcellular structures. The Cell Atlas currently covers 12390 genes (63%) for which there are available antibodies. It offers a database for exploring details of individual genes and proteins of interest, as well as for systematically analyzing transcriptomes and proteomes in broader contexts, in order to increase our understanding of human cells.
Human Protein Atlas - Subcellular
proteinatlas.org
Updated Sep 26, 2008
Human Protein Atlas (2008). Human Protein Atlas - Subcellular [Dataset]. https://www.proteinatlas.org/humanproteome/subcellular
Dataset updated
Sep 26, 2008
Dataset authored and provided by
Human Protein Atlas
License
https://www.proteinatlas.org/about/licence
Description
Subcellular methods

The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13603 genes (67% of the human protein-coding genes), as well as predictions for an additional 3459 secreted- or membrane proteins, covering a total of 17062 genes (85% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 42 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines, induced pluripotent stem cells (iPSCs) and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations.

The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary. Learn about:

The subcellular distribution of proteins in standard human cell lines, including ciliated cells and iPSCs.

The subcellular distribution of proteins in human sperm.

The proteomes of different organelles and subcellular structures.

Single-cell variability in the expression levels and/or localizations of proteins.
Additional file 6: of A map of gene expression in neutrophil-like cell lines...
springernature.figshare.com
search.datacite.org
xlsx
Updated May 31, 2023
Esther Rincón; Briana Rocha-Gregg; Sean Collins (2023). Additional file 6: of A map of gene expression in neutrophil-like cell lines [Dataset]. http://doi.org/10.6084/m9.figshare.6891509.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.6891509.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Esther Rincón; Briana Rocha-Gregg; Sean Collins
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Full tables of reanalyzed gene expression data for primary neutrophils and HL-60 cells from previously published studies. This Excel file contains four sheets. The first sheet contains FPKM gene expression values generated by Cufflinks for all primary human neutrophil and HL-60 samples reanalyzed in this study. The second sheet contains the corresponding log10-transformed normalized expression values. The third sheet contains FPKM gene expression values for all primary mouse neutrophil samples reanalyzed in this study, and the fourth sheet contains the corresponding log10-transformed normalized values. (XLSX 21707 kb)
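The sheets pair Cufflinks FPKM values with log10-transformed normalized values. Cufflinks estimates FPKM with a likelihood model over transcript isoforms, and the normalization behind the log10 sheets is not described here; the Python sketch below only illustrates the simple count-based definition, FPKM = fragments × 10^9 / (feature length in bp × total mapped fragments), and a log10 transform with a pseudocount of 1, which is an assumption. Function names and counts are hypothetical:

```python
import math


def fpkm(fragments, length_bp, total_mapped):
    """Fragments per kilobase of feature per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def log10_pseudo(value, pseudocount=1.0):
    """log10(value + pseudocount); the pseudocount keeps zero FPKM finite."""
    return math.log10(value + pseudocount)


# 500 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments.
value = fpkm(500, 2_000, 10_000_000)
print(value, round(log10_pseudo(value), 4))
```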
SBM DB
neuinfo.org
dknet.org
+1 more
Updated Jan 29, 2022
(2022). SBM DB [Dataset]. http://identifiers.org/RRID:SCR_013491/resolver?q=&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_013491 https://identifiers.org/RRID:SCR_013491/resolver?q=&i=rrid
Dataset updated
Jan 29, 2022
Description
It is a comprehensive database of gene expression profiles that enables comparison of the transcriptomes of various tissues, organs, and experiments. mRNA expression levels of thousands of genes are measured with the oligonucleotide DNA microarray "GeneChip". All gene expression data in this database are produced by LSBM (Laboratory for Systems Biology and Medicine) and its collaborators. SBM DB provides two different databases: a reference database for expression analysis (RefEXA) and LSMB GeNet, a database of various organisms, tissues, and experiments. RefEXA provides a comprehensive gene expression database of normal human tissues, normal cultured cells, and cancer cell lines profiled on the GeneChip HG-U133A, which can help in the investigation of human disease. LSMB provides...
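Comparing transcriptomes across tissues, as SBM DB is meant to support, often comes down to correlating two expression profiles measured on the same probe sets. The sketch below computes a Pearson correlation on log2-transformed signals. The log2 transform, the clamping floor of 1.0, the function names and the example values are all assumptions for illustration, not SBM DB's documented method or data.

```python
import math
from typing import List, Sequence


def log2_profile(signals: Sequence[float], floor: float = 1.0) -> List[float]:
    """log2-transform microarray signals, clamping values below `floor` (assumed choice)."""
    return [math.log2(max(s, floor)) for s in signals]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two profiles measured on the same probe sets."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical signals for the same four probe sets in two tissues.
tissue_a = [120.0, 850.0, 40.0, 3000.0]
tissue_b = [100.0, 900.0, 35.0, 2800.0]
r = pearson(log2_profile(tissue_a), log2_profile(tissue_b))
print(f"r = {r:.3f}")
```

Correlating log-scaled values keeps a handful of very highly expressed genes from dominating the comparison; rank-based (Spearman) correlation is a common alternative when the intensity scale differs between arrays.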

Cellosaurus

RRID:SCR_013869, nif-0000-30108, r3d100010875, Cellosaurus (RRID:SCR_013869)

12 scholarly articles cite this dataset.

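The per-cell-line information listed in the Cellosaurus description maps naturally onto a record type. The sketch below is a hypothetical in-memory model: the class names, field names and the example record are illustrative choices, not the schema of any Cellosaurus download format. Only the thirteen category labels are taken verbatim from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CellLineCategory(Enum):
    """Category labels as listed in the Cellosaurus description."""
    ADULT_STEM_CELL = "Adult stem cell"
    CANCER_CELL_LINE = "Cancer cell line"
    EMBRYONIC_STEM_CELL = "Embryonic stem cell"
    FACTOR_DEPENDENT = "Factor-dependent cell line"
    FINITE = "Finite cell line"
    HYBRID = "Hybrid cell line"
    HYBRIDOMA = "Hybridoma"
    IPSC = "Induced pluripotent stem cell"
    SPONTANEOUSLY_IMMORTALIZED = "Spontaneously immortalized cell line"
    STROMAL = "Stromal cell line"
    TELOMERASE_IMMORTALIZED = "Telomerase immortalized cell line"
    TRANSFORMED = "Transformed cell line"
    UNDEFINED = "Undefined cell line type"


@dataclass
class CellLineRecord:
    """Hypothetical record; field names are illustrative, not Cellosaurus's own."""
    accession: str                      # unique accession number
    recommended_name: str               # name used in the original publication
    species: str                        # species of origin
    category: CellLineCategory
    synonyms: List[str] = field(default_factory=list)
    sex: Optional[str] = None           # sex of the individual
    tissue_uberon: Optional[str] = None  # tissue/organ of origin as an UBERON code
    disease_codes: List[str] = field(default_factory=list)  # NCIt or ORDO; cancer/genetic disease lines only
    parent: Optional[str] = None        # parent cell line
    sister_lines: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)  # e.g. misspellings, gene transfection
    xrefs: List[str] = field(default_factory=list)     # catalogs, ontologies, other databases
    references: List[str] = field(default_factory=list)

    def matches(self, name: str) -> bool:
        """Case-insensitive lookup by recommended name or any synonym."""
        wanted = name.casefold()
        return wanted == self.recommended_name.casefold() or any(
            wanted == s.casefold() for s in self.synonyms
        )


record = CellLineRecord(
    accession="EXAMPLE_0001",  # hypothetical accession, not a real Cellosaurus ID
    recommended_name="ExampleCell-1",
    species="Homo sapiens",
    category=CellLineCategory("Cancer cell line"),
    synonyms=["EC-1", "ExampleCell1"],
    sex="Female",
)
print(record.matches("ec-1"))  # True
```

Matching on synonyms as well as the recommended name reflects why the description lists synonyms at all: the same line often appears under several names in the literature.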

Cellosaurus

Data and metadata supporting the published article: Development and...

Cell Line Database

CCLE Cell Line Gene Expression Profiles

NCI-60 Cancer Cell Lines

RNA sequencing data for 30 bladder cancer cell lines

Investigation of Cross-Contamination and Misidentification of 278 Widely...

ATCC STR database

International Cell Line Authentication Committee

hPSCreg dataset, continuously updated

Genomics of Drug Sensitivity in Cancer (GDSC)

NCI-60 Cell Lines (NCI, Cancer Res 2012): Whole-exome sequencing of 67...

Careful Selection of Reference Genes Is Required for Reliable Performance of...

Integrated Cell Lines

Cancer Cell Line Encyclopedia

Pharmacogenomics Datasets for Cancer Cell Lines from CellMiner...

Human Protein Atlas - Cell Atlas

Human Protein Atlas - Subcellular

Additional file 6: of A map of gene expression in neutrophil-like cell lines...

SBM DB
