71 datasets found
  1. Data from: scRNA-seq Datasets

    • figshare.com
    txt
    Updated Apr 9, 2019
    Cite
    Zhengtao Xiao (2019). scRNA-seq Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.7174922.v2
    Available download formats: txt
    Dataset updated
    Apr 9, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zhengtao Xiao
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The "*.csv" files contain the single-cell gene expression values (log2(tpm+1)) for all genes in each cell from melanoma and head and neck squamous cell carcinoma (HNSCC) tumors. The cell type and tumor of origin for each cell are also included in the "*.csv" files. The "MalignantCellSubtypes.xlsx" file defines the tumor subtype. "CCLE_RNAseq_rsem_genes_tpm_20180929.zip" was downloaded from the CCLE database.

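The log2(tpm+1) encoding mentioned in the description is easy to invert when raw TPM values are needed. The following is a minimal Python sketch of the forward and inverse transforms; the function names are hypothetical and are not part of the dataset.

```python
import math


def tpm_to_log2(tpm: float) -> float:
    """Forward transform: log2(TPM + 1)."""
    if tpm < 0:
        raise ValueError("TPM must be non-negative")
    return math.log2(tpm + 1.0)


def log2_to_tpm(value: float) -> float:
    """Inverse transform: recover TPM from a log2(TPM + 1) value."""
    return 2.0 ** value - 1.0


if __name__ == "__main__":
    # A gene with TPM = 3 is stored as log2(3 + 1) = 2.
    print(tpm_to_log2(3.0))
    print(log2_to_tpm(2.0))
```

Adding 1 before taking the logarithm keeps genes with zero expression at exactly 0 instead of negative infinity.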
  2. Single-cell datasets for temporal gene expression integration

    • zenodo.org
    bin
    Updated Aug 12, 2022
    Cite
    Jolene Ranek; Natalie Stanley; Jeremy Purvis (2022). Single-cell datasets for temporal gene expression integration [Dataset]. http://doi.org/10.5281/zenodo.6587903
    Available download formats: bin
    Dataset updated
    Aug 12, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jolene Ranek; Natalie Stanley; Jeremy Purvis
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Contains loom files and preprocessed AnnData (adata) objects for comparing methods for temporal gene expression integration. Loom files can be read with the 'read' function in scVelo. Preprocessed adata objects can be read with the 'read_h5ad' function in Scanpy.
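The two access routes described above can be sketched in Python as follows. This is an illustration under assumptions, not code from the dataset: the helper names and the choice to dispatch on the file extension are hypothetical, and scVelo and Scanpy are imported only when a file is actually loaded.

```python
from pathlib import Path


def reader_name(path: str) -> str:
    """Pick the reader named in the description based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".loom":
        return "scvelo.read"
    if suffix == ".h5ad":
        return "scanpy.read_h5ad"
    raise ValueError(f"Unsupported file type: {suffix!r}")


def load(path: str):
    """Load a loom file with scVelo or an .h5ad file with Scanpy."""
    name = reader_name(path)
    if name == "scvelo.read":
        import scvelo as scv  # imported lazily; requires scVelo to be installed
        return scv.read(path)
    import scanpy as sc  # imported lazily; requires Scanpy to be installed
    return sc.read_h5ad(path)
```

For example, `load("example.loom")` would call `scvelo.read`, where `example.loom` stands in for one of the downloaded files.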

    The raw single-cell RNA sequencing datasets can be found under the following accession codes.

  3. Data from: Large-scale integration of single-cell transcriptomic data...

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Dec 14, 2021
    Cite
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
    Available download formats: zip
    Dataset updated
    Dec 14, 2021
    Dataset provided by
    Cornell University
    Authors
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

    Methods

    Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

    Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

    Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

    Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

    Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

    Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
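The cell-level quality filter stated above (at least 750 features, at least 1000 transcripts, and at most 30% mitochondrial transcripts) can be written as a simple predicate. The sketch below is plain Python for illustration only; the authors used Seurat in R, and the function and parameter names here are hypothetical.

```python
def passes_qc(
    n_features: int,
    n_transcripts: int,
    mito_fraction: float,
    min_features: int = 750,
    min_transcripts: int = 1000,
    max_mito_fraction: float = 0.30,
) -> bool:
    """Return True if a cell survives the thresholds quoted in the text.

    Cells are removed when they have fewer than `min_features` features,
    fewer than `min_transcripts` transcripts, or more than
    `max_mito_fraction` of transcripts from mitochondrial genes.
    """
    return (
        n_features >= min_features
        and n_transcripts >= min_transcripts
        and mito_fraction <= max_mito_fraction
    )


if __name__ == "__main__":
    # Hypothetical cells: (features, transcripts, mitochondrial fraction)
    cells = [(1200, 3500, 0.05), (600, 2000, 0.02), (900, 1500, 0.45)]
    kept = [cell for cell in cells if passes_qc(*cell)]
    print(kept)
```

Values that sit exactly on a threshold are kept, since the text removes only cells that are strictly below or above the limits.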

    Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
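One way to read "dimensions accounting for 95% of the total variance" is as the smallest number of leading components whose cumulative variance reaches 95% of the summed variance. The Python sketch below illustrates that reading; it assumes the total is taken over the components that were computed, and it is not the authors' code.

```python
from typing import Sequence


def n_dims_for_variance(variances: Sequence[float], target: float = 0.95) -> int:
    """Smallest number of leading components reaching `target` of the variance.

    `variances` holds the per-component variances in decreasing order
    (for example, squared standard deviations of principal components).
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("Total variance must be positive")
    cumulative = 0.0
    for count, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return count
    return len(variances)


if __name__ == "__main__":
    # Hypothetical per-component variances
    print(n_dims_for_variance([50.0, 30.0, 10.0, 6.0, 4.0]))
```

With the hypothetical variances in the example, the cumulative fractions are 0.50, 0.80, 0.90, and 0.96, so four dimensions are selected.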

    Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

    Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using

  4. cellCounts

    • opal.latrobe.edu.au
    • researchdata.edu.au
    bin
    Updated Dec 19, 2022
    Cite
    Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi (2022). cellCounts [Dataset]. http://doi.org/10.26181/21588276.v3
    Available download formats: bin
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    La Trobe
    Authors
    Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running an operating system of CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.

    • R (v4.0.0 or newer): https://www.r-project.org/
    • Rsubread (v2.12.2 or newer): http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
    • CellRanger (v6.0.1): https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
    • STARsolo (v2.7.10a): https://github.com/alexdobin/STAR
    • sra-tools (v2.10.0 or newer): https://github.com/ncbi/sra-tools
    • Seurat (v3.0.0 or newer): https://satijalab.org/seurat/
    • edgeR (v3.30.0 or newer): https://bioconductor.org/packages/edgeR/
    • limma (v3.44.0 or newer): https://bioconductor.org/packages/limma/
    • mltools (v0.3.5 or newer): https://cran.r-project.org/web/packages/mltools/index.html

    Reference packages generated by 10x Genomics are also required for this analysis and can be downloaded from the following link (the 2020-A version of the individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest

    Once all of this is done, you can simply run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script will automatically download the mixture scRNA-seq data from the SRA database, and it will output a text file called ‘test-all.log’ that contains all the screen outputs and the speed/accuracy results of CellRanger, STARsolo and cellCounts.
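Because the required executables must be on $PATH before the script is run, a quick pre-flight check can save a failed run. The Python sketch below uses `shutil.which` for this; the executable names in `ASSUMED_EXECUTABLES` are assumptions about what each installation provides and are not taken from the dataset page.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for R, CellRanger, STAR (STARsolo) and sra-tools.
ASSUMED_EXECUTABLES = ("R", "cellranger", "STAR", "fasterq-dump")


def missing_executables(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(ASSUMED_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed executables were found on PATH.")
```

The `which` argument exists only so that the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.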

  5. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt
    Updated Nov 20, 2023
    Cite
    Jonathan Hsu; Allart Stoop (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. http://doi.org/10.5281/zenodo.10011622
    Available download formats: bin, txt
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jonathan Hsu; Allart Stoop
    Description

    Table of Contents

    1. Main Description
    2. File Descriptions
    3. Linked Files
    4. Installation and Instructions

    1. Main Description

    ---------------------------

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled `marengo_code_for_paper_jan_2023.R` was used to generate the figures from the single-cell RNA sequencing data.

    The following libraries are required for script execution:

    • Seurat
    • scRepertoire
    • ggplot2
    • stringr
    • dplyr
    • ggridges
    • ggrepel
    • ComplexHeatmap

    File Descriptions

    ---------------------------

    • The code can be downloaded and opened in RStudio.
    • The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper.
    • The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113).
    • The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, also used to create the heatmap with DGE and volcano plots.
    • The "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in figure 5F.

    Linked Files

    ---------------------

    This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:

    Gene Expression Omnibus (GEO) ID: GSE223311(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    • Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    • Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
    • Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    • Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    • Description: This submission contains the raw sequencing (`.fastq.gz`) files, which are compressed text files.
    • Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113(https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    • Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
    • Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code.
    • Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

    Installation and Instructions

    --------------------------------------

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    > Ensure you have R version 4.1.2 or higher for compatibility.

    > Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).

    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    > setwd(directory)

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.

    5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.

    6. Open the "marengo_code_for_paper_jan_2023.R" file in RStudio or your IDE of choice.

    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
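Step 3 above expects five files in the working directory, so it can help to confirm they are all present before running the R scripts. The Python sketch below performs that check; it is an optional convenience under assumptions and is not part of the repository. The file names come from the list in step 3, while the function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# File names listed in step 3 of the instructions.
REQUIRED_FILES = (
    "marengo_code_for_paper_jan_2023.R",
    "Install_Packages.R",
    "Marengo_newID_March242023.rds",
    "genes_for_heatmap_fig5F.xlsx",
    "all_res_deg_for_heat_updated_march2023.txt",
)


def missing_files(directory: str, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    absent = missing_files(".")
    if absent:
        print("Missing from the working directory:", ", ".join(absent))
    else:
        print("All required files are present.")
```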

  6. Bulk RNA-Seq Deconvolution with single-cell RNA-Seq Datasets

    • explore.openaire.eu
    Updated Oct 6, 2021
    Cite
    Wendi Bacon; Mehmet Tekman (2021). Bulk RNA-Seq Deconvolution with single-cell RNA-Seq Datasets [Dataset]. http://doi.org/10.5281/zenodo.5719228
    Dataset updated
    Oct 6, 2021
    Authors
    Wendi Bacon; Mehmet Tekman
    Description

    Bulk data of human pancreas

    The dataset from Fadista et al. (2014) contains raw read count data from bulk RNA-seq of human pancreatic islets, collected to study glucose metabolism in healthy and hyper-hypoglycemic conditions. For the purpose of this vignette, the dataset is pre-processed and made available on the data download page. In addition to read counts, this dataset also contains HbA1c levels, BMI, gender and age information for each subject.

    Single cell data of human pancreas

    The single cell data are from Segerstolpe et al. (2016) and contain read counts for 25453 genes across 2209 cells. Here we only include the 1097 cells from 6 healthy subjects. The read counts are available on the data download page, in the form of an ExpressionSet. Another single cell dataset is from Xin et al. (2016) and has 39849 genes and 1492 cells. The read counts are available on the data download page, in the form of an ExpressionSet.

    The deconvolution of 89 subjects from Fadista et al. (2014) is performed with the bulk data GSE50244.bulk.eset and the single cell reference EMTAB.eset. We constrained our estimation to 6 major cell types: alpha, beta, delta, gamma, acinar and ductal, which make up over 90% of the whole islet.

  7. Repository for the single cell RNA sequencing data analysis for the human...

    • explore.openaire.eu
    Updated Aug 26, 2023
    Cite
    Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
    Dataset updated
    Aug 26, 2023
    Authors
    Jonathan; Andrew; Pierre; Allart; Adrian
    Description

    This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript.

    The following essential libraries are required for script execution:

    • Seurat
    • scRepertoire
    • ggplot2
    • dplyr
    • ggridges
    • ggrepel
    • ComplexHeatmap

    Linked Files

    --------------------------------------

    This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

    1. Gene Expression Omnibus (GEO) ID: GSE229626

    • Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
    • Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
    • Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    2. Sequence read archive (SRA) repository

    • Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
    • Description: This submission contains the raw sequencing (.fastq.gz) files, which are compressed text files.
    • Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

    Installation and Instructions

    --------------------------------------

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    > Ensure you have R version 4.1.2 or higher for compatibility.

    > Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

    The following code can be used to set the working directory in R:

    > setwd(directory)

    Steps:

    1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.

    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

    3. Set your working directory to where the following files are located:

    • Human_code_April2023.R
    • Install_Packages.R

    4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.

    5. Open the Human_code_April2023.R R script and execute commands as necessary.

  8. Breast Cancer Single-Cell RNA-Seq Dataset

    • ega-archive.org
    Cite
    Breast Cancer Single-Cell RNA-Seq Dataset [Dataset]. https://ega-archive.org/datasets/EGAD00001007495
    License

    https://ega-archive.org/dacs/EGAC00001001974

    Description

    Single-cell RNA sequencing of 26 primary breast cancers from the Wu et al. (2021) study. Data were generated using the Chromium controller (10X Genomics) and sequenced on the NextSeq 500 platform.

  9. Protocol data (R version)

    • figshare.com
    application/gzip
    Updated Oct 16, 2020
    + more versions
    Cite
    Jesse Gillis (2020). Protocol data (R version) [Dataset]. http://doi.org/10.6084/m9.figshare.13020569.v2
    Available download formats: application/gzip
    Dataset updated
    Oct 16, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jesse Gillis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets. The data files included here are needed to run the R version of the protocols available on Github (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols from Github, download the data from Figshare, place the data and protocol files in the same directory, then run the notebooks in RStudio or Jupyter. The scripts used to generate the data are included in the Github directory. Briefly:

    • full_biccn_hvg.rds: a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in SingleCellExperiment format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.
    • biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library).
    • biccn_gaba.rds: same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).
    • go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).
    • functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.

  10. Processed, annotated, seurat object

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 16, 2023
    Cite
    Cenk Celik (2023). Processed, annotated, seurat object [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7608211
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    Guillaume Thibault
    Cenk Celik
    Description

    The dataset contains an integrated, annotated Seurat v4 object. One can load the dataset into the R environment using the code below:

    seurat_obj <- readRDS('PATH/TO/DOWNLOAD/seurat.rds')

    The object has three assays: (I) RNA, (II) SCT and (III) integrated.

  11. Alternate gene annotations for rat, macaque, and marmoset for single cell...

    • kilthub.cmu.edu
    zip
    Updated May 30, 2023
    Cite
    BaDoi Phan; Andreas Pfenning (2023). Alternate gene annotations for rat, macaque, and marmoset for single cell RNA and ATAC analyses [Dataset]. http://doi.org/10.1184/R1/21176401.v1
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Carnegie Mellon University
    Authors
    BaDoi Phan; Andreas Pfenning
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Custom genome and gene annotations for single cell ATAC and RNA-seq analyses by BaDoi Phan (badoi dot phan at pitt dot edu)

    This Kilthub upload is a clone of the github repository where this project may be updated or corrected in the future: https://github.com/pfenninglab/custom_ArchR_genomes_and_annotations

    Premise: Not all single-cell ATAC-seq work in biomedical molecular epigenetics is done in the human and mouse genomes, where high-quality genomes and gene annotations exist. For other species that are still highly relevant to the study of health and disease, here are some ArchR annotations to make analyzing snATAC-seq data with ArchR less frustrating.

    Strategy for better gene annotations: We can use the property that related mammalian species tend to have orthologous gene elements (TSS, exons, genes). For example, the house mouse (Mus musculus) diverged from the Norway rat (Rattus norvegicus) a median of 15.4 MY ago, according to TimeTree. Humans diverged from rhesus macaques a median of 28.9 MY ago. To borrow the higher-quality and more complete gene annotations, we can use a gene-aware method of lifting gene annotations from one genome to another: liftoff (Shumate and Salzberg, 2021). As the source of "high quality" gene annotation, we use the NCBI Refseq annotations for hg38/GRCh38 and mm10/GRCm38 downloaded from the UCSC Genome Browser.

    For single cell RNA-seq, He, Kleyman et al. 2021 Current Biology (https://pubmed.ncbi.nlm.nih.gov/34727523/) found that a regular liftOver of the human NCBI RefSeq annotation to rheMac10 recovered a higher number of UMI counts assigned to genes. This is likely due to incomplete annotations in the rheMac8 or rheMac10 genomes for the 3' UTRs that are usually targeted by common single cell/nucleus RNA-seq technologies. Using the orthologs of a gene from a more complete annotation in a related species allows reads that would otherwise fall "outside" the gene, because of incomplete 3' UTRs in the target species, to be attributed to that gene. Furthermore, complex splicing is better measured in humans, so more regions annotated as "intergenic" by the rheMac10 annotations became "intronic" and could be better mapped with an annotation lifted over from human. For this reason, we create alternate annotations for the rhesus macaque, marmoset, and rat genomes, borrowing orthology identified with the newer liftoff method from more complete human or mouse annotations.

    Similarly, for single cell ATAC-seq, a more complete map of genes and transcription start sites (TSS) enables aggregate metrics like a "gene score" to better capture gene-based measures for co-clustering with single cell RNA-seq datasets. A more complete annotation can more accurately discern single cell open chromatin regions and not misclassify exonic regions or alternate promoters that were missed in primary transcriptomic data from monkey, marmoset, or rat but can be bioinformatically inferred.

    Lastly, work by the ENCODE Consortium on the large human and mouse epigenomic datasets has found that certain regions of the genome in these species give artifactual signals and need to be excluded from epigenomic analyses, Amemiya et al., 2021. For simplicity, these regions were pulled from the human and mouse genomes and mapped to the target genomes below using liftOver.

    List of resources by file name: Surprisingly, all these files are small enough to put on GitHub for a couple of custom genomes. The files are organized as follows:

    • *.gtf.gz and *.gff3.gz: the gzipped annotations lifted from the higher-quality annotations to the target genome using liftoff
    • *liftOver*blacklist.v2.bed: the ENCODE regions to exclude from epigenomic analyses, mapped to the target genome using liftOver
    • *ArchRGenome.R: the R script used to make the custom ArchR annotations
    • *ArchR_annotations.rda: the R data object that contains the geneAnnotation and objects to use with ArchR::createArrowFiles()
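
    As a rough illustration of the naming scheme above, the following Python sketch sorts file names into the listed categories using the glob patterns from the description. The example file names and the short category labels are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the file list in the description.
# The short labels on the right are illustrative, not official names.
FILE_KINDS = [
    ("*.gtf.gz", "liftoff gene annotation (GTF)"),
    ("*.gff3.gz", "liftoff gene annotation (GFF3)"),
    ("*liftOver*blacklist.v2.bed", "ENCODE exclusion regions (liftOver)"),
    ("*ArchRGenome.R", "R script that builds the ArchR annotations"),
    ("*ArchR_annotations.rda", "R data object for ArchR"),
]


def classify(filename: str) -> str:
    """Return the label of the first matching pattern, or 'other'."""
    for pattern, kind in FILE_KINDS:
        if fnmatchcase(filename, pattern):
            return kind
    return "other"


# Hypothetical file names, for illustration only.
examples = [
    "rheMac10_liftoff_GRCh38.gtf.gz",
    "rheMac10_liftOver_hg38-blacklist.v2.bed",
    "rheMac10_ArchR_annotations.rda",
    "README.md",
]
for name in examples:
    print(f"{name}: {classify(name)}")
```

    Case-sensitive matching (fnmatchcase) is used here because the patterns distinguish names such as liftOver and ArchR by capitalization.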

    List of species/genomes/source files: For most of these files, the genome FASTA sequences were downloaded from the UCSC Genome Browser at https://hgdownload.soe.ucsc.edu/goldenPath/${GENOME_VERSION}/, where ${GENOME_VERSION} is any of the versions below except mCalJac1. Some of these genomes were updated by the Vertebrate Genomes Project (VGP), which seeks to create complete rather than draft genome assemblies of all mammals on the planet, Rhie et al. 2021. Where there is an alternate naming scheme, these genomes are labeled with VGP and that version name. The VGP is pretty cool and they make good genome assemblies.
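
    The URL template in the paragraph above can be filled in programmatically. The sketch below is a minimal illustration, assuming only the template and the mCalJac1 exception stated in the description; the function name is hypothetical, and rheMac10 and mm10 are used as examples because they are named elsewhere in the description. It builds the URL string only and does not check that the directory exists on the server.

```python
# URL template from the description; {genome_version} stands in for ${GENOME_VERSION}.
UCSC_GOLDENPATH = "https://hgdownload.soe.ucsc.edu/goldenPath/{genome_version}/"

# The description says mCalJac1 is the one version not covered by this template.
NOT_FROM_UCSC = {"mCalJac1"}


def goldenpath_url(genome_version: str) -> str:
    """Build the UCSC goldenPath directory URL for a genome version (hypothetical helper)."""
    if genome_version in NOT_FROM_UCSC:
        raise ValueError(f"{genome_version} is not served from the UCSC goldenPath template")
    return UCSC_GOLDENPATH.format(genome_version=genome_version)


for version in ("rheMac10", "mm10"):
    print(goldenpath_url(version))
```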

  12. Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD)

    • registry.opendata.aws
    Updated Sep 23, 2022
    Cite
    Allen Institute (2022). Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) [Dataset]. https://registry.opendata.aws/allen-sea-ad-atlas/
    Explore at:
    Dataset updated
    Sep 23, 2022
    Dataset provided by
    Allen Institute (http://www.alleninstitute.org/)
    Area covered
    Seattle
    Description

    The Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) consortium strives to gain a deep molecular and cellular understanding of the early pathogenesis of Alzheimer's disease and is funded by the National Institute on Aging (NIA U19AG060909). The SEA-AD datasets available here comprise single cell profiling (transcriptomics and epigenomics) and quantitative neuropathology. To explore gene expression and chromatin accessibility information, the single-cell profiling data includes snRNA-seq and snATAC-seq data from the SEA-AD donor cohort (aged brains that span the spectrum of Alzheimer's disease pathology) and neurotypical reference brains. To explore key pathological proteins and cell types of interest to Alzheimer's disease, the neuropathology data includes full resolution brightfield images, images processed and segmented in HALO image analysis software, image annotations, and quantification summary files for the relevant stains, including Abeta (6E10), IBA1, a-Synuclein, GFAP, H&E-LFB, NeuN, pTau (AT8), and pTDP43.

  13. Collected and curated data for research

    • figshare.com
    csv
    Updated Apr 29, 2025
    Cite
    Yujie He (2025). Collected and curated data for research [Dataset]. http://doi.org/10.6084/m9.figshare.28888880.v3
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yujie He
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for research XXDOI: Notice: the filter_pan_cancer file can be downloaded from 10.6084/m9.figshare.28889885

  14. gene-expression-single-cell-mouse

    • huggingface.co
    Updated Jun 17, 2025
    Cite
    2025 Longevity x AI Hackathon (2025). gene-expression-single-cell-mouse [Dataset]. https://huggingface.co/datasets/longevity-db/gene-expression-single-cell-mouse
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    2025 Longevity x AI Hackathon
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A single-cell transcriptomic atlas characterizes ageing tissues in the mouse

    Code to download and process this dataset is available at: https://github.com/seanome/2025-longevity-x-ai-hackathon The dataset structure is originally from AnnData. Descriptions of each data file are below.

    Data Files

    This dataset contains multiple parquet files, one for each sheet in the original Excel file: gene-expression-single-cell-mouse_*.parquet - Data files containing gene expression and… See the full description on the dataset page: https://huggingface.co/datasets/longevity-db/gene-expression-single-cell-mouse.

  15. Multiple Single Cell RNA Expressions ARCHS4

    • kaggle.com
    zip
    Updated Jun 26, 2021
    Cite
    Alexander (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/alexandervc/multiple-single-cell-rna-expressions-archs4
    Explore at:
    Available download formats: zip (23088130184 bytes)
    Dataset updated
    Jun 26, 2021
    Authors
    Alexander
    Description

    Context

    The dataset was downloaded from https://amp.pharm.mssm.edu/archs4/download.html and the methods are described in a Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6

    The ARCHS4 data provides user-friendly access to a large body of gene expression data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While most data in GEO is stored in raw formats, ARCHS4 provides prepared count-matrix expression data. While GEO stores data separately for each research paper, ARCHS4 collects all the information in one single matrix. One may consult the main site for further information.

    The main data files are in the H5 (HDF5, Hierarchical Data Format) file format (https://en.wikipedia.org/wiki/Hierarchical_Data_Format). They contain expression data as well as annotation data and further meta-information. There are several other auxiliary files, such as a TSNE 3D projection (in CSV format) and gene correlation matrices for human and mouse in feather format.

    Content

    The main file (for human), human_matrix.h5, contains the data matrix (238522 samples by 35238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to the GEO database IDs where all the details can be found.
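
    The matrix dimensions quoted above imply roughly 8.4 billion values, which matters when deciding whether the matrix can be loaded densely into memory. The following Python sketch works through that arithmetic. The bytes-per-value figures (4 and 8) are assumptions for illustration; the description does not state the storage type used in the file.

```python
# Dimensions quoted in the dataset description.
N_SAMPLES = 238_522
N_GENES = 35_238


def dense_size_gib(n_rows: int, n_cols: int, bytes_per_value: int) -> float:
    """Size in GiB of a dense n_rows x n_cols matrix at the given value width."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_values = N_SAMPLES * N_GENES
print(f"{n_values:,} values in the full matrix")

# Assumed value widths; the actual dtype in human_matrix.h5 is not stated in the source.
for label, width in [("4-byte values", 4), ("8-byte values", 8)]:
    print(f"{label}: {dense_size_gib(N_SAMPLES, N_GENES, width):.1f} GiB if loaded densely")
```

    Under these assumptions the dense matrix is larger than the memory of many workstations, so reading a subset of samples or genes at a time is usually the more practical approach.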

    There is also similar data for mouse, as well as CSV files with TSNE images and correlation matrices for genes.

    Acknowledgements

    The ARCHS4 project is by:

    'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'

  16. Single Cell RNAseq of Mouse Testis

    • zenodo.org
    bin, zip
    Updated Jan 7, 2023
    Cite
    Wells; Myers; Conrad; Jung (2023). Single Cell RNAseq of Mouse Testis [Dataset]. http://doi.org/10.5281/zenodo.3233870
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jan 7, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wells; Myers; Conrad; Jung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the processed data from the publication: "Unified single-cell analysis of testis gene regulation and pathology in 5 mouse strains" (https://doi.org/10.1101/393769)

    The raw data is available at GEO: GSE113293

    Associated software is at https://zenodo.org/badge/latestdoi/140632831

    SDA_objects.zip contains key tables required for many functions; download this to use the Shiny app. Contents:

    • cell_data: data.table containing metadata of the cells. 80 columns including cell id, SDA component cell scores, Tsne coordinates, pseudotime, experimental group etc.
    • data: a sparse matrix of normalised read counts (20322 cells by 19262 genes)
    • gene_annotations: data.table containing gene locations, enrichment vs other tissues from bulk data, infertility gene status
    • GO_enrich: data.table containing gene ontology enrichments for each component, the genes enriched, p.values and enrichment values
    • principal_curves: list of outputs of princurve() containing the pseudotime trajectories
    • SDAresults: output of SDA run loaded using SDAtools::load_results()

    Other R objects include:

    • QC_count_matrix: Sparse matrix of raw count values for QC cells and genes
    • cell_imputation_AUCs: data.table of PRAC AUC values for unimputed (train), mean cell, SDA (predict), ICA, PCA, NNMF, and MAGIC
    • HocomocoV11_motifProbs_matrix: Matrix of regulation probabilities from MotifFinder using fixed motifs from HocomocoV11 database
    • motifFinder_denovo: list of results of MotifFinder denovo motifs from promoters regions of genes from each component
    • motifFinder_denovo_fixed: list of results of MotifFinder using 125 denovo motifs with fixed motif on all genes.
    • tomtom_matched_motifs: data.table of TOMTOM matches of denovo motifs, plus metadata
  17. Datasets for evaluating SCEMENT: Scalable and Memory Efficient Integration...

    • zenodo.org
    zip
    Updated Jun 24, 2024
    Cite
    Sriram P Chockalingam; Maneesha Aluru; Srinivas Aluru (2024). Datasets for evaluating SCEMENT: Scalable and Memory Efficient Integration of Large-scale Single Cell RNA-sequencing Data [Dataset]. http://doi.org/10.5281/zenodo.11521688
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sriram P Chockalingam; Maneesha Aluru; Srinivas Aluru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource contains the pre-processed A. thaliana root datasets, the H. sapiens aortic valve datasets, the PBMC COVID atlas, and the public 10x datasets used in the paper SCEMENT: Scalable and Memory Efficient Integration of Large-scale Single Cell RNA-sequencing Data. The raw datasets provided in the links below are pre-processed for quality control with respect to both cells and genes.

    A. thaliana datasets are sourced from the following locations in the Single Cell Expression Atlas and the Gene Expression Omnibus (GEO):

    1. E-GEOD-121619 : https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-121619/results
    2. E-GEOD-152766 : https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-152766/results
    3. E-GEOD-158761 : https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-158761/results
    4. E-GEOD-123013 : https://www.ebi.ac.uk/gxa/sc/experiments/E-GEOD-123013/results

    H. sapiens datasets are obtained from the NCBI database: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA562645/

    1. GSE152766: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152766
    2. GSE158761: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158761

    All COVID atlas datasets are from http://covid19.cancer-pku.cn (covid_atlas_data1.zip contains the h5ad files and covid_atlas_data2.zip contains the Seurat rds files).

    PBMC datasets are from the following public sources:

    Dataset Name | Chemistry Version | Web Link
    10k Human PBMCs, 3' v3.1, Chromium X | v3.1 | https://www.10xgenomics.com/datasets/10k-human-pbmcs-3-ht-v3-1-chromium-x-3-1-high
    20k Human PBMCs, 3' HT v3.1, Chromium X | v3.1 | https://www.10xgenomics.com/datasets/20-k-human-pbm-cs-3-ht-v-3-1-chromium-x-3-1-high-6-1-0
    10k Human PBMCs, 3' v3.1, Chromium Controller | v3.1 | https://www.10xgenomics.com/datasets/10k-human-pbmcs-3-v3-1-chromium-controller-3-1-high
    Healthy PBMC Chromium Connect (channel 1) | v3.1 | https://www.10xgenomics.com/datasets/peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-chromium-connect-channel-1-3-1-standard-3-1-0
    Healthy PBMC Chromium Connect (channel 5) | v3.1 | https://www.10xgenomics.com/datasets/peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-chromium-connect-channel-5-3-1-standard-3-1-0
    10k PBMCs from a Healthy Donor (v3 chemistry) | v3.0 | https://www.10xgenomics.com/datasets/10-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0
    1k PBMCs from a Healthy Donor (v2 chemistry) | v2.0 | https://www.10xgenomics.com/datasets/1-k-pbm-cs-from-a-healthy-donor-v-2-chemistry-3-standard-3-0-0
    1k PBMCs from a Healthy Donor (v3 chemistry) | v3.0 | https://www.10xgenomics.com/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0
    Fresh 68k PBMCs (Donor A) | v1.0 | https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0
    Frozen PBMCs (Donor A) | v1.0 | https://www.10xgenomics.com/datasets/frozen-pbm-cs-donor-a-1-standard-1-1-0
    Frozen PBMCs (Donor B) | v1.0 | https://www.10xgenomics.com/datasets/frozen-pbm-cs-donor-b-1-standard-1-1-0
    Frozen PBMCs (Donor C) | v1.0 | https://www.10xgenomics.com/datasets/frozen-pbm-cs-donor-c-1-standard-1-1-0
    PBMCs from a Healthy Donor: Whole Transcriptome Analysis | v3.1 | https://www.10xgenomics.com/datasets/pbm-cs-from-a-healthy-donor-whole-transcriptome-analysis-3-1-standard-4-0-0
    PBMC 600K | v1 | https://www.ebi.ac.uk/gxa/sc/experiments/E-HCAD-4/downloads
    GSM4560071 | v2.0 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4560071
    GSM4560074 | v2.0 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4560074
    GSM4560070 | v2.0 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4560070

    References for the datasets:

    1. H. sapiens dataset: Kang Xu, Shangbo Xie, Yuming Huang, Tingwen Zhou, Ming Liu, Peng Zhu, Chunli Wang, Jiawei Shi, Fei Li, Frank W. Sellke and Nianguo Dong (2020) Cell-Type Transcriptome Atlas of Human Aortic Valves Reveal Cell Heterogeneity and Endothelial to Mesenchymal Transition Involved in Calcific Aortic Valve Disease.
    2. E-GEOD-152766: Shahan R, Hsu C, Nolan TM, Cole BJ, Taylor IW et al. (2020) A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants.
    3. E-GEOD-121619: Jean-Baptiste K, McFaline-Figueroa JL, Alexandre CM, Dorrity MW, Saunders L et al. (2019) Dynamics of Gene Expression in Single Root Cells of Arabidopsis thaliana.
    4. E-GEOD-123013: Ryu KH, Huang L, Kang HM, Schiefelbein J. (2019) Single-Cell RNA Sequencing Resolves Molecular Relationships Among Individual Plant Cells.
    5. E-GEOD-158761: Gala HP, Lanctot A, Jean-Baptiste K, Guiziou S, Chu JC et al. (2020) A single cell view of the transcriptome during lateral root initiation in Arabidopsis thaliana.
    6. COVID Atlas Reference: Xianwen Ren, Wen Wen, Xiaoying Fan et al. (2021) COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas.
    7. PBMC data are downloaded from the respective links in the table above.
  18. Single cell study of neural stem cells derived from human iPSCs reveal...

    • omicsdi.org
    xml
    Updated Sep 30, 2022
    Cite
    Matti Lam (2022). Single cell study of neural stem cells derived from human iPSCs reveal distinct progenitor populations with neurogenic and gliogenic potential (2) [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-MTAB-8379
    Explore at:
    Available download formats: xml
    Dataset updated
    Sep 30, 2022
    Authors
    Matti Lam
    Variables measured
    Transcriptomics
    Description

    Single cell RNA-seq study of neural stem cells derived from induced pluripotent stem cells. Analysis of gene expression across cell clusters identified the inherent presence of neurogenic and gliogenic progenitors in established neural stem cells. This study helps explain the heterogeneity of neural stem cell identity and resolves gene expression enrichment in subpopulations of diverse progenitors. These are the processed and quality-controlled datasets used to generate figure 2 in the published article. Single cell raw data files for the experiments are not available for public download.

  19. Norman et al., 2019 (Science) labeled Perturb-seq data

    • figshare.com
    hdf
    Updated Nov 30, 2023
    Cite
    Ethan Weinberger (2023). Norman et al., 2019 (Science) labeled Perturb-seq data [Dataset]. http://doi.org/10.6084/m9.figshare.24688110.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ethan Weinberger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AnnData object containing single-cell gene expression levels from a large-scale Perturb-seq CRISPR experiment from "Exploring genetic interaction manifolds constructed from rich single-cell phenotypes". Cells were labeled according to perturbation categories provided by the original authors. Data were preprocessed as described in "Isolating salient variations of interest in single-cell data with contrastiveVI" (Nature Methods 2023).

  20. Stemformatics

    • dknet.org
    Updated Apr 11, 2025
    Cite
    (2025). Stemformatics [Dataset]. http://identifiers.org/RRID:SCR_017002/resolver/mentions?q=&i=rrid
    Explore at:
    Dataset updated
    Apr 11, 2025
    Description

    Gene expression data portal developed for the stem cell community, containing public gene expression datasets derived from microarray, RNA sequencing, and single cell profiling technologies. The portal lets users visualize and download curated stem cell data. It provides easy-to-use, intuitive tools for biologists to visually explore data, including interactive gene expression profiles, principal component analysis plots, and hierarchical clusters, among others.

