100+ datasets found

Data, R code and output Seurat Objects for single cell RNA-seq analysis of...
figshare.com
application/gzip
Updated May 31, 2023
Cite
Yunshun Chen; Gordon Smyth (2023). Data, R code and output Seurat Objects for single cell RNA-seq analysis of human breast tissues [Dataset]. http://doi.org/10.6084/m9.figshare.17058077.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.17058077.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yunshun Chen; Gordon Smyth
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.
Transcription start site analysis for heterogenous CD4+ T cells using 5′...
data.niaid.nih.gov
data-staging.niaid.nih.gov
zip
Updated Apr 22, 2024
Cite
Akiko Oguchi; Yasuhiro Murakawa (2024). Transcription start site analysis for heterogenous CD4+ T cells using 5′ scRNA-seq [Dataset]. http://doi.org/10.5061/dryad.gtht76hv9
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gtht76hv9
Dataset updated
Apr 22, 2024
Dataset provided by
RIKEN Center for Integrative Medical Sciences
Authors
Akiko Oguchi; Yasuhiro Murakawa
License
CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Description
These datasets were generated by ReapTEC (read-level pre-filtering and transcribed enhancer call) using 5′ single-cell RNA-seq data on human heterogeneous CD4+ T cells. By taking advantage of a unique “cap signature” derived from the 5′-end of a transcript, ReapTEC simultaneously profiles gene expression and enhancer activity at nucleotide resolution using 5′-end single-cell RNA-sequencing (5′ scRNA-seq). The ReapTEC pipeline is described in detail at https://github.com/MurakawaLab/ReapTEC.
Seurat object with cell type annotation and UMAP coordinates for zebrafish...
figshare.com
application/gzip
Updated Nov 28, 2024
Cite
Gangcai Xie (2024). Seurat object with cell type annotation and UMAP coordinates for zebrafish testis single cell RNA sequencing datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27922725.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.27922725.v1
Dataset updated
Nov 28, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Gangcai Xie
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the Seurat object in .rds format with the raw matrix information (after filtering), cell type annotation information and the UMAP coordinates. Users can load this .rds file with the R readRDS function. If you are using this dataset, please cite our paper: Qian, Peipei, Jiahui Kang, Dong Liu, and Gangcai Xie. "Single cell transcriptome sequencing of Zebrafish testis revealed novel spermatogenesis marker genes and stronger Leydig-germ cell paracrine interactions." Frontiers in genetics 13 (2022): 851719.
Processed, annotated, seurat object
data.niaid.nih.gov
Updated Nov 16, 2023
Cite
Cenk Celik; Guillaume Thibault (2023). Processed, annotated, seurat object [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7608211
Dataset updated
Nov 16, 2023
Dataset provided by
Nanyang Technological University
Authors
Cenk Celik; Guillaume Thibault
Description
The dataset contains an integrated, annotated Seurat v4 object. One can load the dataset into the R environment using the code below:

seurat_obj <- readRDS('PATH/TO/DOWNLOAD/seurat.rds')

The object has three assays: (I) RNA, (II) SCT and (III) integrated.
GSE264573 single-cell seurat RDS
figshare.com
bin
Updated Oct 16, 2025
Cite
Chao Cheng (2025). GSE264573 single-cell seurat RDS [Dataset]. http://doi.org/10.6084/m9.figshare.30370420.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.30370420.v1
Dataset updated
Oct 16, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Chao Cheng
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GSE264573 single-cell seurat RDS
Single cell T cell atlas
zenodo.org
bin, csv
Updated Jul 27, 2024
Cite
Kerry A Mullan (2024). Single cell T cell atlas [Dataset]. http://doi.org/10.5281/zenodo.12569981
Explore at:
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.12569981
Dataset updated
Jul 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kerry A Mullan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The attached datasets comprise a merge of 12 high-quality single-cell T cell datasets that each had both TCR-seq and GEx. The Seurat object (supercluster_added_ID-240531.rds) contains ~500K paired TCR-seq with GEx. We also included the original identifiers in Sup_Update_labels.csv. See https://stegor.readthedocs.io/en/latest/ for how we processed the 12 datasets and decided on the current 47 T cell annotation models using scGate.

This is the accompanying data set for the paper entitled ‘T cell receptor-centric approach to streamline multimodal single-cell data analysis.’, which is currently available as a preprint (https://www.biorxiv.org/content/10.1101/2023.09.27.559702v2). Details on the origin of the datasets, and processing steps can be found there.

The purpose of this atlas, in both its full and down-sampled versions, is to aid in improving the interpretability of other T cell based datasets. This can be done by adding either the down-sampled object, which contains up to 500 cells per annotation model, or all 12 datasets to your new sample. This dataset aims to improve the capacity to identify TCR-specific signatures by ensuring a well-covered background, which will improve the robustness of the FindMarkers function in the Seurat package.
Mm1 tumor single cell RNA-seq data
figshare.com
application/gzip
Updated Jun 13, 2022
Cite
Sam Kleeman (2022). Mm1 tumor single cell RNA-seq data [Dataset]. http://doi.org/10.6084/m9.figshare.20063402.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.20063402.v1
Dataset updated
Jun 13, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sam Kleeman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Seurat matrix referring to scRNA-seq of Mm1 mouse tumors in CyC manuscript
Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
data-staging.niaid.nih.gov
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch-correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
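The three quality thresholds quoted above can be illustrated with a minimal Python sketch. This is not the authors' Seurat code: the per-cell field names (n_features, n_counts, n_mito_counts) and the example values are hypothetical, and the treatment of cells that sit exactly on a threshold is an assumption.

```python
# Thresholds quoted in the methods text.
MIN_FEATURES = 750        # cells with fewer detected features are removed
MIN_TRANSCRIPTS = 1000    # cells with fewer transcripts are removed
MAX_MITO_FRACTION = 0.30  # cells above this mitochondrial fraction are removed


def passes_qc(cell):
    """Return True if a cell survives all three quality filters."""
    if cell["n_counts"] == 0:
        return False
    mito_fraction = cell["n_mito_counts"] / cell["n_counts"]
    return (
        cell["n_features"] >= MIN_FEATURES
        and cell["n_counts"] >= MIN_TRANSCRIPTS
        and mito_fraction <= MAX_MITO_FRACTION
    )


# Hypothetical per-cell summaries (illustrative values only).
cells = [
    {"barcode": "AAAC", "n_features": 1200, "n_counts": 3500, "n_mito_counts": 200},
    {"barcode": "AAAG", "n_features": 600, "n_counts": 1500, "n_mito_counts": 50},    # too few features
    {"barcode": "AACT", "n_features": 900, "n_counts": 800, "n_mito_counts": 40},     # too few transcripts
    {"barcode": "AAGT", "n_features": 1000, "n_counts": 2000, "n_mito_counts": 900},  # 45% mitochondrial
]

kept = [c["barcode"] for c in cells if passes_qc(c)]
print(kept)
```

In a Seurat workflow the same idea would normally be expressed as a subset on per-cell metadata; the plain-Python form is used here only to make the logic explicit.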

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
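The phrase "dimensions accounting for 95% of the total variance" can be read as a cumulative-variance cutoff. The sketch below shows one way to compute such a cutoff in Python, assuming per-dimension standard deviations are available and that "total variance" means the sum over the computed dimensions; the description does not state exactly how the authors computed it, and the function name is hypothetical.

```python
def n_dims_for_variance(stdevs, target=0.95):
    """Smallest number of leading dimensions whose variance reaches `target`.

    `stdevs` holds per-dimension standard deviations, ordered from the
    first dimension onward (an assumption about the input format).
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    cumulative = 0.0
    for n_dims, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return n_dims
    return len(variances)


# Illustrative values only: variances are 9, 4, 1 and 1 (total 15).
example_stdevs = [3.0, 2.0, 1.0, 1.0]
print(n_dims_for_variance(example_stdevs, target=0.85))
print(n_dims_for_variance(example_stdevs, target=0.95))
```

The selected number of dimensions would then be the range passed on to graph construction and clustering.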

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Brain Transcriptome Single-cell (BTS) Atlas: Anndata, Seurat Object,...
zenodo.org
bin, zip
Updated Nov 18, 2024
Cite
Seoyeon Kim; Jihae Lee (2024). Brain Transcriptome Single-cell (BTS) Atlas: Anndata, Seurat Object, CellTypist model, and Disorder Risk Geneplot [Dataset]. http://doi.org/10.5281/zenodo.14177002
Explore at:
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.14177002
Dataset updated
Nov 18, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Seoyeon Kim; Jihae Lee
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Brain Transcriptome Single-cell Atlas (BTS) Anndata, Seurat object, and Celltypist model for further use of the atlas. The Celltypist model can be utilized to accurately annotate cell types in new datasets based on the atlas. Plots illustrating the expression profile for 3,380 neurological disorder risk genes across the atlas are also uploaded. Further access to the data can be requested from the corresponding author.

This dataset is published in Kim, S., Lee, J., Koh, I.G. et al. An integrative single-cell atlas for exploring the cellular and temporal specificity of genes related to neurological disorders during human brain development. Exp Mol Med 56, 2271–2282 (2024). https://doi.org/10.1038/s12276-024-01328-6
Dawnn benchmarking dataset: Simulated linear trajectories processing and...
rdr.ucl.ac.uk
application/gzip
Updated May 4, 2023
Cite
George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Simulated linear trajectories processing and label simulation [Dataset]. http://doi.org/10.5522/04/22616611.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5522/04/22616611.v1
Dataset updated
May 4, 2023
Dataset provided by
University College London
Authors
George Hall; Sergi Castellano Hereza
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated linear trajectories.

FILES:

Data processing code

adapted_traj_sim_milo_paper.R: Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form linear trajectories.
generate_test_data_linear_traj_sim_milo_paper.R: R code to assign simulated labels to datasets generated from adapted_traj_sim_milo_paper.R. Seurat objects saved as cells_sim_linear_traj_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_linear_traj.csv.

Resulting datasets

cells_sim_linear_traj_gex_seed_*.rds: Seurat objects generated by generate_test_data_linear_traj_sim_milo_paper.R.
benchmark_dataset_sim_linear_traj.csv: Cell labels generated by generate_test_data_linear_traj_sim_milo_paper.R.
scRNA-seq + scATAC-seq Challenge at NeurIPS 2021
kaggle.com
zip
Updated Sep 16, 2022
Cite
Alexander Chervov (2022). scRNA-seq + scATAC-seq Challenge at NeurIPS 2021 [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021
Explore at:
Available download formats: zip (2917180928 bytes)
Dataset updated
Sep 16, 2022
Authors
Alexander Chervov
Description
Context

Dataset from NeurIPS2021 challenge similar to Kaggle 2022 competition: https://www.kaggle.com/competitions/open-problems-multimodal "Open Problems - Multimodal Single-Cell Integration Predict how DNA, RNA & protein measurements co-vary in single cells"

The data are single-cell ATAC-seq (https://en.wikipedia.org/wiki/ATAC-seq#Single-cell_ATAC-seq) and single-cell RNA-seq (https://en.wikipedia.org/wiki/Single-cell_transcriptomics#Single-cell_RNA-seq).

Single-cell RNA sequencing data form a matrix in which rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
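Because the orientation can go either way, it helps to be explicit about which axis holds cells before indexing. The following plain-Python sketch, with made-up cell names, gene names and counts, shows the two layouts and how to move between them; it is an illustration only and does not use the Scanpy or Seurat data structures.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical labels and counts (illustrative values only).
cells = ["cell_1", "cell_2"]
genes = ["GeneA", "GeneB", "GeneC"]

# Layout 1: rows are cells, columns are genes.
cells_by_genes = [
    [0, 5, 2],  # cell_1
    [3, 0, 1],  # cell_2
]

# Layout 2: rows are genes, columns are cells.
genes_by_cells = transpose(cells_by_genes)


def expression(cell, gene):
    """Look up one value in the cells-by-genes layout."""
    return cells_by_genes[cells.index(cell)][genes.index(gene)]


print(genes_by_cells)
print(expression("cell_1", "GeneB"))
```

A quick sanity check on real data is to compare the matrix shape with the number of cell barcodes and the number of gene names.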

See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package

(For companion dataset on CITE-seq = scRNA-seq + Proteomics, see: https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021)

Particular data

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122

Expression profiling by high throughput sequencing; Genome binding/occupancy profiling by high throughput sequencing

Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.

Overall design Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.

Contributor(s) Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM Citation https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html

Related datasets:

Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"

Inspiration

Single cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review by P. Kharchenko in Nature Methods: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
scRNA-seq +CRISPR=Perturb-seq.Norman.SelectedPart
kaggle.com
zip
Updated Jul 20, 2022
Cite
Alexander Chervov (2022). scRNA-seq +CRISPR=Perturb-seq.Norman.SelectedPart [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-crisprperturbseqnormanselectedpart
Explore at:
Available download formats: zip (158260526 bytes)
Dataset updated
Jul 20, 2022
Authors
Alexander Chervov
Description
Remark 0: See https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev For the cell cycle analysis

Remark 1:

Full dataset in https://www.kaggle.com/datasets/alexandervc/scrnaseq-crisprperturbseq-normanweissman But it is huge, and loading crashes memory, so here are cropped pieces to start with.

Remark 2:

dataset used in: "GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations" Yusuf Roohani, Kexin Huang, Jure Leskovec https://www.biorxiv.org/content/10.1101/2022.07.12.499735v1 https://twitter.com/yusufroohani/status/1547965695744360448 Accepted in ICLR

Data and Context

Data: results of single-cell RNA sequencing, i.e. a matrix in which rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package

Particular data - Perturb-seq: Single-cell, pooled CRISPR screening experiment comparing the transcriptional effects of overexpressing genes alone or in combination

Paper: Norman TM, Horlbeck MA, Replogle JM, Ge AY et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 2019 Aug 23;365(6455):786-793. PMID: 31395745 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6746554/

Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE133344

Related datasets:

Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"

Inspiration

Single cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review by P. Kharchenko in Nature Methods: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article, 07 January 2019. Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg. Nature Reviews Genetics volume 20, pages 273–282 (2019)

Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12
Bassez et al. (2021) Breast Cancer processed dataset
figshare.com
application/gzip
Updated Dec 20, 2023
Cite
Josep Garnica (2023). Bassez et al. (2021) Breast Cancer processed dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24867018.v3
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.24867018.v3
Dataset updated
Dec 20, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Josep Garnica
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed Seurat objects in .rds format with single-cell dataset obtained from Bassez et al. (2021) Nature Medicine (https://www.nature.com/articles/s41591-021-01323-8).

BassezA_2021_33958794_downsampled: Seurat object including all samples (42) with random downsampling to max 1000 cells per sample, without any further filtering.

BassezA_2021_33958794_3patients: Seurat object with 3 patient samples from the original data set: "BIOKEY_11", "BIOKEY_30", and "BIOKEY_4" from "Pre" condition, without downsampling or further per sample filtering.
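Random downsampling to a per-sample maximum, as described for the downsampled object, can be sketched as follows. This is an illustrative Python version under stated assumptions, not the code used to build the dataset: the function name, the cell-to-sample mapping and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def downsample_per_sample(cell_to_sample, max_cells=1000, seed=0):
    """Keep at most `max_cells` randomly chosen cells from each sample."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_sample = defaultdict(list)
    for cell, sample in cell_to_sample.items():
        by_sample[sample].append(cell)

    kept = []
    for sample in sorted(by_sample):
        members = by_sample[sample]
        if len(members) > max_cells:
            # Sample without replacement; smaller samples are kept whole.
            members = rng.sample(members, max_cells)
        kept.extend(members)
    return kept


# Hypothetical input: 2500 cells in one sample and 300 in another.
cell_to_sample = {f"A_{i}": "sample_A" for i in range(2500)}
cell_to_sample.update({f"B_{i}": "sample_B" for i in range(300)})

kept = downsample_per_sample(cell_to_sample, max_cells=1000)
print(len(kept))
```

Samples that already have fewer cells than the cap pass through unchanged, which matches the "max 1000 cells per sample" wording.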
cellCounts
opal.latrobe.edu.au
researchdata.edu.au
bin
Updated Dec 19, 2022
Cite
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi (2022). cellCounts [Dataset]. http://doi.org/10.26181/21588276.v3
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.26181/21588276.v3
Dataset updated
Dec 19, 2022
Dataset provided by
La Trobe
Authors
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running an operating system of CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.

R (v4.0.0 or newer) https://www.r-project.org/
Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
STARsolo (v2.7.10a) https://github.com/alexdobin/STAR
sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools
Seurat (v3.0.0 or newer) https://satijalab.org/seurat/
edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/
limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/
mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html

Reference packages generated by 10x Genomics are also required for this analysis. They can be downloaded from the following link (the 2020-A version of the individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest
Once everything is set up, run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script automatically downloads the mixture scRNA-seq data from the SRA database and writes a text file called ‘test-all.log’ that contains all the screen outputs and the speed/accuracy results of CellRanger, STARsolo and cellCounts.
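Before launching the script, it may help to confirm that the required executables are actually reachable through $PATH. The sketch below is an illustration only and is not part of the dataset: the executable names (`R`, `cellranger`, `STAR`, `fastq-dump`) are assumptions about how the listed tools are typically installed, and the helper `find_missing` is hypothetical. It does not check versions or the R packages.

```python
import shutil

# Assumed executable names for R, CellRanger, STAR/STARsolo and sra-tools;
# adjust these to match the local installation.
REQUIRED_EXECUTABLES = ["R", "cellranger", "STAR", "fastq-dump"]


def find_missing(executables, which=shutil.which):
    """Return the executables that cannot be located on $PATH."""
    return [name for name in executables if which(name) is None]


missing = find_missing(REQUIRED_EXECUTABLES)
if missing:
    print("Not found on $PATH:", ", ".join(missing))
else:
    print("All required executables were found on $PATH.")
```

The `which` parameter exists only so the lookup can be replaced in a test; by default the standard-library `shutil.which` is used.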
Seurat objects associated with the tonsil cell atlas
zenodo.org
bin
Updated Sep 27, 2023
+ more versions
Cite
Ramon Massoni-Badosa (2023). Seurat objects associated with the tonsil cell atlas [Dataset]. http://doi.org/10.5281/zenodo.6340174
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.6340174
Dataset updated
Sep 27, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ramon Massoni-Badosa
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In the context of the Human Cell Atlas, we have created a single-cell taxonomy of cell types and states in human tonsils. This repository contains the Seurat objects derived from this effort. In particular, we have datasets for each modality (scRNA-seq, scATAC-seq, CITE-seq, spatial transcriptomics), as well as cell type-specific datasets. Most importantly, this is the input that we used to create the HCATonsilData package, which allows programmatic access to all these datasets within R.
Test Data for Galaxy tutorial "Batch Correction and Integration" - Seurat...
ordo.open.ac.uk
bin
Updated Apr 28, 2025
+ more versions
Cite
Marisa Loach (2025). Test Data for Galaxy tutorial "Batch Correction and Integration" - Seurat version [Dataset]. http://doi.org/10.5281/zenodo.14713816
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.14713816
Dataset updated
Apr 28, 2025
Dataset provided by
The Open University
Authors
Marisa Loach
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data is used for the Seurat version of the batch correction and integration tutorial on the Galaxy Training Network. The input data was provided by Seurat in the 'Integrative Analysis in Seurat v5' tutorial. The input dataset provided here has been filtered to include only cells for which nFeature_RNA > 1000. The other datasets were produced on Galaxy. The original dataset was published as: Ding, J., Adiconis, X., Simmons, S.K. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol 38, 737–746 (2020). https://doi.org/10.1038/s41587-020-0465-8.
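The filter mentioned above (nFeature_RNA > 1000) keeps only cells in which more than 1000 genes were detected. The following is a minimal sketch of that idea, not the tutorial's actual Galaxy or Seurat workflow: the helper names `n_features` and `filter_cells`, the dictionary layout, and the toy counts are all hypothetical, and a small threshold is used so the example stays readable.

```python
def n_features(gene_counts):
    """Number of genes with a non-zero count in one cell."""
    return sum(1 for count in gene_counts.values() if count > 0)


def filter_cells(counts, min_features=1000):
    """Keep cells whose number of detected genes is strictly greater than min_features."""
    return {
        cell: gene_counts
        for cell, gene_counts in counts.items()
        if n_features(gene_counts) > min_features
    }


# Toy data: per-cell gene counts (cell and gene names are made up).
toy_counts = {
    "cell_1": {"GENE_A": 3, "GENE_B": 0, "GENE_C": 1, "GENE_D": 2},
    "cell_2": {"GENE_A": 0, "GENE_B": 0, "GENE_C": 4, "GENE_D": 0},
    "cell_3": {"GENE_A": 1, "GENE_B": 1, "GENE_C": 0, "GENE_D": 0},
}
kept = filter_cells(toy_counts, min_features=2)
```

With the toy threshold of 2, only the first cell has more than two detected genes and survives the filter; the comparison is strict, mirroring the ">" in the description.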
Data from: Harnessing single cell RNA sequencing to identify dendritic cell...
zenodo.org
data.niaid.nih.gov
+1more
tar
Updated Dec 31, 2022
+ more versions
Cite
Ammar Sabir Cheema; Kaibo Duan; Marc Dalod; Thien-Phong Vu Manh (2022). Harnessing single cell RNA sequencing to identify dendritic cell types, characterize their biological states and infer their activation trajectory [Dataset]. http://doi.org/10.5281/zenodo.5385611
Available download formats: tar
Unique identifier
https://doi.org/10.5281/zenodo.5385611
Dataset updated
Dec 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ammar Sabir Cheema; Kaibo Duan; Marc Dalod; Thien-Phong Vu Manh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary: Dendritic cells (DCs) orchestrate innate and adaptive immunity, by translating the sensing of distinct danger signals into the induction of different effector lymphocyte responses, to induce different defense mechanisms suited to face distinct types of threats. Hence, DCs are very plastic, which results from two key characteristics. First, DCs encompass distinct cell types specialized in different functions. Second, each DC type can undergo different activation states, fine-tuning its functions depending on its tissue microenvironment and the pathophysiological context, by adapting the output signals it delivers to the input signals it receives. Hence, to better understand DC biology and harness it in the clinic, we must determine which combinations of DC types and activation states mediate which functions, and how.
To decipher the nature, functions and regulation of DC types and their physiological activation states, one of the methods that can be harnessed most successfully is ex vivo single cell RNA sequencing (scRNAseq). However, for new users of this approach, determining which analytics strategy and computational tools to choose can be quite challenging, considering the rapid evolution and broad burgeoning of the field. In addition, awareness must be raised of the need for specific, robust and tractable strategies to annotate cells for cell type identity and activation states. It is also important to emphasize the necessity of examining whether similar cell activation trajectories are inferred by using different, complementary methods. In this chapter, we take these issues into account to provide a pipeline for scRNAseq analysis and illustrate it with a tutorial reanalyzing a public dataset of mononuclear phagocytes isolated from the lungs of naïve or tumor-bearing mice. We describe this pipeline step-by-step, including data quality controls, dimensionality reduction, cell clustering, cell cluster annotation, inference of the cell activation trajectories and investigation of the underpinning molecular regulation. It is accompanied by a more complete tutorial on GitHub. We anticipate that this method will be helpful for both wet lab and bioinformatics researchers interested in harnessing scRNAseq data for deciphering the biology of DCs or other cell types, and that it will contribute to establishing high standards in the field.

Data:

MDAlab_cDC1_maturation.tar : Docker image used for the analysis
CITE-seq=scRNA-seq+Proteins: Challenge NeurIPS2021
kaggle.com
zip
Updated Jan 22, 2023
Cite
Alexander Chervov (2023). CITE-seq=scRNA-seq+Proteins: Challenge NeurIPS2021 [Dataset]. https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021
Available download formats: zip (646191284 bytes)
Dataset updated
Jan 22, 2023
Authors
Alexander Chervov
Description
Context

Dataset from NeurIPS2021 challenge similar to Kaggle 2022 competition: https://www.kaggle.com/competitions/open-problems-multimodal "Open Problems - Multimodal Single-Cell Integration Predict how DNA, RNA & protein measurements co-vary in single cells"

CITE-seq - joint single cell RNA sequencing + single cell measurements of CD** proteins. (https://en.wikipedia.org/wiki/CITE-Seq) (For companion dataset on scRNA-seq + scATAC-seq, see: https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021 )

In single-cell RNA sequencing data, rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
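The "rows are cells, columns are genes (or vice versa)" layout can be made concrete with a toy matrix. This is an illustrative sketch only: the cell names, gene names, counts, and the helpers `transpose` and `expression` are made up and are not taken from the dataset. For orientation, Scanpy's AnnData stores cells as rows, whereas Seurat count matrices store genes as rows.

```python
# Toy count matrix: rows are cells, columns are genes (all names are made up).
cells = ["cell_1", "cell_2", "cell_3"]
genes = ["GENE_A", "GENE_B"]
cells_by_genes = [
    [5, 0],  # cell_1
    [0, 2],  # cell_2
    [1, 7],  # cell_3
]


def transpose(matrix):
    """Swap rows and columns, e.g. cells x genes -> genes x cells."""
    return [list(column) for column in zip(*matrix)]


def expression(matrix, row_names, col_names, row, col):
    """Look up one matrix value by row and column name."""
    return matrix[row_names.index(row)][col_names.index(col)]


genes_by_cells = transpose(cells_by_genes)

# The same measurement, read from either orientation.
value_a = expression(cells_by_genes, cells, genes, "cell_3", "GENE_B")
value_b = expression(genes_by_cells, genes, cells, "GENE_B", "cell_3")
```

Both lookups address the same measurement; only the orientation of the matrix differs, so it is worth checking which convention a given file uses before analysis.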

See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package

Particular data

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122

Expression profiling by high throughput sequencing; genome binding/occupancy profiling by high throughput sequencing

Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.

Overall design: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.

Contributor(s): Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM

Citation: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html

Related datasets:

Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"

Inspiration

Single cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

A Google Scholar search for "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Single-cell transcriptomics uncovers zonation of function in the mesenchyme...
dtechtive.com
find.data.gov.scot
txt
Updated Feb 12, 2020
Cite
University of Edinburgh Centre for Inflammation Research (2020). Single-cell transcriptomics uncovers zonation of function in the mesenchyme during liver fibrosis - Seurat objects [Dataset]. http://doi.org/10.7488/ds/2769
Available download formats: txt (0.0166 MB), txt (0.0013 MB)
Unique identifier
https://doi.org/10.7488/ds/2769
Dataset updated
Feb 12, 2020
Dataset provided by
University of Edinburgh Centre for Inflammation Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
We profile the transcriptomes of ~30,000 mouse single cells to deconvolve the hepatic mesenchyme in healthy and fibrotic liver at high resolution. We reveal spatial zonation of hepatic stellate cells across the liver lobule, designated portal vein-associated HSC and central vein-associated HSC, and uncover an equivalent functional zonation in a mouse model of centrilobular fibrosis. Our work illustrates the power of single-cell transcriptomics to resolve key collagen-producing cells driving liver fibrosis with high precision. We provide the contents of these data as Seurat R objects.
Seurat objects for multiome analysis of neuroblastoma cell lines - 4/4
data.mendeley.com
Updated Jul 25, 2024
+ more versions
Cite
Richard Guyer (2024). Seurat objects for multiome analysis of neuroblastoma cell lines - 4/4 [Dataset]. http://doi.org/10.17632/cp4d7t74vb.1
Unique identifier
https://doi.org/10.17632/cp4d7t74vb.1
Dataset updated
Jul 25, 2024
Authors
Richard Guyer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RDS files containing processed Seurat objects for multiome analysis of neuroblastoma cell lines. File names reflect the cell line.

Data, R code and output Seurat Objects for single cell RNA-seq analysis of human breast tissues

2 scholarly articles cite this dataset (View in Google Scholar)
