100+ datasets found

Processed, annotated, seurat object
data.niaid.nih.gov
Updated Nov 16, 2023
Cite
Cenk Celik; Guillaume Thibault (2023). Processed, annotated, seurat object [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7608211
Dataset updated
Nov 16, 2023
Dataset provided by
Nanyang Technological University
Authors
Cenk Celik; Guillaume Thibault
Description
The dataset contains an integrated, annotated Seurat v4 object. One can load the dataset into the R environment using the code below:

seurat_obj <- readRDS('PATH/TO/DOWNLOAD/seurat.rds')

The object has three assays: (I) RNA, (II) SCT and (III) integrated.
Processed naive T cell single-cell RNA-seq, Seurat object
figshare.com
application/gzip
Updated Jan 5, 2021
Cite
Daniel Bunis (2021). Processed naive T cell single-cell RNA-seq, Seurat object [Dataset]. http://doi.org/10.6084/m9.figshare.11886891.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.11886891.v2
Dataset updated
Jan 5, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Bunis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed naive CD4 and CD8 T cell single-cell RNAseq data from human samples. The file contains a Seurat object stored as an .rds file which can be read into R with the readRDS() function. It was generated using the raw data of similar name in this project, as well as the code stored here: https://github.com/dtm2451/ProgressiveHematopoiesis
Processed HSPCs single-cell RNA-seq, Seurat object
figshare.com
application/gzip
Updated Jan 5, 2021
Cite
Daniel Bunis (2021). Processed HSPCs single-cell RNA-seq, Seurat object [Dataset]. http://doi.org/10.6084/m9.figshare.11894691.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.11894691.v2
Dataset updated
Jan 5, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Bunis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed hematopoietic stem and progenitor cell (HSPC) single-cell RNAseq data from human samples. The file contains a Seurat object stored as an .rds file which can be read into R with the readRDS() function. It was generated using the raw data of similar name in this project, as well as the code stored here: https://github.com/dtm2451/ProgressiveHematopoiesis
Dawnn benchmarking dataset: Simulated linear trajectories processing and...
rdr.ucl.ac.uk
application/gzip
Updated May 4, 2023
+ more versions
Cite
George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Simulated linear trajectories processing and label simulation [Dataset]. http://doi.org/10.5522/04/22616611.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5522/04/22616611.v1
Dataset updated
May 4, 2023
Dataset provided by
University College London
Authors
George Hall; Sergi Castellano Hereza
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated linear trajectories.

FILES: Data processing code

adapted_traj_sim_milo_paper.R: Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form linear trajectories.
generate_test_data_linear_traj_sim_milo_paper.R: R code to assign simulated labels to datasets generated from adapted_traj_sim_milo_paper.R. Seurat objects saved as cells_sim_linear_traj_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_linear_traj.csv.

Resulting datasets

cells_sim_linear_traj_gex_seed_*.rds: Seurat objects generated by generate_test_data_linear_traj_sim_milo_paper.R.
benchmark_dataset_sim_linear_traj.csv: Cell labels generated by generate_test_data_linear_traj_sim_milo_paper.R.
Human fetal retina FL scRNA-seq processed Seurat object
zenodo.org
bin
Updated Apr 16, 2025
Cite
Dominic WH Shayler; Kevin Stachelek; David Cobrinik (2025). Human fetal retina FL scRNA-seq processed Seurat object [Dataset]. http://doi.org/10.5281/zenodo.15231490
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15231490
Dataset updated
Apr 16, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dominic WH Shayler; Kevin Stachelek; David Cobrinik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 16, 2025
Description
This processed Seurat object represents full-length single-cell RNA-sequencing data derived from human fetal retina. This dataset is associated with the eLife publication titled "Identification and characterization of early human photoreceptor states and cell-state-specific retinoblastoma-related features" (https://doi.org/10.7554/eLife.101918.1)
Processed seurat object for CJRB-101 study
figshare.com
Updated Dec 30, 2025
+ more versions
Cite
Hyunkyung Park (2025). Processed seurat object for CJRB-101 study [Dataset]. http://doi.org/10.6084/m9.figshare.30969202.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.30969202.v1
Dataset updated
Dec 30, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Hyunkyung Park
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This RDS file contains the processed Seurat object used for macrophage repolarization T cell analyses in the CJRB-101 study.
Single cell T cell atlas
zenodo.org
bin, csv
Updated Jul 27, 2024
+ more versions
Cite
Kerry A Mullan (2024). Single cell T cell atlas [Dataset]. http://doi.org/10.5281/zenodo.12569981
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.12569981
Dataset updated
Jul 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kerry A Mullan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The attached dataset merges 12 high-quality single-cell T cell datasets that each have both TCR-seq and gene expression (GEx) data. The Seurat object (supercluster_added_ID-240531.rds) contains ~500K paired TCR-seq and GEx profiles. We also include the original identifiers in Sup_Update_labels.csv. See our documentation at https://stegor.readthedocs.io/en/latest/ for how we processed the 12 datasets and decided on the current 47 T cell annotation models using scGate.

This is the accompanying data set for the paper entitled ‘T cell receptor-centric approach to streamline multimodal single-cell data analysis.’, which is currently available as a preprint (https://www.biorxiv.org/content/10.1101/2023.09.27.559702v2). Details on the origin of the datasets, and processing steps can be found there.

The purpose of this atlas, in both its full and down-sampled versions, is to aid the interpretation of other T cell datasets. This can be done by adding either the down-sampled object, which contains up to 500 cells per annotation model, or all 12 datasets to your new sample. The atlas aims to improve the capacity to identify TCR-specific signatures by ensuring a well-covered background, which will improve the robustness of the FindMarkers function in the Seurat package.
Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
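
The quality-filtering thresholds stated above can be illustrated with a small base-R sketch. This is not the authors' code: the table and its column names (n_features, n_counts, percent_mt) are hypothetical stand-ins for per-cell metadata, and only the three cut-offs (750 features, 1000 transcripts, 30% mitochondrial) come from the description.

```r
# Toy per-cell QC table. Column names and values are illustrative
# assumptions, not taken from the dataset.
qc <- data.frame(
  cell       = c("c1", "c2", "c3", "c4"),
  n_features = c(1200, 600, 900, 2000),
  n_counts   = c(3000, 1500, 800, 5000),
  percent_mt = c(5, 10, 12, 45),
  stringsAsFactors = FALSE
)

# A cell is removed if it has fewer than 750 features, fewer than 1000
# transcripts, or more than 30% mitochondrial transcripts. Equivalently,
# a cell is kept only if it passes all three checks.
keep <- qc$n_features >= 750 & qc$n_counts >= 1000 & qc$percent_mt <= 30
qc_filtered <- qc[keep, ]

print(qc_filtered$cell)
```

In a Seurat workflow the same logical condition would typically be applied to the object's per-cell metadata; the exact metadata column names depend on how the object was built.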

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
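
One plausible way to pick the "dimensions accounting for 95% of the total variance" is sketched below in base R. It assumes a per-dimension standard deviation is available (as prcomp returns); the helper name n_dims_for_variance is hypothetical, and how the authors computed variance for Harmony or Scanorama embeddings is not stated in the description.

```r
# Smallest number of leading dimensions whose cumulative share of
# variance reaches `threshold`. `sdev` holds per-dimension standard
# deviations, ordered from largest to smallest.
n_dims_for_variance <- function(sdev, threshold = 0.95) {
  var_explained <- sdev^2 / sum(sdev^2)
  which(cumsum(var_explained) >= threshold)[1]
}

# Toy example on random data (illustrative only).
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200)
pca <- prcomp(x, center = TRUE, scale. = TRUE)
n_dims <- n_dims_for_variance(pca$sdev, threshold = 0.95)
print(n_dims)
```

Note that when only a truncated set of components is computed, the "total" here is the variance of the computed components, not of the full data.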

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Seurat Objects for CD27 Agonism Enhances Long-Lived CD4 T Cell Vaccine...
zenodo.org
bin
Updated Nov 12, 2025
Cite
Zachary Hartman (2025). Seurat Objects for CD27 Agonism Enhances Long-Lived CD4 T Cell Vaccine Responses Critical for Anti-Tumor Immunity [Dataset]. http://doi.org/10.5281/zenodo.17592233
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.17592233
Dataset updated
Nov 12, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zachary Hartman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 12, 2025
Description
Raw and processed Seurat Data objects for scRNA-seq analysis presented in CD27 Agonism Enhances Long-Lived CD4 T Cell Vaccine Responses Critical for Anti-Tumor Immunity

Objects

combined_seurat_benchmark_with_mito.rds - Raw Seurat object

aCD27_final_pure_seurat.rds - Processed Seurat object
Transcription start site analysis for heterogenous CD4+ T cells using 5′...
search.dataone.org
Updated Jul 30, 2025
+ more versions
Cite
Akiko Oguchi; Yasuhiro Murakawa (2025). Transcription start site analysis for heterogenous CD4+ T cells using 5′ scRNA-seq [Dataset]. http://doi.org/10.5061/dryad.gtht76hv9
Unique identifier
https://doi.org/10.5061/dryad.gtht76hv9
Dataset updated
Jul 30, 2025
Dataset provided by
Dryad Digital Repository
Authors
Akiko Oguchi; Yasuhiro Murakawa
Description
These datasets were generated by ReapTEC (read-level pre-filtering and transcribed enhancer call) using 5′ single-cell RNA-seq data on human heterogenous CD4+ T cells. By taking advantage of a unique “cap signature” derived from the 5′-end of a transcript, ReapTEC simultaneously profiles gene expression and enhancer activity at nucleotide resolution using 5′-end single-cell RNA-sequencing (5′ scRNA-seq). The ReapTEC pipeline is described in detail at https://github.com/MurakawaLab/ReapTEC.

README: Transcription start site analysis for heterogenous CD4+ T cells using 5′ scRNA-seq

https://doi.org/10.5061/dryad.gtht76hv9

Description of the data and file structure

Data_summary.xlsx.zip: Summary of single-cell experiments in this study.

5scCTSSbed_All.zip: There are 102 files containing count data for analyzing transcription start site (TSS) signals. Details are as follows.

Our original raw sequencing data and processed data of 5′ scRNA-seq have been deposited in the National Bioscience Database Center (NBDC) Human Database (accession code: hum0350). Raw sequencing data originating from human subjects have been deposited in the Japanese Genotype-phenotype Archive (JGA, accession code: JGAS000689). We retrieved 5′ scRNA-seq data for human memory CD4+ T cells stimulated with viral antigens from the Gene Expression Omnibus database (accession number GSE152522). In total, 102 5′ scRNA-seq datasets were processed by the ReapTEC pipeline (https://github.com/MurakawaLab/ReapTEC)....
Seurat objects for multiome analysis of neuroblastoma cell lines - 4/4
data.mendeley.com
Updated Jul 25, 2024
+ more versions
Cite
Richard Guyer (2024). Seurat objects for multiome analysis of neuroblastoma cell lines - 4/4 [Dataset]. http://doi.org/10.17632/cp4d7t74vb.1
Unique identifier
https://doi.org/10.17632/cp4d7t74vb.1
Dataset updated
Jul 25, 2024
Authors
Richard Guyer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RDS files containing processed Seurat objects for multiome analysis of neuroblastoma cell lines. File names reflect the cell line.
HuPSA and MoPSA raw data in Seurat V5 format
datasetcatalog.nlm.nih.gov
Updated Dec 9, 2024
Cite
Cheng, Siyuan (2024). HuPSA and MoPSA raw data in Seurat V5 format [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001378286
Dataset updated
Dec 9, 2024
Authors
Cheng, Siyuan
Description
These are the raw data for HuPSA and MoPSA scRNAseq datasets. Both RDS files can be loaded into R and processed through the Seurat package. https://doi.org/10.1038/s41698-024-00667-x
scRNA-seq_huang2019
dataverse.harvard.edu
Updated Aug 21, 2019
Cite
Kee Wui Huang (2019). scRNA-seq_huang2019 [Dataset]. http://doi.org/10.7910/DVN/QB5CC8
Unique identifier
https://doi.org/10.7910/DVN/QB5CC8
Dataset updated
Aug 21, 2019
Dataset provided by
Harvard Dataverse
Authors
Kee Wui Huang
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Serialized R data files (.rds) associated with the inDrop single-cell RNA-seq analysis in Huang et al., 2019. Each file has a single Seurat object containing a subset of clusters from the full processed dataset, which were separated into different objects due to file size limitations. Raw data (UMIFM counts) are included in the corresponding slot in each Seurat object. Seurat objects can be re-merged into a single object containing the full dataset using the MergeSeurat function.
Annotated Seurat object of RFC1-KO zebrafish brains at 2dpf and 4dpf
zenodo.org
bin
Updated Sep 16, 2025
Cite
Sebastien Audet; Fanny Nobilleau; Martine Tetreault; ERIC SAMARUT (2025). Annotated Seurat object of RFC1-KO zebrafish brains at 2dpf and 4dpf [Dataset]. http://doi.org/10.5281/zenodo.15499729
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15499729
Dataset updated
Sep 16, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sebastien Audet; Fanny Nobilleau; Martine Tetreault; ERIC SAMARUT
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Annotated Seurat objects (RDS) of single-cell data from RFC1-KO zebrafish brains at 2 days and 4 days post-fertilization. These data are published to complement the publication "RFC1 regulates the expansion of neural progenitors in the developing zebrafish cerebellum" (Nobilleau et al.) and the raw data available in PRJNA1126282. Non-exhaustive R processing code (in Markdown) used to generate the data is also made available. This enables the regeneration of most of the presented figures, as well as additional analyses. More information on the generation of these data is available in the associated publication.
Processed CITE-seq Seurat object for MPET
figshare.com
application/gzip
Updated Oct 24, 2025
Cite
Rekha Mudappathi (2025). Processed CITE-seq Seurat object for MPET [Dataset]. http://doi.org/10.6084/m9.figshare.30434290.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.30434290.v1
Dataset updated
Oct 24, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rekha Mudappathi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes normalized RNA and ADT expression matrices, metadata (sample, group, donor information), and precomputed features used in the MPET (Modeling Protein Expression and Transport) framework.
Bassez et al. (2021) Breast Cancer processed dataset
figshare.com
application/gzip
Updated Dec 20, 2023
Cite
Josep Garnica (2023). Bassez et al. (2021) Breast Cancer processed dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24867018.v3
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.24867018.v3
Dataset updated
Dec 20, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Josep Garnica
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed Seurat objects in .rds format with single-cell dataset obtained from Bassez et al. (2021) Nature Medicine (https://www.nature.com/articles/s41591-021-01323-8).
BassezA_2021_33958794_downsampled: Seurat object including all samples (42) with random downsampling to max 1000 cells per sample, without any further filtering.
BassezA_2021_33958794_3patients: Seurat object with 3 patient samples from the original data set: "BIOKEY_11", "BIOKEY_30", and "BIOKEY_4" from "Pre" condition, without downsampling or further per sample filtering.
snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort)
figshare.com
bin
Updated Jun 4, 2024
Cite
Nicholas Mikolajewicz (2024). snRNA-seq, Primary-Recurrent GBM (Mikolajewicz Cohort) [Dataset]. http://doi.org/10.6084/m9.figshare.25917628.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.25917628.v1
Dataset updated
Jun 4, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nicholas Mikolajewicz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary. 10 primary GBM and 8 recurrent GBM samples (14/18 matched) profiled using single-nucleus RNA-sequencing (sci-RNA-seq3 protocol).

Data format. Data are provided as a preprocessed dataset, stored in a Seurat object.

Sample processing, sci-RNA-seq3 library generation, and sequencing. Snap-frozen patient pGBM and rGBM tissues were chopped with a razor blade or scissors before nucleus isolation. Nuclei extraction and fixation were performed as previously described (Cao 2019), except for the use of a modified CST lysis buffer [50] plus 1% of SUPERase-In RNase Inhibitor (Invitrogen, #AM2696). Lysis time and washing steps were further optimized based on human GBM tissue. Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-seq3 libraries were generated as previously described [49] using three-level combinatorial indexing. The final libraries were sequenced on Illumina NovaSeq as follows: read 1: 34bp, read 2: >=69bp, index 1: 10bp, index 2: 10bp.

Demultiplexing and read alignments. Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline. After barcodes and unique molecular identifiers (UMIs) were extracted from read 1 of the FASTQ files, read alignment was performed using the STAR short-read aligner (v2.5.2b) with the human genome (hg19) and Gencode v24 gene annotations. After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes × N nuclei.

Filtering, normalization, integration, and dimensional reduction. Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate within 3 median absolute deviations of the median. To normalize the count matrix, we adopted the modeling framework previously described and implemented in sctransform (R Package, version 0.3.2). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance stabilized biological signal for downstream analysis. Data from each patient were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding, RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine).

Contact. Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca)
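
The three cell-filtering criteria in this description (200 to 9000 genes, less than 60% mitochondrial content, unmatched rate within 3 median absolute deviations of the median) can be sketched in base R as follows. This is an illustration under assumptions, not the authors' code: the table and column names are hypothetical, and the description does not say whether the MAD was scaled, so the raw (unscaled) MAD is used here.

```r
# TRUE where a value lies within `n_mads` median absolute deviations of
# the median. constant = 1 gives the raw MAD; R's default (1.4826)
# rescales it for normal data. Which one the authors used is not stated.
within_mads <- function(x, n_mads = 3) {
  center <- median(x)
  spread <- mad(x, center = center, constant = 1)
  abs(x - center) <= n_mads * spread
}

# Toy per-nucleus QC table (values and column names are illustrative).
qc <- data.frame(
  cell           = paste0("c", 1:6),
  n_genes        = c(150, 500, 3000, 9500, 4000, 2500),
  percent_mt     = c(5, 10, 70, 8, 12, 20),
  unmatched_rate = c(0.10, 0.11, 0.12, 0.13, 0.14, 0.90),
  stringsAsFactors = FALSE
)

keep <- qc$n_genes >= 200 & qc$n_genes <= 9000 &
  qc$percent_mt < 60 &
  within_mads(qc$unmatched_rate, n_mads = 3)

print(qc$cell[keep])
```

A MAD-based bound adapts to each dataset's own distribution, which is why it is often preferred over a fixed cut-off for metrics without a natural scale.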
CPA-Perturb-seq: Multiplexed single-cell characterization of alternative...
zenodo.org
application/gzip, bin
Updated Feb 13, 2023
+ more versions
Cite
Madeline H Kowalski; Hans-Hermann Wessels; Johannes Linder; Saket Choudhary; Austin Hartman; Yuhan Hao; Isabella Mascio; Carol Dalgarno; Anshul Kundaje; Rahul Satija (2023). CPA-Perturb-seq: Multiplexed single-cell characterization of alternative polyadenylation regulators (Perturb-seq data) [Dataset]. http://doi.org/10.5281/zenodo.7619593
Available download formats
bin, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7619593
Dataset updated
Feb 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Madeline H Kowalski; Hans-Hermann Wessels; Johannes Linder; Saket Choudhary; Austin Hartman; Yuhan Hao; Isabella Mascio; Carol Dalgarno; Anshul Kundaje; Rahul Satija
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This site provides access to datasets from the CPA-Perturb-seq manuscript Kowalski*, Wessels*, Linder* et al., including processed Perturb-seq datasets from HEK293FT and K562. We release these data as Seurat objects, where each object contains single-cell quantifications of gene expression (RNA assay), and in addition, quantifications of polyA site usage (polyA site assay). To explore these data, please install the PASTA (PolyA Site analysis using relative Transcript Abundance) package, which provides infrastructure and analytical tools to explore alternative polyadenylation at single-cell resolution. For each dataset, we also include a fragment file which enables visualization of read coverage plots across groups of cells.

The files include:

1. CPA_HEK293FT.Rds: Seurat object containing the HEK293 CPA-Perturb-seq dataset

2. CPA_HEK293FT_fragments.tsv.gz: Fragment file for the HEK293 dataset

3. CPA_HEK293FT_fragments.tsv.gz.tbi: Fragment file index for the HEK293 dataset

4. CPA_K562.Rds: Seurat object containing the K562 CPA-Perturb-seq dataset

5. CPA_K562_fragments.tsv.gz: Fragment file for the K562 dataset

6. CPA_K562_fragments.tsv.gz.tbi: Fragment file index for the K562 dataset

R code below:

library(PASTA)
hek <- readRDS("CPA_HEK293FT.Rds")

# remove fragment file information
Fragments(hek) <- NULL

# Update the path of the fragment file
Fragments(hek) <- CreateFragmentObject(path = "download/CPA_HEK293FT_fragments.tsv.gz", cells = Cells(hek))

# visualize polyA site usage
PolyACoveragePlot(hek, region = "7-26212195-26213351")
Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute...
figshare.com
bin
Updated Jun 16, 2025
Cite
Emin Onur Karakaslar (2025). Processed data of single cell RNA-sequencing of 16 NPM1-mutated Acute Myeloid Leukemia samples [Dataset]. http://doi.org/10.6084/m9.figshare.26189771.v1
Available download formats
bin
Unique identifier
https://doi.org/10.6084/m9.figshare.26189771.v1
Dataset updated
Jun 16, 2025
Dataset provided by
figshare (http://figshare.com/)
Authors
Emin Onur Karakaslar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TLDR
Seurat object of the 16 NPM1-mutated AML samples (n = 83,162 cells).

AML samples
All sixteen peripheral blood and bone marrow samples were obtained from patients with AML at diagnosis (n=15) or relapse after chemotherapy (n=1) with written informed consent according to the Declaration of Helsinki. Mononuclear cells were isolated by Ficoll-Isopaque density gradient centrifugation and cryopreserved in the Leiden University Medical Center (LUMC) Biobank for Hematological Diseases after approval by the LUMC Institutional Review Board (protocol no. B18.047).

Upstream processing pipeline
CellRanger v7.0.0 was run on all samples with the human reference genome hg38. Seurat v4 was used for all QC (ref. 15). Our QC pipeline had three steps per sample: 1) soft filtering, 2) low-quality cluster removal, and 3) doublet detection. In soft filtering, Seurat objects were created with cells expressing at least 200 genes and with genes expressed in at least 3 cells. Then, the standard Seurat command list with default parameters was run to detect low-quality clusters. Clusters with >15% mitochondrial and 15% mitochondrial mRNA. We used standard Seurat commands to scale and normalize the data on integrated features. The first 30 principal components were used to create UMAP plots. We used clustree to determine the optimal cluster number, based on FindClusters with resolutions sweeping from 0 to 1.2. We chose res=0.5, as clusters became stable. Next, we merged two clusters (CC5 and CC12) into one GMP-like cluster, as one of these clusters (CC12) had high expression of HSP genes yet still retained its cell-type-specific properties.

Note: The file was processed with Seurat v4 but the object is updated for v5. It is uploaded in the .qs file format for faster reading. To read the file: qs::qread("path/to/data.qs")

These data are available for research use only and cannot be used for commercial purposes.

For further queries please refer to our paper:
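As an illustration of the soft-filtering step described above for the AML dataset, the base R sketch below applies the two thresholds (cells expressing at least 200 genes, genes expressed in at least 3 cells) to a genes × cells count matrix. The helper name `soft_filter`, the toy matrix, and the order of the two filters (cells first, then genes) are assumptions; in Seurat these thresholds are typically supplied through the `min.features` and `min.cells` arguments of `CreateSeuratObject()`.

```r
# Hypothetical helper mirroring the soft-filtering thresholds on a
# genes x cells count matrix (rows = genes, columns = cells).
soft_filter <- function(counts, min_features = 200, min_cells = 3) {
  # Keep cells that express at least `min_features` genes.
  genes_per_cell <- colSums(counts > 0)
  counts <- counts[, genes_per_cell >= min_features, drop = FALSE]
  # Then keep genes detected in at least `min_cells` of the remaining cells.
  cells_per_gene <- rowSums(counts > 0)
  counts[cells_per_gene >= min_cells, , drop = FALSE]
}

# Toy matrix; thresholds are lowered so the effect is visible.
toy <- matrix(c(1, 0, 2, 0,
                3, 1, 0, 0,
                0, 0, 0, 4,
                5, 2, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4)))

# cell4 expresses one gene and is dropped; gene3 is then detected in no
# remaining cell and is dropped as well.
filtered <- soft_filter(toy, min_features = 2, min_cells = 2)
```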
Spatial Transcriptomics (10X Xenium) Data From Early Postnatal Lung...
zenodo.org
csv, zip
Updated Oct 18, 2025
Cite
Tristan FRUM; Jason Spence (2025). Spatial Transcriptomics (10X Xenium) Data From Early Postnatal Lung Specimens [Dataset]. http://doi.org/10.5281/zenodo.17155546
Available download formats
csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.17155546
Dataset updated
Oct 18, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tristan FRUM; Jason Spence
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clinical interventions and inflammatory signaling shape the transcriptional and cellular architecture of the early postnatal lung

Spatial Transcriptomics was performed using the 10X Xenium Platform with a custom-designed 480-probe set on 1 tissue section from 5 distinct early postnatal lung specimens. CSV files contain cell type identities as determined by label transfer.

.zip files should be unzipped to the same directory and can be viewed with Xenium Explorer.

.csv files contain cell type annotations as determined by label transfer to hand annotated single nuclei RNA-sequencing data from early postnatal lung. They can be added as a custom cell group in Xenium Explorer.

Code used in analysis of this data is available at: http://github.com/jason-spence-lab/Frum-et-al.-2025a.git

METHODS
Tissue Preparation for Xenium Spatial Transcriptomics Analysis
Xenium slides were removed from -20°C storage, allowed to come to room temperature for 30 minutes, and then placed on a 42°C slide warmer and coated with DNase/RNase-free water (Corning, Cat# 46000CM). Small sections from multiple specimens were carefully placed within the sample placement area. Most of the water was removed once the sections had completely flattened. Slides dried on the slide warmer for three hours before transport to the Advanced Genomics Core. Xenium slides were processed by the Advanced Genomics Core using the Xenium In Situ Gene Expression with Cell Segmentation workflow (10X, #CG000749).

Xenium Data Analysis
Preprocessing/QC Filtering
Centroid and segmentation coordinates and gene expression counts were determined by Xenium Onboard Analysis v4.0 and imported into R using Seurat::ReadXenium(). Gene expression counts were converted to a Seurat object using Seurat::CreateSeuratObject(). Coordinates for centroids and segmentations were first converted into a field of view using Seurat::CreateFOV() and then appended to the Seurat object. Segmentations with fewer than 25 gene expression counts were excluded from the analysis.

Label Transfer
To align the low-complexity 480-probe Xenium data with the higher-complexity snRNA-seq data, the reference data were transformed using Seurat::SCTransform() with 3000 variable features. Each specimen was processed individually and was also transformed with SCTransform, using 250 variable features. Any Xenium probes expressed in over 95% of cells were excluded from analysis. Anchors between each specimen and the snRNA-seq reference were calculated using FindTransferAnchors() with the SCT assay of both datasets, 20 dimensions, k.filter = 200, and considering only the variable features from the Xenium specimen. Cell type annotations from the snRNA-seq data were then transferred to the Xenium specimen using TransferData(), with anchors weighted by the PCs of the Xenium specimen.
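The two count-based exclusions in the Xenium methods above (segmentations with fewer than 25 counts, and probes detected in over 95% of cells) can be sketched in base R on a probes × cells count matrix. This is an illustrative sketch under stated assumptions, not the code from the linked repository: the helper names, the toy matrix, and applying the segmentation filter before the probe filter are all hypothetical choices.

```r
# Hypothetical helper: drop segmentations (columns) whose total
# transcript count is below `min_counts`.
filter_segmentations <- function(counts, min_counts = 25) {
  counts[, colSums(counts) >= min_counts, drop = FALSE]
}

# Hypothetical helper: drop probes (rows) detected in more than
# `max_fraction` of the cells.
filter_ubiquitous_probes <- function(counts, max_fraction = 0.95) {
  detected_fraction <- rowMeans(counts > 0)
  counts[detected_fraction <= max_fraction, , drop = FALSE]
}

# Toy probes x cells matrix.
toy <- matrix(c(10, 20, 30, 1,
                 0, 10,  0, 0,
                20,  0,  5, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("probeA", "probeB", "probeC"),
                              paste0("cell", 1:4)))

# cell4 has 3 counts in total and is removed; probeA is detected in
# every remaining cell and is removed.
kept_cells <- filter_segmentations(toy, min_counts = 25)
result <- filter_ubiquitous_probes(kept_cells, max_fraction = 0.95)
```

On a real Seurat object the same quantities would be read from the assay's count matrix or the per-cell metadata rather than from a plain matrix.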