44 datasets found

n
Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
dataone.org
+1more
zip
Updated Dec 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 106 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced using on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL6/J mice were injected with 10µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for PK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default PN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch-correction with three tools, independently- Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myoctes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
f
Processed naive T cell single-cell RNA-seq, Seurat object
figshare.com
application/gzip
Updated Jan 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Bunis (2021). Processed naive T cell single-cell RNA-seq, Seurat object [Dataset]. http://doi.org/10.6084/m9.figshare.11886891.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.11886891.v2
Dataset updated
Jan 5, 2021
Dataset provided by
figshare
Authors
Daniel Bunis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed naive CD4 and CD8 T cell single-cell RNAseq data from human samples. The file contains a Seurat object stored as an .rds file which can be read into R with the readRDS() function. It was generated using the raw data of similar name in this project, as well as the code stored here: https://github.com/dtm2451/ProgressiveHematopoiesis
pbmc single cell RNA-seq matrix
zenodo.org
csv
Updated May 4, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samuel Buchet; Samuel Buchet; Francesco Carbone; Morgan Magnin; Morgan Magnin; Mickaël Ménager; Olivier Roux; Olivier Roux; Francesco Carbone; Mickaël Ménager (2021). pbmc single cell RNA-seq matrix [Dataset]. http://doi.org/10.5281/zenodo.4730807
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4730807
Dataset updated
May 4, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Samuel Buchet; Samuel Buchet; Francesco Carbone; Morgan Magnin; Morgan Magnin; Mickaël Ménager; Olivier Roux; Olivier Roux; Francesco Carbone; Mickaël Ménager
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single cell RNA-sequencing dataset of peripheral blood mononuclear cells (pbmc: T, B, NK and monocytes) extracted from two healthy donors.

Cells labeled as C26 come from a 30 years old female and cells labeled as C27 come from a 53 years old male. Cells have been isolated from blood using ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x genomics. Cellranger v4.0.0 was used for the processing, and reads were aligned to the ensembl GRCg38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by cellranger (filtered_feature_bc_matrix). Cells with less than 3 genes per cells, less than 500 reads per cell and more than 20% of mithocondrial genes were discarded.

The processing steps was performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.

Files content:

- raw_dataset.csv: raw gene counts

- normalized_dataset.csv: normalized gene counts (single cell matrix)

- cell_types.csv: cell types identified from annotated cell clusters

- cell_types_macro.csv: cell macro types

- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
f
Processed HSPCs single-cell RNA-seq, Seurat object
figshare.com
application/gzip
Updated Jan 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Bunis (2021). Processed HSPCs single-cell RNA-seq, Seurat object [Dataset]. http://doi.org/10.6084/m9.figshare.11894691.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.11894691.v2
Dataset updated
Jan 5, 2021
Dataset provided by
figshare
Authors
Daniel Bunis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed hematopoietic stem and progenitor cell (HSPC) single-cell RNAseq data from human samples. The file contains a Seurat object stored as an .rds file which can be read into R with the readRDS() function. It was generated using the raw data of similar name in this project, as well as the code stored here: https://github.com/dtm2451/ProgressiveHematopoiesis
Bassez et al. (2021) Breast Cancer processed dataset
figshare.com
application/gzip
Updated Dec 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Josep Garnica (2023). Bassez et al. (2021) Breast Cancer processed dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24867018.v3
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24867018.v3
Dataset updated
Dec 20, 2023
Dataset provided by
figshare
Authors
Josep Garnica
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Processed Seurat objects in .rds format with single-cell dataset obtained from Bassez et al. (2021) Nature Medicine (https://www.nature.com/articles/s41591-021-01323-8).BassezA_2021_33958794_downsampled: Seurat object including all samples (42) with random downsampling to max 1000 cells per sample, without any further filtering.BassezA_2021_33958794_3patients: Seurat object with 3 patient samples from the original data set: "BIOKEY_11", "BIOKEY_30", and "BIOKEY_4" from "Pre" condition, without downsampling or further per sample filtering.

Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to...

zenodo.org

bin, csv, zip

Updated Oct 24, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Bertram Bengsch; Bertram Bengsch; Sagar; Sagar; Zhen Zhang; Zhen Zhang (2024). Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to anti-PD-1 and anti-PD-1/CTLA-4 immunotherapy in melanoma [Dataset]. http://doi.org/10.5281/zenodo.13971562

Explore at:

bin, csv, zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.13971562

Dataset updated

Oct 24, 2024

Dataset provided by

Zenodo

Authors

Bertram Bengsch; Bertram Bengsch; Sagar; Sagar; Zhen Zhang; Zhen Zhang

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset details the scRNASeq and TCR-Seq analysis of sorted PD-1+ CD8+ T cells from patients with melanoma treated with checkpoint therapy (anti-PD-1 monotherapy and anti-PD-1 & anti-CTLA-4 combination therapy) at baseline and after the first cycle of therapy. A major publication using this dataset is accessible here: (reference)

*experimental design

Single-cell RNA sequencing was performed using 10x Genomics with feature barcoding technology to multiplex cell samples from different patients undergoing mono or dual therapy so that they can be loaded on one well to reduce costs and minimize technical variability. Hashtag oligomers (oligos) were obtained as purified and already oligo-conjugated in TotalSeq-C format from BioLegend. Cells were thawed, counted and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibody concentrations used were 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 xg 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS and filtered through 40 µm cell strainers and proceeded for sorting. Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain the sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.

*extract protocol

PBMCs were thawed, counted and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibody concentrations used were 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 xg 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS and filtered through 40 µm cell strainers and proceeded for sorting. Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions.

*library construction protocol

Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain the sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.

*library strategy

scRNA-seq and scTCR-seq

*data processing step

Pre-processing of sequencing results to generate count matrices (gene expression and HTO barcode counts) was performed using the 10x genomics Cell Ranger pipeline.

Further processing was done with Seurat (cell and gene filtering, hashtag identification, clustering, differential gene expression analysis based on gene expression).

*genome build/assembly

Alignment was performed using prebuilt Cell Ranger human reference GRCh38.

*processed data files format and content

RNA counts and HTO counts are in sparse matrix format and TCR clonotypes are in csv format.

Datasets were merged and analyzed by Seurat and the analyzed objects are in rds format.

file name	file checksum
PD1CD8_160421_filtered_feature_bc_matrix.zip	da2e006d2b39485fd8cf8701742c6d77
PD1CD8_190421_filtered_feature_bc_matrix.zip	e125fc5031899bba71e1171888d78205
PD1CD8_160421_filtered_contig_annotations.csv	927241805d507204fbe9ef7045d0ccf4
PD1CD8_190421_filtered_contig_annotations.csv	8ca544d27f06e66592b567d3ab86551e

*processed data file	antibodies/tags
PD1CD8_160421_filtered_feature_bc_matrix.zip	none
PD1CD8_160421_filtered_feature_bc_matrix.zip	TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M1_base_monotherapy TotalSeq™-C0252 anti-human Hashtag 2 Antibody - (HASH_2) - M1_post_monotherapy TotalSeq™-C0253 anti-human Hashtag 3 Antibody - (HASH_3) - C1_base_combined_therapy TotalSeq™-C0254 anti-human Hashtag 4 Antibody - (HASH_4) - C1_post_combined_therapy TotalSeq™-C0255 anti-human Hashtag 5 Antibody - (HASH_5) - C2_base_combined_therapy TotalSeq™-C0256 anti-human Hashtag 6 Antibody - (HASH_6) - C2_post_combined_therapy
PD1CD8_160421_filtered_contig_annotations.csv	none
PD1CD8_190421_filtered_feature_bc_matrix.zip	none
PD1CD8_190421_filtered_feature_bc_matrix.zip	TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M2_base_monotherapy TotalSeq™-C0252 anti-human Hashtag 2 Antibody - (HASH_2) - M2_post_monotherapy TotalSeq™-C0253 anti-human Hashtag 3 Antibody - (HASH_3) - M3_base_monotherapy TotalSeq™-C0254 anti-human Hashtag 4 Antibody - (HASH_4) - M3_post_monotherapy TotalSeq™-C0255 anti-human Hashtag 5 Antibody - (HASH_5) - C3_base_combined_therapy TotalSeq™-C0256 anti-human Hashtag 6 Antibody - (HASH_6) - C3_post_combined_therapy
PD1CD8_190421_filtered_contig_annotations.csv	none

A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a...
zenodo.org
data.niaid.nih.gov
Updated Sep 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Franziska Hildebrandt; Franziska Hildebrandt; Miren Urrutia Iturritza; Miren Urrutia Iturritza; Christian Zwicker; Bavo Vanneste; Noémi Van Hul; Elisa Semle; Tales Pascini; Sami Saarenpää; Mengxiao He; Emma R. Andersson; Charlotte L. Scott; Joel Vega-Rodriguez; Joakim Lundeberg; Johan Ankarklev; Christian Zwicker; Bavo Vanneste; Noémi Van Hul; Elisa Semle; Tales Pascini; Sami Saarenpää; Mengxiao He; Emma R. Andersson; Charlotte L. Scott; Joel Vega-Rodriguez; Joakim Lundeberg; Johan Ankarklev (2023). A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration [Dataset]. http://doi.org/10.5281/zenodo.8328679
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8328679
Dataset updated
Sep 20, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Franziska Hildebrandt; Franziska Hildebrandt; Miren Urrutia Iturritza; Miren Urrutia Iturritza; Christian Zwicker; Bavo Vanneste; Noémi Van Hul; Elisa Semle; Tales Pascini; Sami Saarenpää; Mengxiao He; Emma R. Andersson; Charlotte L. Scott; Joel Vega-Rodriguez; Joakim Lundeberg; Johan Ankarklev; Christian Zwicker; Bavo Vanneste; Noémi Van Hul; Elisa Semle; Tales Pascini; Sami Saarenpää; Mengxiao He; Emma R. Andersson; Charlotte L. Scott; Joel Vega-Rodriguez; Joakim Lundeberg; Johan Ankarklev
Description
Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"

Structure

ST_berghei_liver

contains data generated during stpipeline analysis and imaging on 2k arrays Spatial Transcriptomics platform as well as data necessary for and from hepaquery analysis. These samples include 38 sections in total of which 8 are from mice (n=4) infected with sporozoites for 12h, 5 sections from control mice (n=3) at 12h, 7 sections from mice (n=4) infected with sporozoites for 24h and 4 sections from control mice (n=3) for 24 as well as 8 samples of mice (n=2) infected with sporozoites for 38h and control mice (n =2) for 38h.

count contains gene expression matrix output from stpipeline in .tsv format

spotfiles contains coordinate files for count matrices

images contains scaled H&E, Fluorescence (FL) and annotated H&E images (from FL annotations) scaled to 10% of the original image size.

masks contains image masks for hepaquery analysis

distances contains distance measurements from original section sorted by timepoint as well as combined across timepoints

cluster contains clustering information across spatial positions used in spatial enrichment analysis

STUtiility_mus_pb_ST.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in ST_berghei_liver

visium_berghei_liver

contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total, of which 1 was infected with sporozoites for 12h, 1 control section at 12h, 1 section infected with sporozoites for 24h and 1 control section at 24 as well as 2 sporozoite infected sections, and 2 control sections at 38h.

V10S29-135_A1 contains spaceranger output for section 1 for infected and control sections at 38h post-infection

V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection

V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection

V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection

se_visium.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in visium_berghei_liver

snSeq_berghei_liver

contains data generated with the cellranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.

cellranger_cnt_out contains feature count matrix information from cell ranger output

final_merged_curated_annotations_270623.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in snSeq_berghei_liver.tar.gz

raw images.zip contains raw images for supplementary figures 20-22

adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22
Z
Datasets accompanying scANANSE
data.niaid.nih.gov
zenodo.org
Updated Mar 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Smits, J.G.A. (2023). Datasets accompanying scANANSE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7446266
Explore at:
Dataset updated
Mar 13, 2023
Dataset provided by
Arts, J.A.
Smits, J.G.A.
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The preprocessed Seurat object and the two Scanpy objects that can be used to run the scANANSE pipeline with.

Seurat object: preprocessed_PBMC.Rds

Scanpy objects: rna_PBMC.h5ad, atac_PBMC.h5ad

Additional raw data, used to construct the preprocessed objects, supplemented from:

https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5, https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz, https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz.tbi, https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat
f
85 shared genes in DEGs related to mouse age.
plos.figshare.com
xlsx
Updated Nov 26, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rong He; Qiang Zhang; Limei Wang; Yiwen Hu; Yue Qiu; Jia Liu; Dingyun You; Jishuai Cheng; Xue Cao (2024). 85 shared genes in DEGs related to mouse age. [Dataset]. http://doi.org/10.1371/journal.pone.0311374.s005
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0311374.s005
Dataset updated
Nov 26, 2024
Dataset provided by
PLOS ONE
Authors
Rong He; Qiang Zhang; Limei Wang; Yiwen Hu; Yue Qiu; Jia Liu; Dingyun You; Jishuai Cheng; Xue Cao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ObjectiveTo guide animal experiments, we investigated the similarities and differences between humans and mice in aging and Alzheimer’s disease (AD) at the single-nucleus RNA sequencing (snRNA-seq) or single-cell RNA sequencing (scRNA-seq) level.MethodsMicroglia cells were extracted from dataset GSE198323 of human post-mortem hippocampus. The distributions and proportions of microglia subpopulation cell numbers related to AD or age were compared. This comparison was done between GSE198323 for humans and GSE127892 for mice, respectively. The Seurat R package and harmony R package were used for data analysis and batch effect correction. Differentially expressed genes (DEGs) were identified by FindMarkers function with MAST test. Comparative analyses were conducted on shared genes in DEGs associated with age and AD. The analyses were done between human and mouse using various bioinformatics techniques. The analysis of genes in DEGs related to age was conducted. Similarly, the analysis of genes in DEGs related to AD was performed. Cross-species analyses were conducted using orthologous genes. Comparative analyses of pseudotime between humans and mice were performed using Monocle2.Results(1) Similarities: The proportion of microglial subpopulation Cell_APOE/Apoe shows consistent trends, whether in AD or normal control (NC) groups in both humans and mice. The proportion of Cell_CX3CR1/Cx3cr1, representing homeostatic microglia, remains stable with age in NC groups across species. Tuberculosis and Fc gamma R-mediated phagocytosis pathways are shared in microglia responses to age and AD across species, respectively. (2) Differences: IL1RAPL1 and SPP1 as marker genes are more identifiable in human microglia compared to their mouse counterparts. Most genes of DEGs associated with age or AD exhibit different trends between humans and mice. Pseudotime analyses demonstrate varying cell density trends in microglial subpopulations, depending on age or AD across species.ConclusionsMouse Apoe and Cell_Apoe maybe serve as proxies for studying human AD, while Cx3cr1 and Cell_Cx3cr1 are suitable for human aging studies. However, AD mouse models (App_NL_G_F) have limitations in studying human genes like IL1RAPL1 and SPP1 related to AD. Thus, mouse models cannot fully replace human samples for AD and aging research.
Data from: Robust clustering and interpretation of scRNA-seq data using...
zenodo.org
data.niaid.nih.gov
bin, txt, zip
Updated May 27, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Florian Schmidt; Florian Schmidt; Bobby Ranjan; Bobby Ranjan (2021). Robust clustering and interpretation of scRNA-seq data using reference component analysis [Dataset]. http://doi.org/10.5281/zenodo.4021967
Explore at:
bin, txt, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4021967
Dataset updated
May 27, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Florian Schmidt; Florian Schmidt; Bobby Ranjan; Bobby Ranjan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets and Code accompanying the new release of RCA, RCA2. The R-package for RCA2 is available at GitHub: https://github.com/prabhakarlab/RCAv2/

The datasets included here are:

Datasets required for a characterization of batch effects:

merged_rna_seurat.rds

de_list.rds

mergedRCAObj.rds

merged_rna_integrated.rds

10X_PBMCs.RDS: Processed 10X PBMC data RCA2 object (10X PBMC example data sets )

NBM_RDS_Files.zip: Several RDS files containing RCA2 object of Normal Bone Marrow (NBM) data, umap coordinates, doublet finder results and metadata information (Normal Bone Marrow use case)

Dataset used for the Covid19 example:

blish_covid.seu.rds

rownames_of_glocal_projection_immune_cells.txt

Data sets used to outline the ability of supervised clustering to detect disease states:

809653.seurat.rds

blish_covid.seu.rds

Performance benchmarking results:

Memory_consumption.txt

rca_time_list.rds

The R script provides R code to regenerate the main paper Figures 2 to 7 modulo some visual modifications performed in Inkscape.

Provided R scripts are:

ComputePairWiseDE_v2.R (Required code for pairwise DE computation)

RCA_Figure_Reproduction.R
n
Data from: Single cell RNA-seq analysis reveals that prenatal arsenic...
data.niaid.nih.gov
datadryad.org
+1more
zip
Updated Jun 1, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow (2020). Single cell RNA-seq analysis reveals that prenatal arsenic exposure results in long-term, adverse effects on immune gene expression in response to Influenza A infection [Dataset]. http://doi.org/10.5061/dryad.vt4b8gtp6
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.vt4b8gtp6
Dataset updated
Jun 1, 2020
Dataset provided by
Dartmouth College
Dartmouth–Hitchcock Medical Center
Authors
Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.

Methods Whole lung homogenate preparation for single cell RNA sequencing (scRNA-seq).

Lungs were perfused with PBS via the right ventricle, harvested, and mechanically disassociated prior to straining through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoetic origin by magnetic separation using anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads in emulsion, and scRNA-seq libraries were prepared using Single Cell 3’ Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a Nextseq 500 sequencer (Illumina) with 3 sequencing runs. Demultiplexing and barcode processing of raw sequencing data was conducted using Cell Ranger v. 3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular index (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database and are available at accession # GSE142047.

Preprocessing of single cell RNA sequencing (scRNA-seq) data

Count matrices produced using Cell Ranger were analyzed in the R statistical working environment (version 3.6.1). Preliminary visualization and quality analysis were conducted using scran (v 1.14.3, Lun et al., 2016) and Scater (v. 1.14.1, McCarthy et al., 2017) to identify thresholds for cell quality and feature filtering. Sample matrices were imported into Seurat (v. 3.1.1, Stuart., et al., 2019) and the percentage of mitochondrial, hemoglobin, and influenza A viral transcripts calculated per cell. Cells with < 1000 or > 20,000 unique molecular identifiers (UMIs: low quality and doublets), fewer than 300 features (low quality), greater than 10% of reads mapped to mitochondrial genes (dying) or greater than 1% of reads mapped to hemoglobin genes (red blood cells) were filtered from further analysis. Total cells per sample after filtering ranged from 1895-2482, no significant difference in the number of cells was observed in arsenic vs. control. Data were then normalized using SCTransform (Hafemeister et al., 2019) and variable features identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat V3 (Stuart., et al., 2019) and used to integrate samples into a shared space for further comparison. This process enables identification of shared populations of cells between samples, even in the presence of technical or biological differences, while also allowing for non-overlapping populations that are unique to individual samples.

Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq

Principal components were identified from the integrated dataset and were used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two-dimensional space. A shared-nearest-neighbor (SNN) graph was constructed using default parameters, and clusters identified using the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2310 cells. Gene markers for clusters were identified with the findMarkers function in scran. To label individual cells with cell type identities, we used the singleR package (v. 3.1.1) to compare gene expression profiles of individual cells with expression data from curated, FACS-sorted leukocyte samples in the Immgen compendium (Aran D. et al., 2019; Heng et al., 2008). We manually updated the Immgen reference annotation with 263 sample group labels for fine-grain analysis and 25 CD45+ cell type identities based on markers used to sort Immgen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2, cells that were not labeled confidently after label pruning were assigned “Unknown”.

Differential gene expression by immune cells

Differential gene expression within individual cell types was performed by pooling raw count data from cells of each cell type on a per-sample basis to create a pseudo-bulk count table for each cell type. Differential expression analysis was only performed on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets, and can result in spurious identification of these genes in cell types where they aren’t actually expressed. We therefore used a method developed by Young and Behjati (Young et al., 2018) to estimate the contribution of ambient RNA for each gene, and identified genes in each cell type that were estimated to be > 25% ambient-derived. These genes were excluded from analysis in a cell-type specific manner. Genes expressed in less than 5 percent of cells were also excluded from analysis. Differential expression analysis was then performed in Limma (limma-voom with quality weights) following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < .05, log2FC >1.

Analysis of arsenic effect on immune cell gene expression by scRNA-seq.

Sample-wide effects of arsenic on gene expression were identified by pooling raw count data from all cells per sample to create a count table for pseudo-bulk gene expression analysis. Genes with less than 20 counts in any sample, or less than 60 total counts were excluded from analysis. Differential expression analysis was performed using limma-voom as described above.
f
Data from: COVID-19 severity correlates with airway epithelium-immune cell...
figshare.com
application/gzip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Robert Lorenz Chua; Soeren Lukassen; Saskia Trump; Bianca P. Hennig; Daniel Wendisch; Fabian Pott; Olivia Debnath; Loreen Thürmann; Florian Kurth; Maria Theresa Völker; Julia Kazmierski; Bernd Timmermann; Sven Twardziok; Stefan Schneider; Felix Machleidt; Holger Müller-Redetzky; Melanie Maier; Alexander Krannich; Sein Schmidt; Felix Balzer; Johannes Liebig; Jennifer Loske; Norbert Suttorp; Jürgen Eils; Naveed Ishaque; Uwe Gerd Liebert; Christof von Kalle; Andreas Hocke; Martin Witzenrath; Christine Goffinet; Christian Drosten; Sven Laudi; Irina Lehmann; Christian Conrad; Leif-Erik Sander; Roland Eils (2023). COVID-19 severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis [Dataset]. http://doi.org/10.6084/m9.figshare.12436517.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12436517.v2
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Robert Lorenz Chua; Soeren Lukassen; Saskia Trump; Bianca P. Hennig; Daniel Wendisch; Fabian Pott; Olivia Debnath; Loreen Thürmann; Florian Kurth; Maria Theresa Völker; Julia Kazmierski; Bernd Timmermann; Sven Twardziok; Stefan Schneider; Felix Machleidt; Holger Müller-Redetzky; Melanie Maier; Alexander Krannich; Sein Schmidt; Felix Balzer; Johannes Liebig; Jennifer Loske; Norbert Suttorp; Jürgen Eils; Naveed Ishaque; Uwe Gerd Liebert; Christof von Kalle; Andreas Hocke; Martin Witzenrath; Christine Goffinet; Christian Drosten; Sven Laudi; Irina Lehmann; Christian Conrad; Leif-Erik Sander; Roland Eils
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single-cell RNA-Seq of airway samples of COVID-19 patients and healthy controlsThis dataset comprises single-cell RNA-Seq data of nasopharyngeal, protected specimen brush, and bronchial lavage samples of 19 COVID-19 patients (eight moderate and eleven critical according to the WHO classification) and five healthy controls, for a total of 36 samples. An in-depth description is presented in the manuscript "Cross-talk between the airway epithelium and activated immune cells defines severity in COVID-19" (https://www.medrxiv.org/content/10.1101/2020.04.29.20084327v1). The data is uploaded as two .rds files of Seurat objects that can be imported into R. The _main file contains all samples from the nasopharynx, while the _loc file contains data from nasopharyngeal, protected specimen brush, and bronchial lavage samples of two patients. A quantification of viral RNA reads (as CPM, in total over cells and background) is provided as .xlsx file. Please note that these values may differ from viral load estimates obtained from diagnostic procedures and may be less accurate.Raw count values (cellranger output) are provided in the file count_matrices_NBT.tar.
single-cell RNAseq data (data set 16) in the publication scFASTCORMICS: A...
zenodo.org
txt
Updated Nov 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maria Pires Pacheco; Maria Pires Pacheco; Jimmy Ji; Tessy Prohaska; María Moscardó García; Thomas Sauter; Thomas Sauter; Jimmy Ji; Tessy Prohaska; María Moscardó García (2022). single-cell RNAseq data (data set 16) in the publication scFASTCORMICS: A contextualization algorithm to reconstruct metabolic multi-cell population models from single-cell RNAseq data [Dataset]. http://doi.org/10.5281/zenodo.7295001
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7295001
Dataset updated
Nov 23, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Maria Pires Pacheco; Maria Pires Pacheco; Jimmy Ji; Tessy Prohaska; María Moscardó García; Thomas Sauter; Thomas Sauter; Jimmy Ji; Tessy Prohaska; María Moscardó García
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The present dataset (dataset16) was used as input to build scFASTCORMICS models. The files correspond to the clusters identified by Seurat in the single-cell data from CD4 T-cells in PACA samples downloaded from the GEO website (GSE156728).

see the protocol: scFASTCORMICS: A contextualization algorithm to reconstruct metabolic multi-cell population models from single-cell RNAseq data

and github: https://github.com/sysbiolux/scFASTCORMICS

For more information, version updates of the scFASTCORMICS.
s
Single cell sequencing data from: The cellular state space of AML unveils...
figshare.scilifelab.se
researchdata.se
+1more
hdf
Updated Jul 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Henrik Lilljebjörn; Thoas Fioretos (2025). Single cell sequencing data from: The cellular state space of AML unveils novel NPM1 subtypes with distinct clinical outcomes and immune evasion properties [Dataset]. http://doi.org/10.17044/scilifelab.23715648.v1
Explore at:
hdfAvailable download formats
Unique identifier
https://doi.org/10.17044/scilifelab.23715648.v1
Dataset updated
Jul 15, 2025
Dataset provided by
Lund University
Authors
Henrik Lilljebjörn; Thoas Fioretos
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This dataset contains 10X single cell 3' RNA sequencing gene expression data from from 38 AML-samples from the subtypes NPM1 (n=12), AML-MR (n=11), TP53 (n=7), CBFB::MYH11 (n=3), RUNX1::RUNX1T1 (n=3), AML without class defining mutations (n=1), and AML meeting the criteria for two subtypes (n=1). In addition, reference samples from normal bone marrow mononuclear cells (n=5) and CD34 sorted cells (n=3) are included. The single cell libraries were constructed from viably frozen cells from bone marrow (n=29+8) or peripheral blood (n=9) using the Chromium Single Cell 3' Library & Gel Bead Kit v3 (10X genomics) and sequenced on a Novaseq 6000 or NextSeq 500.Data is available in h5 format for each sample, with raw count output from Cellranger, or as a processed Seurat object with scaled expression data, dimension reductions, and metadata.
n
Data from: Extraocular muscle stem cells exhibit distinct cellular...
data.niaid.nih.gov
zip
Updated Jan 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniela Di Girolamo; Maria Benavente-Diaz; Melania Murolo; Alexandre Grimaldi; Priscilla Thomas Lopes; Brendan Evano; Mao Kuriki; Stamatia Gioftsidi; Vincent Laville; Jean-Yves Tinevez; Gaëlle Letort; Sebastien Mella; Shahragim Tajbakhsh; Glenda Comai (2024). Extraocular muscle stem cells exhibit distinct cellular properties associated with non-muscle molecular signatures [Dataset]. http://doi.org/10.5061/dryad.b8gtht7k0
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.b8gtht7k0
Dataset updated
Jan 25, 2024
Dataset provided by
Institut Pasteur
Délégation Ile-de-France Ouest et Nord
Authors
Daniela Di Girolamo; Maria Benavente-Diaz; Melania Murolo; Alexandre Grimaldi; Priscilla Thomas Lopes; Brendan Evano; Mao Kuriki; Stamatia Gioftsidi; Vincent Laville; Jean-Yves Tinevez; Gaëlle Letort; Sebastien Mella; Shahragim Tajbakhsh; Glenda Comai
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
The muscle stem cell (MuSC) population is recognized as functionally heterogeneous. Cranial muscle stem cells, which originate from head mesoderm, can have greater proliferative capacity in culture and higher regenerative potential in transplantation assays when compared to those in the limb. The existence of such functional differences in phenotypic outputs remain unresolved as a comprehensive understanding of the underlying mechanisms is lacking. We addressed this issue using a combination of clonal analysis, live imaging, and scRNA-seq, identifying critical biological features that distinguish extraocular (EOM) and limb (Tibialis anterior, TA) MuSC populations. Time-lapse studies using a MyogenintdTomato reporter showed that the increased proliferation capacity of EOM MuSCs is accompanied by a differentiation delay in vitro. Unexpectedly, in vitro activated EOM MuSCs expressed a large array of distinct extracellular matrix (ECM) components, growth factors, and signaling molecules that are typically associated with mesenchymal non-muscle cells. These unique features are regulated by a specific set of transcription factors that constitute a coregulating module. This transcription factor network, which includes Foxc1 as one of the major players, appears to be hardwired to EOM identity as it is present in quiescent adult MuSCs, in the activated counterparts during growth and retained upon passages in vitro. These findings provide insights into how high-performing MuSCs regulate myogenic commitment by active remodeling of their local environment. Methods

scRNAseq data generation MuSCs were isolated on BD FACSAriaTM III based on GFP fluorescence and cell viability from Tg:Pax7- nGFP mice (Sambasivan et al., 2009). Quiescent MuSCs were manually counted using a hemocytometer and immediately processed for scRNA-seq. For activated samples, MuSCs were cultured in vitro as described above for four days. Activated MuSCs were subsequently trypsinized and washed in DMEM/F12 2% FBS. Live cells were re-sorted, manually counted using a hemocytometer and processed for scRNA-seq. Prior to scRNAseq, RNA integrity was assessed using Agilent Bioanalyzer 2100 to validate the isolation protocol (RIN>8 was considered acceptable). 10X Genomics Chromium microfluidic chips were loaded with around 9000 cells and cDNA libraries were generated following manufacturer’s protocol. Concentrations and fragment sizes were determined using Agilent Bioanalyzer and Invitrogen Qubit. cDNA libraries were sequenced using NextSeq 500 and High Output v2.5 (75 cycles) kits. Count matrices were subsequently generated following 10X Genomics Cell Ranger pipeline. Following normalisation and quality control, we obtained an average of 5792 ± 1415 cells/condition. Seurat preprocessing scRNAseq datasets were processed using Seurat (https://satijalab.org/seurat/) (Butler et al., 2018). Cells with more than 10% of mitochondrial gene fraction were discarded. 4000-5000 genes were detected on average across all 4 datasets. Dimensionality reduction and UMAPs were generated following Seurat workflow. The top 100 DEGs were determined using Seurat "FindAllMarkers" function with default parameters. When processed independently (scvelo), the datasets were first regressed on cell cycle genes, mitochondrial fraction, number of genes, number of UMI following Seurat dedicated vignette, and doublets were removed using DoubletFinder v3 (McGinnis et al., 2019). A "StressIndex" score was generated for each cell based on the list of stress genes previously reported (Machado et al., 2021) using the “AddModule” Seurat function. 94 out of 98 genes were detected in the combined datasets. UMAPs were generated after 1. StressIndex regression, and 2. after complete removal of the detected stress genes from the gene expression matrix before normalization. In both cases, the overall aspect of the UMAP did not change significantly (Figure S5). Although immeasurable confounding effects of cell stress following isolation cannot be ruled out, we reasoned that our datasets did not show a significant effect of stress with respect to the conclusions of our study. Matrisome analysis After subsetting for the features of the Matrisome database (Naba et al., 2015) present in our single-cell dataset, the matrisome score was calculated by assessing the overall expression of its constituents using the "AddModuleScore" function from Seurat (Butler et al., 2018).

RNA velocity and driver genes Scvelo was used to calculate RNA velocities (Bergen et al., 2020). Unspliced and spliced transcript matrices were generated using velocyto (Manno et al., 2018) command line function. Seurat-generated filtering, annotations and cell-embeddings (UMAP, tSNE, PCA) were then added to the outputted objects. These datasets were then processed following scvelo online guide and documentation. Velocity was calculated based on the dynamical model (using scv.tl.recover_dynamics(adata), and scv.tl.velocity(adata, mode=’dynamical’)) and differential kinetics calculations were added to the model (using scv.tl.velocity(adata, diff_kinetics=True)). Specific driver genes were identified by determining the top likelihood genes in the selected cluster. The lists of the top 100 drivers for EOM and TA progenitors are given in Suppl Tables 10 and 11. Gene regulatory network inference and transcription factor modules Gene regulatory networks were inferred using pySCENIC (Aibar et al., 2017; Sande et al., 2020). This algorithm regroups sets of correlated genes into regulons (i.e. a transcription factor and its targets) based on binding motifs and co-expression patterns. The top 35 regulons for each cluster were determined using scanpy "scanpy.tl.rank_genes_groups" function (method=t-test). Note that this function can yield less than 35 results depending on the cluster. UMAP and heatmap were generated using regulon AUC matrix (Area Under Curve) which refers to the activity level of each regulon in a given cell. Visualizations were performed using scanpy (Wolf et al., 2018). The outputted list of each regulon and their targets was subsequently used to create a transcription factor network. To do so, only genes that are regulons themselves were kept. This results in a visual representation where each node is an active transcription factor and each edge is an inferred regulation between 2 transcription factors. When placed in a force-directed environment, these nodes aggregate based on the number of shared edges. This operation greatly reduced the number of genes involved, while highlighting co-regulating transcriptional modules. Visualization of this network was performed in a force-directed graph using Gephi “Force-Atlas2” algorithm (https://gephi.org/).
Data from: Pre-ciliated tubal epithelial cells are prone to initiation of...
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Coulter Ralston; Alexander Nikitin; Benjamin Cosgrove (2024). Pre-ciliated tubal epithelial cells are prone to initiation of high-grade serous ovarian carcinoma [Dataset]. http://doi.org/10.5061/dryad.4mw6m90hm
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.4mw6m90hm
Dataset updated
Oct 17, 2024
Dataset provided by
Cornell University
Authors
Coulter Ralston; Alexander Nikitin; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
The distal region of the uterine (Fallopian) tube is commonly associated with high-grade serous carcinoma (HGSC), the predominant and most aggressive form of ovarian or extra-uterine cancer. Specific cell states and lineage dynamics of the adult tubal epithelium (TE) remain insufficiently understood, hindering efforts to determine the cell of origin for HGSC. Here, we report a comprehensive census of cell types and states of the mouse uterine tube. We show that distal TE cells expressing the stem/progenitor cell marker Slc1a3 can differentiate into both secretory (Ovgp1+) and ciliated (Fam183b+) cells. Inactivation of Trp53 and Rb1, whose pathways are commonly altered in HGSC, leads to elimination of targeted Slc1a3+ cells by apoptosis, thereby preventing their malignant transformation. In contrast, pre-ciliated cells (Krt5+, Prom1+, Trp73+) remain cancer-prone and give rise to serous tubal intraepithelial carcinomas and overt HGSC. These findings identify transitional pre-ciliated cells as a previously unrecognized cancer-prone cell state and point to pre-ciliation mechanisms as novel diagnostic and therapeutic targets. Methods

Single-cell RNA-sequencing library preparation For TE single cell expression and transcriptome analysis we isolated TE from C57BL6 adult estrous female mice. In 3 independent experiments a total of 62 uterine tubes were collected. Each uterine tube was placed in sterile PBS containing 100 IU ml-1 of penicillin and 100 µg ml-1 streptomycin (Corning, 30-002-Cl), and separated in distal and proximal regions. Tissues from the same region were combined in a 40 µl drop of the same PBS solution, cut open lengthwise, and minced into 1.5-2.5 mm pieces with 25G needles. Minced tissues were transferred with help of a sterile wide bore 200 µl pipette tip into a 1.8 ml cryo vial containing 1.2 ml A-mTE-D1 (300 IU ml-1 collagenase IV mixed with 100 IU ml-1 hyaluronidase; Stem Cell Technologies, 07912, in DMEM Ham’s F12, Hyclone, SH30023.FS). Tissues were incubated with loose cap for 1 h at 37°C in a 5% CO2 incubator. During the incubation tubes were taken out 4 times and tissues suspended with a wide bore 200 µl pipette tip. At the end of incubation, the tissue-cell suspension from each tube was transferred into 1 ml TrypLE (Invitrogen, 12604013) pre-warmed to 37°C, suspended 70 times with a 1000 µl pipette tip, 5 ml A-SM [DMEM Ham’s F12 containing 2% fetal bovine serum (FBS)] were added to the mix, and TE cells were pelleted by centrifugation 300x g for 10 minutes at 25°C. Pellets were then suspended with 1 ml pre-warmed to 37°C A-mTE-D2 (7 mg ml-1 Dispase II, Worthington NPRO2, and 10 µg ml-1 Deoxyribonuclease I, Stem Cell Technologies, 07900), and mixed 70 times with a 1000 µl pipette tip. 5 ml A-mTE-D2 was added and samples were passed through a 40 µm cell strainer, and pelleted by centrifugation at 300x g for 7 minutes at +4°C. Pellets were suspended in 100 µl microbeads per 107 total cells or fewer, and dead cells were removed with the Dead Cell Removal Kit (Miltenyi Biotec, 130-090-101) according to the manufacturer’s protocol. Pelleted live cell fractions were collected in 1.5 ml low binding centrifuge tubes, kept on ice, and suspended in ice cold 50 µl A-Ri-Buffer (5% FBS, 1% GlutaMAX-I, Invitrogen, 35050-079, 9 µM Y-27632, Millipore, 688000, and 100 IU ml-1 penicillin 100 μg ml-1 streptomycin in DMEM Ham’s F12). Cell aliquots were stained with trypan blue for live and dead cell calculation. Live cell preparations with a target cell recovery of 5,000-6,000 were loaded on Chromium controller (10X Genomics, Single Cell 3’ v2 chemistry) to perform single cell partitioning and barcoding using the microfluidic platform device. After preparation of barcoded, next-generation sequencing cDNA libraries samples were sequenced on Illumina NextSeq500 System.

Download and alignment of single-cell RNA sequencing data For sequence alignment, a custom reference for mm39 was built using the cellranger (v6.1.2, 10x Genomics) mkref function. The mm39.fa soft-masked assembly sequence and the mm39.ncbiRefSeq.gtf (release 109) genome annotation last updated 2020-10-27 were used to form the custom reference. The raw sequencing reads were aligned to the custom reference and quantified using the cellranger count function.

Preprocessing and batch correction All preprocessing and data analysis was conducted in R (v.4.1.1 (2021-08-10)). The cellranger count outs were first modified with the autoEstCont and adjustCounts functions from SoupX (v.1.6.1) to output a corrected matrix with the ambient RNA signal (soup) removed (https://github.com/constantAmateur/SoupX). To preprocess the corrected matrices, the Seurat (v.4.1.1) NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, and RunUMAP functions were used to create a Seurat object for each sample (https://github.com/satijalab/seurat). The number of principal components used to construct a shared nearest-neighbor graph were chosen to account for 95% of the total variance. To detect possible doublets, we used the package DoubletFinder (v.2.0.3) with inputs specific to each Seurat object. DoubletFinder creates artificial doublets and calculates the proportion of artificial k nearest neighbors (pANN) for each cell from a merged dataset of the artificial and actual data. To maximize DoubletFinder’s predictive power, mean-variance normalized bimodality coefficient (BCMVN) was used to determine the optimal pK value for each dataset. To establish a threshold for pANN values to distinguish between singlets and doublets, the estimated multiplet rates for each sample were calculated by interpolating between the target cell recovery values according to the 10x Chromium user manual. Homotypic doublets were identified using unannotated Seurat clusters in each dataset with the modelHomotypic function. After doublets were identified, all distal and proximal samples were merged separately. Cells with greater than 30% mitochondrial genes, cells with fewer than 750 nCount RNA, and cells with fewer than 200 nFeature RNA were removed from the merged datasets. To correct for any batch defects between sample runs, we used the harmony (v.0.1.0) integration method (github.com/immunogenomics/harmony).

Clustering parameters and annotations After merging the datasets and batch-correction, the dimensions reflecting 95% of the total variance were input into Seurat’s FindNeighbors function with a k.param of 70. Louvain clustering was then conducted using Seurat’s FindClusters with a resolution of 0.7. The resulting 19 clusters were annotated based on the expression of canonical genes and the results of differential gene expression (Wilcoxon Rank Sum test) analysis. One cluster expressing lymphatic and epithelial markers was omitted from later analysis as it only contained 2 cells suspected to be doublets. To better understand the epithelial populations, we reclustered 6 epithelial populations and reapplied harmony batch correction. The clustering parameters from FindNeighbors was a k.param of 50, and a resolution of 0.7 was used for FindClusters. The resulting 9 clusters within the epithelial subset were further annotated using differential expression analysis and canonical markers.

Pseudotime analysis Potential of heat diffusion for affinity-based transition embedding (PHATE) is dimensional reduction method to more accurately visualize continual progressions found in biological data 35. A modified version of Seurat (v4.1.1) was developed to include the ‘RunPHATE’ function for converting a Seurat Object to a PHATE embedding. This was built on the phateR package (v.1.0.7) (https://github.com/scottgigante/seurat/tree/patch/add-PHATE-again). In addition to PHATE, pseudotime values were calculated with Monocle3 (v.1.2.7), which computes trajectories with an origin set by the user 36,55–57. The origin was set to be a progenitor cell state confirmed with lineage tracing experiments. 35. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019). doi:10.1038/s41587-019-0336-3 36. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019). doi:10.1038/s41586-019-0969-x 55. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature Biotechnology 32, 381–386 (2014). doi:10.1038/nbt.2859 56. Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nature Methods 14, 309–315 (2017). doi:10.1038/nmeth.4150 57. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14, 979–982 (2017). doi:10.1038/nmeth.4402
m
Data from: CSS: cluster similarity spectrum integration of single-cell...
data.mendeley.com
Updated Aug 15, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhisong He (2020). CSS: cluster similarity spectrum integration of single-cell genomics data [Dataset]. http://doi.org/10.17632/3kthhpw2pd.2
Explore at:
Unique identifier
https://doi.org/10.17632/3kthhpw2pd.2
Dataset updated
Aug 15, 2020
Authors
Zhisong He
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
It is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid and other single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals.

The presented data set here includes 1) the seurat object of the published two-month-old human cerebral organoid scRNA-seq data (Kanton et al. 2019 Nature); 2) the single-cell RNA-seq data of cerebral organoid generated by inDrop; 3) the newly generated single-cell RNA-seq data of cerebral organoids with and without fixation conditions.
single-cell RNAseq data (data set 17) in the publication scFASTCORMICS: A...
zenodo.org
txt
Updated Nov 23, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maria Pires Pacheco; Maria Pires Pacheco; Jimmy Ji; Tessy Prohaska; María Moscardó García; Thomas Sauter; Thomas Sauter; Jimmy Ji; Tessy Prohaska; María Moscardó García (2022). single-cell RNAseq data (data set 17) in the publication scFASTCORMICS: A contextualization algorithm to reconstruct metabolic multi-cell population models from single-cell RNAseq data [Dataset]. http://doi.org/10.5281/zenodo.7295029
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7295029
Dataset updated
Nov 23, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Maria Pires Pacheco; Maria Pires Pacheco; Jimmy Ji; Tessy Prohaska; María Moscardó García; Thomas Sauter; Thomas Sauter; Jimmy Ji; Tessy Prohaska; María Moscardó García
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The present dataset (dataset17 was used as input to build scFASTCORMICS models. The files correspond to the clusters identified by Seurat in the single-cell data from PBMC metastatic MCC samples downloaded from the GEO website (GSE117988).

see the protocol: scFASTCORMICS: A contextualization algorithm to reconstruct metabolic multi-cell population models from single-cell RNAseq data

and github: https://github.com/sysbiolux/scFASTCORMICS

For more information, version updates of the scFASTCORMICS.
m
NBAtlas: A harmonized single-cell transcriptomic reference atlas of human...
data.mendeley.com
Updated Jun 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Noah Bonine (2025). NBAtlas: A harmonized single-cell transcriptomic reference atlas of human neuroblastoma tumors. Bonine et al. [Dataset]. http://doi.org/10.17632/yhcf6787yp.3
Explore at:
Unique identifier
https://doi.org/10.17632/yhcf6787yp.3
Dataset updated
Jun 17, 2025
Authors
Noah Bonine
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Neuroblastoma, a rare embryonic tumor arising from neural crest development, is responsible for 15% of pediatric cancer-related deaths. Recently, several single-cell transcriptome studies were performed on neuroblastoma patient samples to investigate the cell-of-origin and tumor heterogeneity. However, these individual studies involved a small number of tumors and cells, limiting the conclusions that could be drawn. To overcome this limitation, we integrated seven single-cell or single-nucleus data sets into a harmonized cell atlas covering 362,991 cells across 68 patient samples. We use this atlas to decipher the transcriptional landscape of neuroblastoma at single-cell resolution revealing associations between transcriptomic profiles and clinical outcomes within the tumor compartment. In addition, we characterize the complex immune cell landscape and uncover considerable heterogeneity among tumor-associated macrophages. Finally, we showcase the utility of our atlas as a resource by expanding it with new data and using it as a reference for data-driven cell-type annotation.

seuratObj_NBAtlas_share_v20240130.rds: Seurat Object of the NBAtlas. Be aware, using this object requires roughly 14 GB of memory.
SeuratObj_Share_50kSubset_NBAtlas_v20240130.rds: Light-weight version of the NBAtlas (50 k subset) for portable use.

v2:

seuratObj_NBAtlas_share_v20241203.rds: Cleaned Seurat Object of the NBAtlas (doublets from the zooms filtered out).

v3:

SeuratMeta_TumorZoom_NBAtlas_v20250228.rds: Metadata of tumor zoom, for annotation use "clusters" or "cluster_nr", for umap coordinates use "scviUMAP_1" and "scviUMAP_2". This metadata can be used to subset the entire atlas Seurat object to obtain a tumor zoom Seurat object.
n
Data from: Dermomyotome-derived endothelial cells migrate to the dorsal...
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Traver; Pankaj Sahai-Hernandez; Claire Pouget; Shai Eyal; Ondrej Svoboda; Jose Chacon; Lin Grimm; Tor Gjøen (2023). Dermomyotome-derived endothelial cells migrate to the dorsal aorta to support hematopoietic stem cell emergence [Dataset]. http://doi.org/10.6075/J0GB22J0
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6075/J0GB22J0
Dataset updated
Oct 4, 2023
Dataset provided by
University of Oslo
University of California, San Diego
Authors
David Traver; Pankaj Sahai-Hernandez; Claire Pouget; Shai Eyal; Ondrej Svoboda; Jose Chacon; Lin Grimm; Tor Gjøen
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Development of the dorsal aorta is a key step in the establishment of the adult blood-forming system since hematopoietic stem and progenitor cells (HSPCs) arise from ventral aortic endothelium in all vertebrate animals studied. Work in zebrafish has demonstrated that arterial and venous endothelial precursors arise from distinct subsets of lateral plate mesoderm. Here, we profile the transcriptome of the earliest detectable endothelial cells (ECs) during zebrafish embryogenesis to demonstrate that tissue-specific EC programs initiate much earlier than previously appreciated, by the end of gastrulation. Classic studies in the chick embryo showed that paraxial mesoderm generates a subset of somite-derived endothelial cells (SDECs) that incorporate into the dorsal aorta to replace HSPCs as they exit the aorta and enter circulation. We describe a conserved program in the zebrafish, where a rare population of endothelial precursors delaminates from the dermomyotome to incorporate exclusively into the developing dorsal aorta. Although SDECs lack hematopoietic potential, they act as a local niche to support the emergence of HSPCs from neighboring hemogenic endothelium. Thus, at least three subsets of ECs contribute to the developing dorsal aorta: vascular ECs, hemogenic ECs, and SDECs. Taken together, our findings indicate that the distinct spatial origins of endothelial precursors dictate different cellular potentials within the developing dorsal aorta. Methods Single-cell RNA sample preparation After FACS, total cell concentration and viability were ascertained using a TC20 Automated Cell Counter (Bio-Rad). Samples were then resuspended in 1XPBS with 10% BSA at a concentration between 800-3000 per ml. Samples were loaded on the 10X Chromium system and processed as per manufacturer’s instructions (10X Genomics). Single cell libraries were prepared as per the manufacturer’s instructions using the Single Cell 3’ Reagent Kit v2 (10X Genomics). Single cell RNA-seq libraries and barcode amplicons were sequenced on an Illumina HiSeq platform. Single-cell RNA sequencing analysis The Chromium 3’ sequencing libraries were generated using Chromium Single Cell 3’ Chip kit v3 and sequenced with (actually, I don’t know:( what instrument was used?). The Ilumina FASTQ files were used to generate filtered matrices using CellRanger (10X Genomics) with default parameters and imported into R for exploration and statistical analysis using a Seurat package (La Manno et al., 2018). Counts were normalized according to total expression, multiplied by a scale factor (10,000), and log-transformed. For cell cluster identification and visualization, gene expression values were also scaled according to highly variable genes after controlling for unwanted variation generated by sample identity. Cell clusters were identified based on UMAP of the first 14 principal components of PCA using Seurat’s method, Find Clusters, with an original Louvain algorithm and resolution parameter value 0.5. To find cluster marker genes, Seurat’s method, FindAllMarkers. Only genes exhibiting significant (adjusted p-value < 0.05) a minimal average absolute log2-fold change of 0.2 between each of the clusters and the rest of the dataset were considered as differentially expressed. To merge individual datasets and to remove batch effects, Seurat v3 Integration and Label Transfer standard workflow (Stuart et al., 2019)

Facebook

Twitter

Click to copy link

Link copied

Cite

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34

Data from: Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5061/dryad.t4b8gtj34

Dataset updated

Dec 14, 2021

Dataset provided by

Cornell University

Authors

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove

License

https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

Description

Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 106 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced using on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL6/J mice were injected with 10µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for PK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default PN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch-correction with three tools, independently- Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myoctes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using

Clear search

Close search

Google apps

Main menu

Data from: Large-scale integration of single-cell transcriptomic data...

Processed naive T cell single-cell RNA-seq, Seurat object

pbmc single cell RNA-seq matrix

Processed HSPCs single-cell RNA-seq, Seurat object

Bassez et al. (2021) Breast Cancer processed dataset

Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to...

A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a...

Datasets accompanying scANANSE

85 shared genes in DEGs related to mouse age.

Data from: Robust clustering and interpretation of scRNA-seq data using...

Data from: Single cell RNA-seq analysis reveals that prenatal arsenic...

Data from: COVID-19 severity correlates with airway epithelium-immune cell...

single-cell RNAseq data (data set 16) in the publication scFASTCORMICS: A...

Single cell sequencing data from: The cellular state space of AML unveils...

Data from: Extraocular muscle stem cells exhibit distinct cellular...

Data from: Pre-ciliated tubal epithelial cells are prone to initiation of...

Data from: CSS: cluster similarity spectrum integration of single-cell...

single-cell RNAseq data (data set 17) in the publication scFASTCORMICS: A...

NBAtlas: A harmonized single-cell transcriptomic reference atlas of human...

v2:

v3:

Data from: Dermomyotome-derived endothelial cells migrate to the dorsal...

Data from: Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration