Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 106 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced using on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL6/J mice were injected with 10µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for PK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default PN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch-correction with three tools, independently- Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myoctes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The output of FindAllMarkers function in Seurat was printed as a table. (XLSX)
Facebook
TwitterHematopoietic cells were stained with fluorochrome-conjugated antibodies against human CD45, CD3, CD19 and CD14 and stromal cells with fluorochrome-conjugated antibodies against human CD45 and CD235a. Live/dead cell discrimination was performed by adding 7-amino-actinomycin D (7AAD; Calbiochem) prior to acquisition. CD45– CD235a– stromal cells, CD45+ CD3+ T cells and CD45+ CD19+ B cells were sorted with a BD FACS Melody cell sorter (BD Biosciences) and run on the 10x Chromium analyzer (10X Genomics). cDNA library generation was performed following the established commercial protocol for Chromium Single Cell 3’ Reagent Kit (v3 Chemistry). Libraries were run via Novaseq 6000 for Illumina sequencing at the Functional Genomic Center Zurich. A total of 20 samples were collected from 9 patients and processed in 6 batches. All samples from the same patient were processed in the same batch. Gene expression estimation from sequencing files was done using CellRanger (v3.0.2) count with Ensembl GRCh38.9 release as reference to build the index for human samples. Next, quality control was performed in R v.4.0.0 using the R/Bioconductor package scater (v.1.16.0) and included removal of damaged and contaminating cells based on (1) very high or low UMI counts (>2.5 median absolute deviation from the median across all cells), (2) very high or low total number of detected genes (>2.5 median absolute deviation from the median across all cells) and (3) high mitochondrial gene content (> 2.5 median absolute deviations above the median across all cells). In addition, contaminating cells expressing any of the markers CD3E, PTPRC, CD79A or GYPA were removed from stromal cell samples. Downstream analysis was performed using the Seurat R package (v.4.0.1) and included normalization, scaling, dimensionality reduction with PCA and UMAP, graph-based clustering and calculation of unbiased cluster markers as well as dimensionality reduction with diffusionmap as implemented in the scater R/Bioconductor package (v.1.16.0). Clusters were characterized based on the expression of calculated cluster markers and canonical marker genes as reported in previous publications. For the extended stromal cell analysis, two contaminating clusters with 50 cycling cells and 150 cells expressing both fibroblast and endothelial marker genes (indicative of doublets) were removed. For high resolution FRC analysis, FRC subsets were re-embedded and two clusters containing 256 cells with high levels of endothelial or mitochondrial/non-coding genes, respectively, were excluded. Comparative analysis included determination of cell type-, subset- and condition-specific gene signatures. Thereby differentially expressed genes were calculated running the FindAllMarkers function from Seurat R package.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This sc-RNAseq dataset is composed of disease-unaffected epidermal samples from 96 skin biopsies: 18 from published datasets - GSE173706, GSE249279 – and 78 newly generated ones. Biopsy sample and protocol details, and curated cell-type signature genes, are available in the scRNASeq_source_info_FigShare spreadsheet of this dataset. Processed Seurat object are provided herein. Raw data are available in SRA (id PRJNA1054546). Biopsies originated from seven body sites (face, scalp, axilla, palmoplantar, arm, leg, and back). The skin biopsies were separated into epidermis and dermis before dissociated and enriched for various cell fractions (keratinocytes, fibroblasts, and endothelial cells) and immune cells (myeloid and lymphoid cells) to up sample rare cell types. In total, across body sites, 274,834 cells were profiled, including 96,194 keratinocytes. Seurat v3.0. was utilized to normalize, scale, and reduce the dimensionality of the data. Low quality cells containing less than 200 genes per cell as well as greater than 5,000 genes per cell were filtered out. Cells containing more mitochondrial genes than the permitted quantile of 0.05 were removed. Ambient RNA was removed using R package SoupX v1.6.2. Doublets were removed using scDblFinder v1.12.0. Principal components (PC) were obtained from the topmost 2,000 variable genes, and the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction technique was applied to the 30 topmost variable PC-reduced dataset. Batch effect correction was performed utilizing harmony v1.0, using donor as batch. After batch correction, cells were clustered using shared nearest neighbor modularity optimization-based clustering. Cluster marker genes were identified with FindAllMarkers; cluster corresponding cell type was identified by comparing marker genes to curated cell-type signature genes. Differential expression by keratinocyte subtype was performed with Seurat (v4.3.0) FindMarkers function by comparing keratinocyte subtype to non-keratinocyte clusters. The log fold-change of the average expression between a keratinocyte subtype cluster compared to the rest of clusters is utilized as keratinocyte-subtype gene expression statistic.
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Innate lymphoid cells (ILCs) are enriched at mucosal surfaces where they respond rapidly to environmental stimuli and contribute to both tissue inflammation and healing. To gain insight into the role of ILCs in the pathology and recovery from COVID-19 infection, we employed a multi-omic approach consisting of Abseq and targeted mRNA sequencing to respectively probe the surface marker expression, transcriptional profile and heterogeneity of ILCs in peripheral blood of patients with COVID-19 compared with healthy controls. We found that the frequency of ILC1 and ILC2 cells was significantly increased in COVID-19 patients. Moreover, all ILC subsets displayed a significantly higher frequency of CD69-expressing cells, indicating a heightened state of activation. ILC2s from COVID-19 patients had the highest number of significantly differentially expressed (DE) genes. The most notable genes DE in COVID-19 vs healthy participants included a) genes associated with responses to virus infections and b) genes that support ILC self-proliferation, activation and homeostasis. In addition, differential gene regulatory network analysis revealed ILC-specific regulons and their interactions driving the differential gene expression in each ILC. Overall, this study provides mechanistic insights into the characteristics of ILC subsets activated during COVID-19 infection.
Methods
Study participants, blood draws and processing
Participants were recruited as described previously from adults who had a positive SARS-COV-2 RT-PCR test at Stanford Health Care (NCT04373148). Collection of Covid samples occurred between May to December 2020. The cohort used in this study consisted of asymptomatic (n=2), mild (n=17), and moderate (n=3) COVID-19 infections, some of whom developed long term COVID-19 (n=15). The clinical case severities at the time of diagnosis were defined as asymptomatic, moderate or mild according to the guidelines released by NIH. Long term (LT) COVID was defined as symptoms occurring 30 or more days after infection, consistent with CDC guidelines. Some participants in our study continued to have LT COVID symptoms 90 days after diagnosis (n=12). Exclusion criteria for COVID sample study were NIH severity diagnosis of severe or critical at the time of positive covid test. Samples selected for this study were obtained within 76 days of positive PCR COVID-19 test date. Healthy controls were selected who had sample collection before 2020. Informed consent was obtained from all participants. All protocols were approved by the Stanford Administrative Panel on Human Subjects in Medical Research. Peripheral blood was drawn by venipuncture and using validated and published procedures, peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-based density gradient centrifugation, frozen in aliquots and stored in liquid nitrogen at -80°C , until thawing. A summary of participant demographics is presented in Supp. Table 1.
ILC Enrichment, single cell captures for Abseq and targeted mRNAseq
Participant PBMCs were thawed, and each sample stained with Sample Tag (BD #633781) at room temperature for 20 minutes. Samples were combined in healthy control or COVID-19 tubes. Cells were surface stained with a panel of fluorochrome-conjugated antibodies (Supp. Table 2) in buffer (PBS with 0.25% BSA and 1mM EDTA) for 20 minutes at room temperature prior to immunomagnetic negative selection for ILCs. Following ILC enrichment using the EasySep human Pan-ILC enrichment kit (StemCell Technologies #17975), cells from healthy and COVID-19 recovered participants were counted and normalized before combining. ILCs were sorted using a BD FACS Aria at the Stanford FACS facility prior to incubation with AbSeq oligo-linked mAbs (Supp. Table 3). Sorted cells were processed by the Stanford Human Immune Monitoring Center (HIMC) using the BD Rhapsody platform. Library was prepared using the BD Immune Response Targeting Panel (BD Kit #633750) with addition of custom gene panel reagents (Supp. Table 4) and sequenced on Illumina NovaSeq 6000 at Stanford Genomics Sequencing Center (SGSC). ILCs were identified as Lineageneg (CD3neg, CD14neg, CD34neg, CD19neg), NKG2Aneg, CD45+ and ILCs further defined as CD127+CD161+ and as subsets: ILC1 (CD117negCRTH2neg), ILC2 (CRTH2+) and ILCp (CD117+CRTH2neg) (Supp. Fig. 1).
Computational data analysis
The above multi-modal setup allowed paired measurements of cellular transcriptome and cell surface protein abundance. The ILC1, ILC2 and ILCp cells were manually gated based on the abundance profile of CD127, CD117, CD161 and CRTH2 (Supp. Fig. 1). Before the integrative analysis, the complete multi-modal single cell dataset containing ILC subsets was converted into single Seurat object. All the subsequent protein-level and gene-level analyses were performed using multimodal data analysis pipeline of Seurat R package version 4.0. The normalized and scaled protein abundance profile was used for estimating the integrated harmony dimensions using runHarmony function in Seurat R package (reduction= ‘apca’ and group.by.vars = ‘batch’) . The batch corrected harmony embeddings were then used for computing the Uniform Manifold Approximation and Projection (UMAP) dimensions to visualize the clusters of ILC subsets. Differential marker analysis of surface proteins, between two groups of cells (COVID-19 and Healthy cohort), from abseq panels was computed with normalized and scaled expression values using FindMarkers function from Seurat R package (test.use=’wilcox’). Similarly, differential gene expression was performed on normalized and scaled gene expression values from between two groups of cells (COVID-19 and Healthy cohort) using the FindMarkers function from Seurat R package (test.use=’MAST’ and latent.vars=’batch’). Genes with log-fold change > 0.5 and adjusted p-value < 0.05 (method: Benjamini-Hochberg) (were considered as significant for further evaluation. The resulting adjusted p-values box-plots were plotted using ggplot2 R package (version 3.4.2) after computing the number of cells expressing a given protein or gene in each sample. Pathway enrichment analysis of DE genes was performed using web-server metascape (version 3.5). The AUCells score and gene regulatory network analysis was performed using pySCENIC pipeline (version 0.12.1). Gene regulatory network was reconstructed using GRNBoost2 algorithm and the list of TFs in humans (genome version: hg38) were obtained from cisTarget database. (https://resources.aertslab.org/cistarget). Cellular enrichment (aka AUCell) analysis that measures the activity of TF or gene signatures across all single cells was performed using aucell function in pySCENIC python library. The ggplot2 R package (version 3.4.2) was used for boxplot visualization. The differential gene co-expression analysis was performed using scSFMnet R package. Circular plots were generated using the R package circlize (version 0.4.15).
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Development of the dorsal aorta is a key step in the establishment of the adult blood-forming system since hematopoietic stem and progenitor cells (HSPCs) arise from ventral aortic endothelium in all vertebrate animals studied. Work in zebrafish has demonstrated that arterial and venous endothelial precursors arise from distinct subsets of lateral plate mesoderm. Here, we profile the transcriptome of the earliest detectable endothelial cells (ECs) during zebrafish embryogenesis to demonstrate that tissue-specific EC programs initiate much earlier than previously appreciated, by the end of gastrulation. Classic studies in the chick embryo showed that paraxial mesoderm generates a subset of somite-derived endothelial cells (SDECs) that incorporate into the dorsal aorta to replace HSPCs as they exit the aorta and enter circulation. We describe a conserved program in the zebrafish, where a rare population of endothelial precursors delaminates from the dermomyotome to incorporate exclusively into the developing dorsal aorta. Although SDECs lack hematopoietic potential, they act as a local niche to support the emergence of HSPCs from neighboring hemogenic endothelium. Thus, at least three subsets of ECs contribute to the developing dorsal aorta: vascular ECs, hemogenic ECs, and SDECs. Taken together, our findings indicate that the distinct spatial origins of endothelial precursors dictate different cellular potentials within the developing dorsal aorta. Methods Single-cell RNA sample preparation After FACS, total cell concentration and viability were ascertained using a TC20 Automated Cell Counter (Bio-Rad). Samples were then resuspended in 1XPBS with 10% BSA at a concentration between 800-3000 per ml. Samples were loaded on the 10X Chromium system and processed as per manufacturer’s instructions (10X Genomics). Single cell libraries were prepared as per the manufacturer’s instructions using the Single Cell 3’ Reagent Kit v2 (10X Genomics). Single cell RNA-seq libraries and barcode amplicons were sequenced on an Illumina HiSeq platform. Single-cell RNA sequencing analysis The Chromium 3’ sequencing libraries were generated using Chromium Single Cell 3’ Chip kit v3 and sequenced with (actually, I don’t know:( what instrument was used?). The Ilumina FASTQ files were used to generate filtered matrices using CellRanger (10X Genomics) with default parameters and imported into R for exploration and statistical analysis using a Seurat package (La Manno et al., 2018). Counts were normalized according to total expression, multiplied by a scale factor (10,000), and log-transformed. For cell cluster identification and visualization, gene expression values were also scaled according to highly variable genes after controlling for unwanted variation generated by sample identity. Cell clusters were identified based on UMAP of the first 14 principal components of PCA using Seurat’s method, Find Clusters, with an original Louvain algorithm and resolution parameter value 0.5. To find cluster marker genes, Seurat’s method, FindAllMarkers. Only genes exhibiting significant (adjusted p-value < 0.05) a minimal average absolute log2-fold change of 0.2 between each of the clusters and the rest of the dataset were considered as differentially expressed. To merge individual datasets and to remove batch effects, Seurat v3 Integration and Label Transfer standard workflow (Stuart et al., 2019)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 10: Dataset 8. Lists of markers in 2 sub-clusters of club cells, namely autoimmune-prone sub-cluster of club cells and mix sub-cluster of club cells, related to Fig. 5D and Fig. S5M. The resulting sub-cluster markers were identified by the seurat FindAllMarkers, including the p-value, the average log2(fold change) of the gene in the sub-cluster compared to all other sub-clusters, the percent of cells expressing the gene in the sub-cluster (pct.1), the percent of cells expressing the gene in all other sub-clusters (pct.2), and the adjusted p-value.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The original intent of assembling a data set of publicly-available tumor-infiltrating T cells (TILs) with paired TCR sequencing was to expand and improve the scRepertoire R package. However, after some discussion, we decided to release the data set for everyone, a complete summary of the sequencing runs and the sample information can be found in the meta data of the Seurat object. This repository contains the code for the initial processing and annotating of the data set (we are calling this version 0.0.1). This involves several steps 1) loading the respective GE data, 2) harmonizing the data by sample and cohort information, 3) iterating through automatic annotation, 4) unifying annotation via manual inspection and enrichment analysis, and 5) adding the TCR information.
Methods
Single-Cell Data Processing
The filtered gene matrices output from Cell Ranger align function from individual sequencing runs (10x Genomics, Pleasanton, CA) loaded into the R global environment. For each sequencing run cell barcodes were appended to contain a unique prefix to prevent issues with duplicate barcodes. The results were then ported into individual Seurat objects (citation), where the cells with > 10% mitochondrial genes and/or 2.5x natural log distribution of counts were excluded for quality control purposes. At the individual sequencing run level, doublets were estimated using the scDblFinder (v1.4.0) R package. All the sequencing runs across experiments were merged into a single Seurat Object using the merge() function. All the data was then normalized using the default settings and 2,000 variable genes were identified using the "vst" method. Next the data was scaled with the default settings and principal components were calculated for 40 components. Data was integrated using the harmony (v1.0.0) R package (citation) using both cohort and sample information to correct for batch effect with up to 20 iterations. The UMAP was created using the runUMAP() function in Seurat, using 20 dimensions of the harmony calculations.
Annotation of Cells
Automatic annotation was performed using the singler (v1.4.1) R package (citation) with the HPCA (citation) and DICE (citation) data sets as references and the fine label discriminators. Individual sequencing runs were subsetted to run through the singleR algorithm in order to reduce memory demands. The output of all the singleR analyses were collated and appended to the meta data of the seurat object. Likewise, the ProjecTILs (v0.4.1) R Package (citation) was used for automatic annotation as a partially orthogonal approach. Consensus annotation was derived from all 3 databases (HPCA, DICE, ProjecTILs) using a majority approach. No annotation designation was assigned to cells that returned NA for both singleR and ProjecTILs. Mixed annotations were designated with SingleR identified non-Tcells and ProjecTILs identified T cells. Cell type designations with less than 100 cells in the entire cohort were reduced to "other". Automated annotations were checked manually using canonical marker genes and gene enrichment analysis performed using UCell (v1.0.0) R package (citation).
Addition of TCR data
The filtered contig annotation T cell receptor (TCR) data for available sequencing runs were loaded into the R global environment. Individual contigs were combined using the combineTCR() function of scRepertoire (v1.3.2) R Package (citation). Clonotypes were assigned to barcodes and were multiple duplicate chains for individual cells were filtered to select for the top expressing contig by read count. The clonotype data was then added to the Seurat Object with proportion across individual patients being used to calculate frequency.
Citations
As of right now, there is no citation associated with the assembled data set. However if using the data, please find the corresponding manuscript for each data set in the meta.data of the single-cell object. In addition, if using the processed data, feel free to modify the language in the methods section (above) and please cite the appropriate manuscripts of the software or references that were used.
Itemized List of the Software Used
Itemized List of Reference Data Used
Future Directions
There are areas in which we are actively hoping to develop to further facilitate the usefulness of the data set - if you have other suggestions, please reach out using the contact information below.
Contact
Questions, comments, suggestions, please feel free to contact Nick Borcherding via this repository, email, or using twitter.
Facebook
TwitterThis dataset contains R seurat objects used to reproduce the single-cell RNA-seq analyses for the manuscript Single-cell consequences of X-linked meiotic drive in stalk-eyed flies. Testis tissue from eight male Teleopsis dalmanni (drive and standard genotypes) was dissociated and sequenced using the 10X Genomics Chromium platform. Sequencing reads were processed with Cell Ranger v7.2.0, and downstream filtering, doublet removal, integration, and clustering were performed in Seurat v5.1.0. The final dataset (seurat_final.RData) comprises 12,217 high-quality cells expressing 12,454 genes, with cell types identified using orthologous markers from Drosophila melanogaster. Provided files include the filtered integrated Seurat object and a final processed object with reclustered and annotated cell types. These resources enable full reproducibility of the analyses and support future exploration of testis cell populations in stalk-eyed flies. , , # Seurat objects for the manuscript Single-cell consequences of X-linked meiotic drive in stalk-eyed flies
Dataset DOI: 10.5061/dryad.q573n5twb
Sequencing data from Price et al. (2025; 10.5061/dryad.zkh1893kb) was processed using Cell Ranger v7.2.0. First, a custom reference genome was built with the T. dalmanni reference genome using mkref. Using cellrangers count function, fastq reads were then aligned against the custom index and counted, creating gene-by-cell count matrices. Data filtering and downstream analyses were performed using Seurat v5.1.0 in R v4.3.2. Cells in each sample were removed from the analysis if they expressed fewer than 200 features and more than 20% mitochondrial expression. Count data for each sample was also filtered by only keeping genes with expression (counts > 1) in at least three cells. We used DoubletFinder v2.0.4 in R with default parameters to identify and remove doublets. O...,
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using 23-months old mice of a inducible expression of human a-syn constructs based Parkinson mouse model, we produced a single nucleus RNA dataset by cutting 0mm Bregma to -5mm Bregma. The Chromium 3’ Single Cell Library Kit (10x Genomics) was used and Sequencing was performed on a NovaSeq 6000. From the same model we also used 20-months old mice with the Visium Spatial V1 platform (10x Genomics). Sequencing was performed on a NovaSeq 6000. Both were PE150.
snRNA pipeline: For the alignment of reads, a custom reference was created by adding the sequences of the V1S/SV2 transgene and the Camk2a promoter to the mm10 mouse reference genome. Count matrices generated by cellranger count 7.1 were loaded into an AnnData object and processed using the Python-based framework Scanpy 1.10.2. Integration with R, where needed, was facilitated through the rpy2 package. Raw count matrices were corrected for ambient RNA contamination using the SoupX 1.6.2. To remove potential doublets, scDblFinder 1.18.0 was employed with a fixed seed (123). Nuclei with nUMI and nGenes values exceeding three median absolute deviations (MADs) from the median were excluded. Genes detected in fewer than five nuclei across the dataset were excluded. The resulting dataset was normalized via scanpy.pp.normalize_total and scanpy.pp.log1p. Highly variable genes were identified using the function scanpy.pp.highly_variable_genes with the Seurat v3 flavor, selecting the top 4,000 genes. Dimensionality reduction was performed using principal component analysis (PCA) and batch effects were corrected using the python-implemented version of Harmony via the function scanpy.external.pp.harmony_integrate. Harmony embeddings were then used to construct a k-nearest neighbor (kNN) graph with scanpy.pp.neighbors. Clustering was performed using Leiden clustering with standard parameters via the function scanpy.tl.leiden. Clusters were annotated using literature, the mousebrain.org, and markers identified via the FindConservedMarkers function in Seurat. First, neurons and non-neuronal cells were distinguished using mainly canonical markers, such as but not limited to Rbfox3 (neurons), Mbp (oligodendrocytes), Acsbg1 (astrocytes), Pdgfra (oligodendrocyte precursor cells), Inpp5d (microglia), Colec12 (vascular cells), and Ttr (choroid plexus cells). Neurons were further classified into Vglut1 (Slc17a7), Vglut2 (Slc17a6), GABA (Gad2), cholinergic (Scube1), and dopaminergic (Th) neurons. Vglut1 and GABA neurons were further annotated into subtypes based on subclustering and FindConservedMarkers markers.
visium spatial pipeline: Sequences were fiducially aligned to spots using Loupe Browser ver. 8. All aligned sequences were mapped using spaceranger count 3.0.1 with a custom refence, which included sequences for the promotor and transgene (Camk2aTTA, V1S/SV2) to the mouse genome mm39. We filtered each sample of the Visium Spatial dataset based on the MAD filtering of number of reads (nUMI), number of genes (nGene), and percentage of mitochondrial genes (percent.mt). A spot was filtered out if it was outside of 3x MAD value in at least two metrics. Filtered samples were merged into one Seurat 5.1.0 object and we obtained normalized counts by the SCTransform function of Seurat. Integration was performed using Harmony 1.2.0 on 50 PCA embeddings and clustering was done using Leiden clustering based on 30 harmony embeddings. Integrated clusters were visualized using the UMAP method. Samples that were not successfully integrated (based on similarity measures of the harmony embeddings) and showed high percentage.mt or low nUMI levels compared to other samples, were removed from subsequent analysis. A final integration and clustering were performed after filtering. Regions were first annotated based on a 0.1 resolution clustering to get high level region annotation (Cortex, Hippocampus, Subcortex). Each high-level region was further annotated based on either more granular resolutions or subclustering. Marker genes from mousebrain.org and literature were used in combination with the Allen mouse brain atlas to obtain anatomically relevant annotations.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The output of FindAllMarkers function in Seurat was printed as a table. (XLSX)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The peripheral blood immune cell (PBMC) samples were collected from patients infected with dengue virus (DENV) at four time points: two and one day(s) before defervescence (febrile phase), at defervescence (critical phase), and two-week convalescence. The raw and filtered matrix files were generated using CellRanger version 3.0.2 (10x Genomics, USA) with the reference human genome GRCh38 1.2.0. Potential contamination of ambient RNAs was corrected using SoupX. Low quality cells, including cells expressing mitochondrial genes higher than 10% and doublets/multiplets, were excluded using Seurat and doubletFinder, respectively. The individual samples were then integrated using the SCTransform method with 3,000 gene features. Principal component analysis (PCA) and clustering were performed with the Louvain algorithm applying multi-level refinement algorithm. The gene expression level of each cell was normalized using the LogNormalize method in Seurat. Cell types were annotated using the canonical marker genes described in the original paper, see related link below.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The output of FindAllMarkers function in Seurat was printed as a table. (XLSX)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The output of FindAllMarkers function in Seurat was printed as a table. (XLSX)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Droplet-based single-cell and single nucleus RNA sequencing analysis of murine heartsTo obtain sufficient numbers of cells from all cardiac cell types, a total of n=14 samples (WT: 4 samples; TCRM: 6 samples; TCRM isotype: 2 samples; TCRM 14-D10-2: 2 samples) from mouse hearts were processed for single cell RNA sequencing, and n=9 samples (WT: 2 samples; TCRM: 3 samples; TCRM isotype: 2 samples; TCRM 14-D10-2: 2 samples) were prepared for single nucleus RNA sequencing analysis. Samples were processed and sequenced in n=3 batches with all batches spanning multiple conditions. Single cell suspensions were run using the 10x Chromium (10x Genomics) system. The cDNA libraries were generated according to the established commercial protocols for Chromium Single Cell 3’ Reagent Kit (NextGem Chemistry) and Chromium Nuclei Isolation Kit. All libraries were sequenced by NovaSeq 6000 Illumina sequencing at the Functional Genomic Center Zurich. Gene expression was analyzed from sequencing data using CellRanger (v.5.0.1) count, with Ensembl GRCm38.9 as reference. Next, quality control was carried out in R (v.4.2.1) using the R/Bioconductor packages scater (v.1.24.0) and SingleCellExperiment (v.1.18.0) packages. This involved the identification and removal of damaged cells/nuclei or doublets, based on criteria including unusual UMI or gene counts (>2.5 median absolute deviation from the median across all cells) and high mitochondrial gene content (> 2.5 median absolute deviations above the median across all cells). After performing quality control, the final dataset included 31 078 cells and 24 995 nuclei.Downstream analysis was performed using the Seurat R package (v.4.1.1). First, all samples were merged and integrated across data type (single cell or single nucleus data) using the IntegrateData function from the Seurat R package to account for differences between single cell and single nucleus data. Downstream analysis further included normalization, scaling, dimensional reduction with PCA and UMAP, graph-based clustering and calculation of unbiased cluster markers. Clusters were characterized based on the expression of calculated cluster markers and canonical marker genes as reported in previous publications (refs). In order to examine expression signatures of Fibroblasts in more detail cells assigned as Fibroblasts were re-embedded and re-analysed individually.Droplet-based single nucleus RNA sequencing analysis of human heart biopsiesAs for murine samples, isolated nuclei from human heart biopsies were run using the 10x Chromium (10x Genomics) system and cDNA libraries were generated according to the established commercial protocols for Chromium Single Cell 3’ Reagent Kit (NextGem Chemistry) and Chromium Nuclei Isolation Kit. Libraries were sequenced by NovaSeq 6000 Illumina sequencing at the Functional Genomic Center Zurich and gene expression was estimated using CellRanger (v.5.0.1) count, with Ensembl GRCh38.103 as reference. Quality control included the removal of nuclei with unusual UMI or gene counts (>2.5 median absolute deviation from the median across all cells) and was performed in R v.4.2.1 using the R/Bioconductor packages scater (v.1.24.0) and SingleCellExperiment (v.1.18.0).For downstream analysis with the Seurat R package (v.4.3.0) all samples were merged and integrated across patient ID using the IntegrateData function. Integrated data was further processed running normalization, scaling, dimensional reduction with PCA and UMAP, graph-based clustering and calculation of unbiased cluster markers. Clusters were characterized based on the expression of calculated cluster markers and canonical marker genes as reported in previous publications. Following cluster assignments samples were grouped based on their T cell proportions and groups were compared by calculating differentially expressed genes using the FindAllMarkers function from the Seurat R package.
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
The distal region of the uterine (Fallopian) tube is commonly associated with high-grade serous carcinoma (HGSC), the predominant and most aggressive form of ovarian or extra-uterine cancer. Specific cell states and lineage dynamics of the adult tubal epithelium (TE) remain insufficiently understood, hindering efforts to determine the cell of origin for HGSC. Here, we report a comprehensive census of cell types and states of the mouse uterine tube. We show that distal TE cells expressing the stem/progenitor cell marker Slc1a3 can differentiate into both secretory (Ovgp1+) and ciliated (Fam183b+) cells. Inactivation of Trp53 and Rb1, whose pathways are commonly altered in HGSC, leads to elimination of targeted Slc1a3+ cells by apoptosis, thereby preventing their malignant transformation. In contrast, pre-ciliated cells (Krt5+, Prom1+, Trp73+) remain cancer-prone and give rise to serous tubal intraepithelial carcinomas and overt HGSC. These findings identify transitional pre-ciliated cells as a previously unrecognized cancer-prone cell state and point to pre-ciliation mechanisms as novel diagnostic and therapeutic targets. Methods
Single-cell RNA-sequencing library preparation For TE single cell expression and transcriptome analysis we isolated TE from C57BL6 adult estrous female mice. In 3 independent experiments a total of 62 uterine tubes were collected. Each uterine tube was placed in sterile PBS containing 100 IU ml-1 of penicillin and 100 µg ml-1 streptomycin (Corning, 30-002-Cl), and separated in distal and proximal regions. Tissues from the same region were combined in a 40 µl drop of the same PBS solution, cut open lengthwise, and minced into 1.5-2.5 mm pieces with 25G needles. Minced tissues were transferred with help of a sterile wide bore 200 µl pipette tip into a 1.8 ml cryo vial containing 1.2 ml A-mTE-D1 (300 IU ml-1 collagenase IV mixed with 100 IU ml-1 hyaluronidase; Stem Cell Technologies, 07912, in DMEM Ham’s F12, Hyclone, SH30023.FS). Tissues were incubated with loose cap for 1 h at 37°C in a 5% CO2 incubator. During the incubation tubes were taken out 4 times and tissues suspended with a wide bore 200 µl pipette tip. At the end of incubation, the tissue-cell suspension from each tube was transferred into 1 ml TrypLE (Invitrogen, 12604013) pre-warmed to 37°C, suspended 70 times with a 1000 µl pipette tip, 5 ml A-SM [DMEM Ham’s F12 containing 2% fetal bovine serum (FBS)] were added to the mix, and TE cells were pelleted by centrifugation 300x g for 10 minutes at 25°C. Pellets were then suspended with 1 ml pre-warmed to 37°C A-mTE-D2 (7 mg ml-1 Dispase II, Worthington NPRO2, and 10 µg ml-1 Deoxyribonuclease I, Stem Cell Technologies, 07900), and mixed 70 times with a 1000 µl pipette tip. 5 ml A-mTE-D2 was added and samples were passed through a 40 µm cell strainer, and pelleted by centrifugation at 300x g for 7 minutes at +4°C. Pellets were suspended in 100 µl microbeads per 107 total cells or fewer, and dead cells were removed with the Dead Cell Removal Kit (Miltenyi Biotec, 130-090-101) according to the manufacturer’s protocol. Pelleted live cell fractions were collected in 1.5 ml low binding centrifuge tubes, kept on ice, and suspended in ice cold 50 µl A-Ri-Buffer (5% FBS, 1% GlutaMAX-I, Invitrogen, 35050-079, 9 µM Y-27632, Millipore, 688000, and 100 IU ml-1 penicillin 100 μg ml-1 streptomycin in DMEM Ham’s F12). Cell aliquots were stained with trypan blue for live and dead cell calculation. Live cell preparations with a target cell recovery of 5,000-6,000 were loaded on Chromium controller (10X Genomics, Single Cell 3’ v2 chemistry) to perform single cell partitioning and barcoding using the microfluidic platform device. After preparation of barcoded, next-generation sequencing cDNA libraries samples were sequenced on Illumina NextSeq500 System.
Download and alignment of single-cell RNA sequencing data For sequence alignment, a custom reference for mm39 was built using the cellranger (v6.1.2, 10x Genomics) mkref function. The mm39.fa soft-masked assembly sequence and the mm39.ncbiRefSeq.gtf (release 109) genome annotation last updated 2020-10-27 were used to form the custom reference. The raw sequencing reads were aligned to the custom reference and quantified using the cellranger count function.
Preprocessing and batch correction All preprocessing and data analysis was conducted in R (v.4.1.1 (2021-08-10)). The cellranger count outs were first modified with the autoEstCont and adjustCounts functions from SoupX (v.1.6.1) to output a corrected matrix with the ambient RNA signal (soup) removed (https://github.com/constantAmateur/SoupX). To preprocess the corrected matrices, the Seurat (v.4.1.1) NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, and RunUMAP functions were used to create a Seurat object for each sample (https://github.com/satijalab/seurat). The number of principal components used to construct a shared nearest-neighbor graph were chosen to account for 95% of the total variance. To detect possible doublets, we used the package DoubletFinder (v.2.0.3) with inputs specific to each Seurat object. DoubletFinder creates artificial doublets and calculates the proportion of artificial k nearest neighbors (pANN) for each cell from a merged dataset of the artificial and actual data. To maximize DoubletFinder’s predictive power, mean-variance normalized bimodality coefficient (BCMVN) was used to determine the optimal pK value for each dataset. To establish a threshold for pANN values to distinguish between singlets and doublets, the estimated multiplet rates for each sample were calculated by interpolating between the target cell recovery values according to the 10x Chromium user manual. Homotypic doublets were identified using unannotated Seurat clusters in each dataset with the modelHomotypic function. After doublets were identified, all distal and proximal samples were merged separately. Cells with greater than 30% mitochondrial genes, cells with fewer than 750 nCount RNA, and cells with fewer than 200 nFeature RNA were removed from the merged datasets. To correct for any batch defects between sample runs, we used the harmony (v.0.1.0) integration method (github.com/immunogenomics/harmony).
Clustering parameters and annotations After merging the datasets and batch-correction, the dimensions reflecting 95% of the total variance were input into Seurat’s FindNeighbors function with a k.param of 70. Louvain clustering was then conducted using Seurat’s FindClusters with a resolution of 0.7. The resulting 19 clusters were annotated based on the expression of canonical genes and the results of differential gene expression (Wilcoxon Rank Sum test) analysis. One cluster expressing lymphatic and epithelial markers was omitted from later analysis as it only contained 2 cells suspected to be doublets. To better understand the epithelial populations, we reclustered 6 epithelial populations and reapplied harmony batch correction. The clustering parameters from FindNeighbors was a k.param of 50, and a resolution of 0.7 was used for FindClusters. The resulting 9 clusters within the epithelial subset were further annotated using differential expression analysis and canonical markers.
Pseudotime analysis Potential of heat diffusion for affinity-based transition embedding (PHATE) is dimensional reduction method to more accurately visualize continual progressions found in biological data 35. A modified version of Seurat (v4.1.1) was developed to include the ‘RunPHATE’ function for converting a Seurat Object to a PHATE embedding. This was built on the phateR package (v.1.0.7) (https://github.com/scottgigante/seurat/tree/patch/add-PHATE-again). In addition to PHATE, pseudotime values were calculated with Monocle3 (v.1.2.7), which computes trajectories with an origin set by the user 36,55–57. The origin was set to be a progenitor cell state confirmed with lineage tracing experiments. 35. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019). doi:10.1038/s41587-019-0336-3 36. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019). doi:10.1038/s41586-019-0969-x 55. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature Biotechnology 32, 381–386 (2014). doi:10.1038/nbt.2859 56. Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nature Methods 14, 309–315 (2017). doi:10.1038/nmeth.4150 57. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14, 979–982 (2017). doi:10.1038/nmeth.4402
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.
Methods Whole lung homogenate preparation for single cell RNA sequencing (scRNA-seq).
Lungs were perfused with PBS via the right ventricle, harvested, and mechanically disassociated prior to straining through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoetic origin by magnetic separation using anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads in emulsion, and scRNA-seq libraries were prepared using Single Cell 3’ Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a Nextseq 500 sequencer (Illumina) with 3 sequencing runs. Demultiplexing and barcode processing of raw sequencing data was conducted using Cell Ranger v. 3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular index (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database and are available at accession # GSE142047.
Preprocessing of single cell RNA sequencing (scRNA-seq) data
Count matrices produced using Cell Ranger were analyzed in the R statistical working environment (version 3.6.1). Preliminary visualization and quality analysis were conducted using scran (v 1.14.3, Lun et al., 2016) and Scater (v. 1.14.1, McCarthy et al., 2017) to identify thresholds for cell quality and feature filtering. Sample matrices were imported into Seurat (v. 3.1.1, Stuart., et al., 2019) and the percentage of mitochondrial, hemoglobin, and influenza A viral transcripts calculated per cell. Cells with < 1000 or > 20,000 unique molecular identifiers (UMIs: low quality and doublets), fewer than 300 features (low quality), greater than 10% of reads mapped to mitochondrial genes (dying) or greater than 1% of reads mapped to hemoglobin genes (red blood cells) were filtered from further analysis. Total cells per sample after filtering ranged from 1895-2482, no significant difference in the number of cells was observed in arsenic vs. control. Data were then normalized using SCTransform (Hafemeister et al., 2019) and variable features identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat V3 (Stuart., et al., 2019) and used to integrate samples into a shared space for further comparison. This process enables identification of shared populations of cells between samples, even in the presence of technical or biological differences, while also allowing for non-overlapping populations that are unique to individual samples.
Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq
Principal components were identified from the integrated dataset and were used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two-dimensional space. A shared-nearest-neighbor (SNN) graph was constructed using default parameters, and clusters identified using the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2310 cells. Gene markers for clusters were identified with the findMarkers function in scran. To label individual cells with cell type identities, we used the singleR package (v. 3.1.1) to compare gene expression profiles of individual cells with expression data from curated, FACS-sorted leukocyte samples in the Immgen compendium (Aran D. et al., 2019; Heng et al., 2008). We manually updated the Immgen reference annotation with 263 sample group labels for fine-grain analysis and 25 CD45+ cell type identities based on markers used to sort Immgen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2, cells that were not labeled confidently after label pruning were assigned “Unknown”.
Differential gene expression by immune cells
Differential gene expression within individual cell types was performed by pooling raw count data from cells of each cell type on a per-sample basis to create a pseudo-bulk count table for each cell type. Differential expression analysis was only performed on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets, and can result in spurious identification of these genes in cell types where they aren’t actually expressed. We therefore used a method developed by Young and Behjati (Young et al., 2018) to estimate the contribution of ambient RNA for each gene, and identified genes in each cell type that were estimated to be > 25% ambient-derived. These genes were excluded from analysis in a cell-type specific manner. Genes expressed in less than 5 percent of cells were also excluded from analysis. Differential expression analysis was then performed in Limma (limma-voom with quality weights) following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < .05, log2FC >1.
Analysis of arsenic effect on immune cell gene expression by scRNA-seq.
Sample-wide effects of arsenic on gene expression were identified by pooling raw count data from all cells per sample to create a count table for pseudo-bulk gene expression analysis. Genes with less than 20 counts in any sample, or less than 60 total counts were excluded from analysis. Differential expression analysis was performed using limma-voom as described above.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The output of FindAllMarkers function in Seurat was printed as a table. (XLSX)
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Hypertrophic cardiomyopathy (HCM) is characterized by thickening of the left ventricular wall, diastolic dysfunction, and fibrosis, and is associated with mutations in genes encoding sarcomere proteins. While in vitro studies have used human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) to study HCM, these models have not examined the multicellular interactions involved in fibrosis. Using engineered cardiac microtissues (CMTs) composed of HCM-causing MYH7-variant hiPSC-CMs and wild-type fibroblasts, we observed cell-cell cross-talk leading to increased collagen deposition, tissue stiffening, and decreased contractility dependent on fibroblast proliferation. hiPSC-CM conditioned media and single-nucleus RNA sequencing data suggested that fibroblast proliferation is mediated by paracrine signals from MYH7-variant cardiomyocytes. Furthermore, inhibiting epidermal growth factor receptor tyrosine kinase with erlotinib hydrochloride attenuated stromal activation. Last, HCM-causing MYBPC3-variant CMTs also demonstrated increased stromal activation and reduced contractility, but with distinct characteristics. Together, these findings establish a paracrine-mediated cross-talk potentially responsible for fibrotic changes observed in HCM.
Methods
snRNA-sequencing data
snRNA-seq dataset includes data on CMTs made with healthy wild type (WT) or hypertrophic cardiomyopathy (HCM)-causing (R403Q+/- mutation in myosin heavy chain) human induced pluripotent stem cell derived-cardiomyocytes (hiPSC-CMs) (derived from PGP1 hiPSC line) and ventricular cardiac fibroblasts (Lonza Cat. CC-2904) (1).
Methods: A total of 60,000 cells per tissue (ten tissues pooled per sample), consisting of 90% hiPSC-CMs and 10% vCFs, were mixed in an ECM solution consisting of 4 mg/ml of human fibrinogen (Sigma), 10% Matrigel (Corning), 0.4 unit of thrombin (Sigma) per mg of fibrinogen, 5 µM Y-27632 (Tocris), and 0.033 mg/mL aprotinin (Sigma). The cell-ECM mixture was pipetted into each tissue well, and after gel polymerization for ten minutes, tissue maintenance growth media containing high glucose DMEM (Fisher) supplemented with 10% FBS (Sigma), 1% penicillin-streptomycin (P/S) (Fisher), 1% Non-essential Amino Acids (Fisher), 1% Glutamax (Fisher), 5 µM Y-27632, 0.033 mg/mL aprotinin, and 150 µg/mL L-ascorbic acid 2-phosphate sesquimagnesium salt hydrate. Y-27632 was removed two days following seeding, and the growth media was replaced every other day.
CMTs were flash frozen in liquid nitrogen on day 7 (10 tissues pooled per sample) and stored in -80°C. Individual nuclei were isolated from frozen tissue samples. Briefly, nuclei were isolated (2) and RNA was reverse-transcribed and converted into cDNA libraries using a 10x Chromium Controller and Chromium Single Cell 3' v3.1 reagent kit (10x Genomics). Bar-coded libraries were pooled and sequenced (Illumina NovaSeq 6000). Single nucleus RNAseq alignment and gene counts were performed using Cell Ranger 1.2 (10x Genomics) and Seurat (3) and R 4.1.0 and managed via RStudio. There were 15,129 total sequenced nuclei.
Principal component (PC) analysis was performed to determine the dimensionality of this dataset within Seurat and the number of PCs was selected using both permutation-based and heuristic methods. Cluster assignment was performed in an unsupervised manner using a shared nearest neighbor approach within Seurat. To validate the ideal number of PCs and clustering resolution in this dataset, repeat analysis was performed while varying both the dimensionality and resolution. Using the Clustree R package (4), the hierarchical evolution of cluster identity was determined. This analysis allows the user to investigate how clustering resolution impacts cell identity, in an effort to assign robust classification to cell clusters which remain consistent across resolution values. For each clustering resolution, marker genes for each cluster, TTN for cardiomyocytes and FN1 for fibroblasts, were identified in order to assign relevant biological function to various subclusters. Dimensionality and resolution were adjusted until each identified cluster contained a non-zero amount of significantly expressed marker genes that were able to assign functional gene ontological terms; in this case, at cluster resolution 0.4, which was used for clustering analysis. At greater resolution values, cluster subsets returned either too few genes to assign functional classifications or non-coding genes of unknown function. Dotplot projections of single cells were generated with UMAP coordinates, using the number of dimensions determined from PCA as described above. For the purposes of this analysis, 30 dimensions were considered for both the initial PCA, and subsequent UMAP visualization. To assign cell clusters specific identities, canonical marker gene expression was analyzed to assign broad cell classes. To assign cell subclusters, such as CM1-CM6, unbiased genetic markers for each population were calculated using the Wilcoxon Rank Sum test within Seurat, at which point at most the top 100 genes were used as a Gene Ontology query to assign cellular functions to each cell subcluster.
Genes that were upregulated R403Q+/- cardiomyocytes compared to WT cardiomyocytes with a p-value < 0.01 and fold-change greater than 1.3 were input into Enrichr (5,6). The top upregulated pathways associated with the top upregulated genes were obtained using WikiPathways (7).
Bulk RNA-sequencing data
Bulk RNA-sequencing data include data on serum starved vCFs, vCFs treated with 100 ng/mL recombinant human epidermal growth factor (rhEGF), and conditioned media from hiPSC-CMs (1).
Methods: vCFs were trypsinized, centrifuged, and flash-frozen after two days in serum-starvation (low glucose DMEM with 0.1% fetal bovine serum) vs 100 ng/mL rhEGF or conditioned media from wild-type vs R403Q+/- hypertrophic cardiomyopathy hiPSC-CMs for three different batches. Cells were homogenized in Trizol Reagent (Life Technologies, Inc., Grand Island, NY) with TissueLyzer II (QIAGEN, Inc., Valencia, CA) and RNA was isolated by conventional methods. RNA went through two rounds of mRNA purification (polyA-selection) using Dynabeads mRNA DIRECT Kit (Invitrogen, Carlsbad, CA). Double-stranded cDNA was generated using the Superscript III First-Strand Synthesis System (Invitrogen, Carlsbad, CA). The cDNA products were used to construct libraries with the Nextera XT DNA Sample Preparation Kit (Illumina, Inc., San Diego, CA). Libraries were paired-end sequenced with a read length of 75 base pairs (75PE) on the Illumina NextSeq 500. Reads were aligned to the hg38 Human Genome using Spliced Transcripts Alignment to a Reference (STAR) (8). Data were analyzed as previously described (9) and normalized to the total number of reads per kilobase of exon per million (RPKM).
Genes that were upregulated in rhEGF-treated vCFs compared to serum-starved vCFs with a p-value < 0.01 and fold-change greater than 1.3 were input into Enrichr (5,6). For conditioned media experiments, gene expression values were averaged over three experimental repeats. Genes that had an average fold-change expression greater than 1.15 with a p-value < 0.01 for all three experiments were input into Enrichr. The top upregulated pathways and transcription factors associated with the top upregulated genes were obtained using WikiPathways (7) and ChEA (10) databases, respectively.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundCell death caused by neutrophil extracellular traps (NETs) is known as NETosis. Despite the increasing importance of NETosis in cancer diagnosis and treatment, its role in Non-Small-Cell Lung Cancer (NSCLC) remains unclear.MethodsA total of 3298 NSCLC patients from different cohorts were included. The AUCell method was used to compute cells’ NETosis scores from single-cell RNA-sequencing data. DEGs in sc-RNA dataset were obtained by the Seurat’s “FindAllMarkers” function, and DEGs in bulk-RNA dataset were acquired by the DESeq2 package. ConsensusClusterPlus package was used to group patients into different NETosis subtypes, and the Enet algorithm was used to construct the NETosis-Related Riskscore (NETRS). Enrichment analyses were conducted using the GSVA and ClusterProfiler packages. Six distinct algorithms were utilized to evaluate patients’ immune cell infiltration level. Patients’ SNV and CNV data were analyzed by maftools and GISTIC2.0, respectively. Drug information was obtained from the GDSC1, and predicted by the Oncopredict package. Patient response to immunotherapy was evaluated by the TIDE algorithm in conjunction with the phs000452 immunotherapy cohort. Six NRGs’ differential expression was verified using qRT-PCR and immunohistochemistry.ResultsAmong all cell types, neutrophils had the highest AUCell score. By Intersecting the DEGs between high and low NETosis classes, DEGs between normal and LUAD tissues, and prognostic related genes, 61 prognostic related NRGs were identified. Based on the 61 NRGs, all LUAD patients can be divided into two clusters, showing different prognostic and TME characteristics. Enet regression identified the NETRS composed of 18 NRGs. NETRS significantly associated with LUAD patients’ clinical characteristics, and patients at different NETRS groups showed significant differences on prognosis, TME characteristics, immune-related molecules’ expression levels, gene mutation frequencies, response to immunotherapy, and drug sensitivity. Besides, NETRS was more powerful than 20 published gene signatures in predicting LUAD patients’ survival. Nine independent cohorts confirmed that NETRS is also valuable in predicting the prognosis of all NSCLC patients. Finally, six NRGs’ expression was confirmed using three independent datasets, qRT-PCR and immunohistochemistry.ConclusionNETRS can serves as a valuable prognostic indicator for patients with NSCLC, providing insights into the tumor microenvironment and predicting the response to cancer therapy.
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 106 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced using on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL6/J mice were injected with 10µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for PK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default PN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch-correction with three tools, independently- Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myoctes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using