6 datasets found
  1. Data from: Large-scale integration of single-cell transcriptomic data...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +2 more
    zip
    Updated Dec 14, 2021
    Cite
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
    Dataset provided by
    Cornell University
    Authors
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

    Methods

    Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

    Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

    Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10⁶ cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

    Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL6/J mice were injected with 10µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

    Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

    Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (v1.0; github.com/immunogenomics/harmony), Scanorama (v1.3; github.com/brianhie/scanorama), and BBKNN (v1.3.12; github.com/Teichlab/bbknn). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
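
    The quality thresholds stated above (at least 750 features, at least 1000 transcripts, at most 30% mitochondrial transcripts) can be sketched in a few lines of Python. This is only an illustration of the filtering rule, not the authors' Seurat code; the CellQC record, the passes_qc helper, and the example barcodes are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical record type for this sketch)."""
    barcode: str
    n_features: int     # number of genes detected in the cell
    n_transcripts: int  # total transcript (UMI) count
    pct_mito: float     # percent of transcripts from mitochondrial genes


def passes_qc(cell, min_features=750, min_transcripts=1000, max_pct_mito=30.0):
    """Return True if the cell survives all three thresholds from the text."""
    return (
        cell.n_features >= min_features
        and cell.n_transcripts >= min_transcripts
        and cell.pct_mito <= max_pct_mito
    )


# Made-up cells: one passes, the other three each fail one criterion.
cells = [
    CellQC("AAAC", 1200, 3500, 4.2),
    CellQC("AAAG", 600, 1500, 3.0),    # fewer than 750 features
    CellQC("AACT", 900, 800, 5.0),     # fewer than 1000 transcripts
    CellQC("AAGT", 2000, 6000, 41.0),  # more than 30% mitochondrial
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

    Keeping the three thresholds as keyword arguments makes it easy to compare against the different cutoffs used by other datasets in this listing.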

    Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
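
    Choosing the "dimensions accounting for 95% of the total variance" can be sketched as a cumulative sum over per-component variances. The snippet below is a minimal illustration, assuming that "total variance" means the variance summed over the computed components (for example, squared standard deviations of each principal component); the source does not state exactly how the cutoff was computed, and the function name n_dims_for_variance is hypothetical.

```python
def n_dims_for_variance(stdevs, target=0.95):
    """Smallest number of leading components whose variance reaches `target`.

    `stdevs` holds the standard deviation of each component, ordered from
    the first component to the last.
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    if total <= 0:
        raise ValueError("component variances must sum to a positive value")
    cumulative = 0.0
    for n_dims, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return n_dims
    return len(variances)


# Made-up standard deviations; the variances are 16, 4, 1, 1 (total 22).
example_stdevs = [4.0, 2.0, 1.0, 1.0]
n_dims = n_dims_for_variance(example_stdevs)
print(n_dims)
```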

    Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

    Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myoctes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using

  2. A Single-cell Transcriptomic Sequencing Dataset of Early Female and Male...

    • figshare.com
    txt
    Updated Feb 6, 2025
    Cite
    Zhi Cao; Kai Jin (2025). A Single-cell Transcriptomic Sequencing Dataset of Early Female and Male Chicken (Gallus gallus) Embryos [Dataset]. http://doi.org/10.6084/m9.figshare.28357844.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zhi Cao; Kai Jin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quality Control of Single-Cell Data. Raw sequencing data were processed using SCOPE-tools (v1.4.0) to generate a gene expression matrix. After extracting and correcting barcodes and unique molecular identifiers (UMIs), adapter sequences and poly(A) tails were removed. The trimmed reads were aligned to the chicken reference genome (GRCg6a) using the integrated STAR (v2.7.9a) algorithm in CellRanger (v5.0.0). Gene mapping was performed with featureCounts, followed by UMI correction and quantification to produce a complete gene expression matrix. The processed data were then compiled into a matrix file. The expression matrix was further analyzed using the Seurat (v4.3.0.1) package to ensure data quality. Cells were filtered based on gene count thresholds (min.cells > 3 and min.features > 200). Cells with fewer than 1,000 UMIs or a log10GenesPerUMI value exceeding 0.7 were excluded. Additionally, cells with mitochondrial gene content exceeding 25% were removed. These quality control measures ensured the reliability of downstream analyses.

    Dimensionality Reduction and Clustering of Single-Cell Data. To reduce technical noise and ensure high data quality, the gene expression matrix was normalized and scaled using the NormalizeData and ScaleData functions in the Seurat package. The FindVariableFeatures function was applied to calculate the mean expression and dispersion for each gene, identifying 2,000 highly variable genes. Principal component analysis (PCA) was then performed on the high-dimensional data, retaining the top 20 principal components. Simulated doublet data were generated to match the expected doublet rate, and these were integrated with the original dataset. Each cell was assigned a doublet score using a k-nearest neighbor (k-NN) classifier. Potential doublets were identified using the doubletFinder_v3 function with the parameter pN = 0.25 and removed based on the expected doublet threshold, resulting in a final dataset of 70,361 high-quality cells for downstream analyses. To correct for potential batch effects, the Harmony algorithm was applied. For clustering, the FindClusters function was used with a resolution of 0.4, followed by dimensionality reduction and visualization using uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding. The UMAP algorithm was run with a neighborhood size of 20 to achieve clear separation and visual representation of the cell populations.

    Differential Gene Screening. To characterize the functional properties of different cell clusters, we identified differentially expressed genes (DEGs) using the "FindAllMarkers" function in the Seurat package. The selection criteria required genes to be expressed in more than 25% of cells in the target cell subpopulation (min.pct = 0.25) and to exhibit significantly higher expression levels in the target cluster compared to others (test.use = "MAST"). To ensure the biological relevance of the results, more stringent thresholds were applied: p-value < 0.05 and |log2 fold change| > 1. Cell types were annotated by integrating literature-supported evidence and classical marker genes, allowing for accurate classification of cell populations and elucidation of their biological functions. The expression patterns of marker genes were visualized using the DoHeatmap, DotPlot, and VlnPlot functions in the Seurat package. These visualizations further clarified cell identities and highlighted their functional characteristics.
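
    The DEG selection criteria described above (expressed in more than 25% of target cells, p-value < 0.05, |log2 fold change| > 1) amount to a simple row filter over a marker table. The following Python sketch illustrates that filter only; it does not reproduce the MAST test itself, and the field names (gene, p_val, log2fc, pct_in_cluster), the select_degs helper, and the gene labels are hypothetical.

```python
def select_degs(rows, max_p=0.05, min_abs_log2fc=1.0, min_pct=0.25):
    """Return gene names from marker rows that pass all three thresholds."""
    return [
        row["gene"]
        for row in rows
        if row["p_val"] < max_p
        and abs(row["log2fc"]) > min_abs_log2fc
        and row["pct_in_cluster"] > min_pct
    ]


# Made-up marker table for one cluster.
marker_rows = [
    {"gene": "geneA", "p_val": 0.001, "log2fc": 2.3, "pct_in_cluster": 0.80},
    {"gene": "geneB", "p_val": 0.200, "log2fc": 1.8, "pct_in_cluster": 0.60},  # p-value too high
    {"gene": "geneC", "p_val": 0.010, "log2fc": 0.4, "pct_in_cluster": 0.90},  # fold change too small
    {"gene": "geneD", "p_val": 0.010, "log2fc": -1.5, "pct_in_cluster": 0.10},  # expressed in too few cells
    {"gene": "geneE", "p_val": 0.030, "log2fc": -1.2, "pct_in_cluster": 0.40},
]
degs = select_degs(marker_rows)
print(degs)
```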

  3. Data from: Human tau mutations in cerebral organoids induce a progressive...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +3 more
    zip
    Updated Jan 30, 2023
    Cite
    Stella M.K. Glasauer; Susan K. Goderie; Jennifer N. Rauch; Elmer Guzman; Morgane Audouard; Taylor Bertucci; Shona Joy; Emma Rommelfanger; Gabriel Luna; Erica Keane-Rivera; Steven Lotz; Susan Borden; Aaron M. Armando; Oswald Quehenberger; Sally Temple; Kenneth S. Kosik (2023). Human tau mutations in cerebral organoids induce a progressive dyshomeostasis of cholesterol [Dataset]. http://doi.org/10.25349/D95898
    Dataset provided by
    Neural Stem Cell Institute
    University of California, Santa Barbara
    University of California, San Diego
    Authors
    Stella M.K. Glasauer; Susan K. Goderie; Jennifer N. Rauch; Elmer Guzman; Morgane Audouard; Taylor Bertucci; Shona Joy; Emma Rommelfanger; Gabriel Luna; Erica Keane-Rivera; Steven Lotz; Susan Borden; Aaron M. Armando; Oswald Quehenberger; Sally Temple; Kenneth S. Kosik
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Single cell RNA sequencing (drop-seq) data of forebrain organoids carrying pathogenic MAPT R406W and V337M mutations. Organoids were generated from 5 heterozygous donor lines (two R406W lines and three V337M lines) and respective CRISPR-corrected isogenic controls. Organoids were also generated from one homozygous R406W donor line. Single-cell sequencing was performed at 1, 2, 3, 4, 6 and 8 months of organoid maturation.

    Methods

    Single-cell transcriptomes were obtained using drop-seq (Macosko et al., 2015, https://doi.org/10.1016/j.cell.2015.05.002). Counts matrices were generated using the Drop-seq tools package (Macosko et al. 2015), with full details available online (https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf). Briefly, raw reads were converted to BAM files, cell barcodes and UMIs were extracted, and low-quality reads were removed. Adapter sequences and polyA tails were trimmed, and reads were converted to FASTQ for STAR alignment (STAR version 2.6). Mapping to the human genome (hg19 build) was performed with default settings. Reads mapped to exons were kept and tagged with gene names, bead synthesis errors were corrected, and a digital gene expression matrix was extracted from the aligned library. We extracted data from twice as many cell barcodes as the number of cells targeted (NUM_CORE_BARCODES = 2x # targeted cells). Downstream analysis was performed using Seurat 3.0 in R version 3.6.3. An individual Seurat object was generated for each sample, and filtered and clustered individually. Cells with < 300 genes detected were filtered out, as were cells with > 10% mitochondrial gene content. Counts data were log-normalized using the default NormalizeData function and the default scale of 1e4. Then, the top 2000 variable genes were identified using the Seurat FindVariableFeatures function (selection.method = “vst”, nfeatures = 2000), followed by scaling and centering using the default ScaleData function. Principal Components Analysis was carried out on the scaled expression values of the 2000 top variable genes, and the cells were clustered using the first 50 principal components (PCs) as input in the FindNeighbors function, and a resolution of 0.4 in the FindClusters function. Non-linear dimensionality reduction was performed by running UMAP on the first 50 PCs. Following clustering and dimensionality reduction, putative cell doublets were identified using DoubletFinder (McGinnis et al. 2019; https://doi.org/10.1016/j.cels.2019.03.003), assuming a doublet formation rate of 5%. For each sample, the optimal pK value was identified based on the results of the paramSweep_v3, summarizeSweep and find.pK functions of the DoubletFinder package. Instead of using the default paramSweep_v3 function, we extended the upper range of computed pK values to 1.2. We visually verified that cells identified as doublets had high nFeatures (number of genes expressed) by plotting the pANN metric against nFeatures. For samples not showing this correlation, we adjusted the pK value to the next highest peak in the pK/BCmetric plot. Finally, the individual Seurat objects were merged.
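
    The log-normalization step mentioned above (default NormalizeData with a scale of 1e4) divides each gene's count by the cell's total count, multiplies by the scale factor, and applies a natural-log transform of one plus that value. The Python sketch below shows that arithmetic for a single cell; the log_normalize helper and the gene labels are hypothetical, and this is an illustration of the formula rather than the Seurat implementation.

```python
import math


def log_normalize(counts, scale_factor=1e4):
    """Log-normalize one cell: log1p(count / total_count * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        # A cell with no counts stays at zero everywhere.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Made-up raw counts for one cell (10 transcripts in total).
raw_counts = {"geneA": 1, "geneB": 9, "geneC": 0}
normalized = log_normalize(raw_counts)
for gene, value in normalized.items():
    print(gene, round(value, 4))
```

    Because every cell is rescaled to the same total before the log transform, values become comparable across cells with different sequencing depths.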

  4. Data from: Pre-ciliated tubal epithelial cells are prone to initiation of...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 17, 2024
    Cite
    Coulter Ralston; Alexander Nikitin; Benjamin Cosgrove (2024). Pre-ciliated tubal epithelial cells are prone to initiation of high-grade serous ovarian carcinoma [Dataset]. http://doi.org/10.5061/dryad.4mw6m90hm
    Dataset provided by
    Cornell University
    Authors
    Coulter Ralston; Alexander Nikitin; Benjamin Cosgrove
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The distal region of the uterine (Fallopian) tube is commonly associated with high-grade serous carcinoma (HGSC), the predominant and most aggressive form of ovarian or extra-uterine cancer. Specific cell states and lineage dynamics of the adult tubal epithelium (TE) remain insufficiently understood, hindering efforts to determine the cell of origin for HGSC. Here, we report a comprehensive census of cell types and states of the mouse uterine tube. We show that distal TE cells expressing the stem/progenitor cell marker Slc1a3 can differentiate into both secretory (Ovgp1+) and ciliated (Fam183b+) cells. Inactivation of Trp53 and Rb1, whose pathways are commonly altered in HGSC, leads to elimination of targeted Slc1a3+ cells by apoptosis, thereby preventing their malignant transformation. In contrast, pre-ciliated cells (Krt5+, Prom1+, Trp73+) remain cancer-prone and give rise to serous tubal intraepithelial carcinomas and overt HGSC. These findings identify transitional pre-ciliated cells as a previously unrecognized cancer-prone cell state and point to pre-ciliation mechanisms as novel diagnostic and therapeutic targets. Methods

    Single-cell RNA-sequencing library preparation. For TE single cell expression and transcriptome analysis we isolated TE from C57BL6 adult estrous female mice. In 3 independent experiments a total of 62 uterine tubes were collected. Each uterine tube was placed in sterile PBS containing 100 IU ml⁻¹ of penicillin and 100 µg ml⁻¹ streptomycin (Corning, 30-002-Cl), and separated into distal and proximal regions. Tissues from the same region were combined in a 40 µl drop of the same PBS solution, cut open lengthwise, and minced into 1.5-2.5 mm pieces with 25G needles. Minced tissues were transferred with the help of a sterile wide bore 200 µl pipette tip into a 1.8 ml cryo vial containing 1.2 ml A-mTE-D1 (300 IU ml⁻¹ collagenase IV mixed with 100 IU ml⁻¹ hyaluronidase; Stem Cell Technologies, 07912, in DMEM Ham’s F12, Hyclone, SH30023.FS). Tissues were incubated with a loose cap for 1 h at 37°C in a 5% CO2 incubator. During the incubation, tubes were taken out 4 times and tissues suspended with a wide bore 200 µl pipette tip. At the end of incubation, the tissue-cell suspension from each tube was transferred into 1 ml TrypLE (Invitrogen, 12604013) pre-warmed to 37°C and suspended 70 times with a 1000 µl pipette tip. Then 5 ml A-SM [DMEM Ham’s F12 containing 2% fetal bovine serum (FBS)] was added to the mix, and TE cells were pelleted by centrifugation at 300 x g for 10 minutes at 25°C. Pellets were then suspended with 1 ml A-mTE-D2 pre-warmed to 37°C (7 mg ml⁻¹ Dispase II, Worthington NPRO2, and 10 µg ml⁻¹ Deoxyribonuclease I, Stem Cell Technologies, 07900), and mixed 70 times with a 1000 µl pipette tip. Then 5 ml A-mTE-D2 was added and samples were passed through a 40 µm cell strainer, and pelleted by centrifugation at 300 x g for 7 minutes at +4°C. Pellets were suspended in 100 µl microbeads per 10⁷ total cells or fewer, and dead cells were removed with the Dead Cell Removal Kit (Miltenyi Biotec, 130-090-101) according to the manufacturer’s protocol. Pelleted live cell fractions were collected in 1.5 ml low binding centrifuge tubes, kept on ice, and suspended in ice cold 50 µl A-Ri-Buffer (5% FBS, 1% GlutaMAX-I, Invitrogen, 35050-079, 9 µM Y-27632, Millipore, 688000, 100 IU ml⁻¹ penicillin, and 100 μg ml⁻¹ streptomycin in DMEM Ham’s F12). Cell aliquots were stained with trypan blue for live and dead cell calculation. Live cell preparations with a target cell recovery of 5,000-6,000 were loaded on a Chromium Controller (10X Genomics, Single Cell 3’ v2 chemistry) to perform single cell partitioning and barcoding using the microfluidic platform device. After preparation of barcoded next-generation sequencing cDNA libraries, samples were sequenced on an Illumina NextSeq 500 system.

    Download and alignment of single-cell RNA sequencing data. For sequence alignment, a custom reference for mm39 was built using the cellranger (v6.1.2, 10x Genomics) mkref function. The mm39.fa soft-masked assembly sequence and the mm39.ncbiRefSeq.gtf (release 109) genome annotation last updated 2020-10-27 were used to form the custom reference. The raw sequencing reads were aligned to the custom reference and quantified using the cellranger count function.

    Preprocessing and batch correction. All preprocessing and data analysis was conducted in R (v.4.1.1 (2021-08-10)). The cellranger count outputs were first modified with the autoEstCont and adjustCounts functions from SoupX (v.1.6.1) to output a corrected matrix with the ambient RNA signal (soup) removed (https://github.com/constantAmateur/SoupX). To preprocess the corrected matrices, the Seurat (v.4.1.1) NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, and RunUMAP functions were used to create a Seurat object for each sample (https://github.com/satijalab/seurat). The number of principal components used to construct a shared nearest-neighbor graph was chosen to account for 95% of the total variance. To detect possible doublets, we used the package DoubletFinder (v.2.0.3) with inputs specific to each Seurat object. DoubletFinder creates artificial doublets and calculates the proportion of artificial k nearest neighbors (pANN) for each cell from a merged dataset of the artificial and actual data. To maximize DoubletFinder’s predictive power, the mean-variance normalized bimodality coefficient (BCMVN) was used to determine the optimal pK value for each dataset. To establish a threshold for pANN values to distinguish between singlets and doublets, the estimated multiplet rates for each sample were calculated by interpolating between the target cell recovery values according to the 10x Chromium user manual. Homotypic doublets were identified using unannotated Seurat clusters in each dataset with the modelHomotypic function. After doublets were identified, all distal and proximal samples were merged separately. Cells with greater than 30% mitochondrial genes, cells with fewer than 750 nCount RNA, and cells with fewer than 200 nFeature RNA were removed from the merged datasets. To correct for any batch effects between sample runs, we used the harmony (v.0.1.0) integration method (github.com/immunogenomics/harmony).
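
    Estimating a multiplet rate "by interpolating between the target cell recovery values" can be sketched as piecewise-linear interpolation over a lookup table, followed by converting the rate into an expected number of doublets. In the Python sketch below, the table values are placeholders chosen for illustration and should be replaced with the figures from the 10x Chromium user manual; the function names interpolate_rate and expected_doublets are hypothetical.

```python
# (target cell recovery, multiplet rate in percent).
# Placeholder values for illustration only; substitute the manual's table.
RATE_TABLE = [
    (1000, 0.8),
    (2000, 1.6),
    (3000, 2.3),
    (4000, 3.1),
    (5000, 3.9),
    (6000, 4.6),
]


def interpolate_rate(n_cells, table=RATE_TABLE):
    """Linearly interpolate the multiplet rate (percent) for `n_cells`.

    Values outside the table range are clamped to the nearest table entry.
    """
    points = sorted(table)
    if n_cells <= points[0][0]:
        return points[0][1]
    if n_cells >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= n_cells <= x1:
            return y0 + (y1 - y0) * (n_cells - x0) / (x1 - x0)
    raise AssertionError("unreachable: n_cells lies within the table range")


def expected_doublets(n_cells, rate_percent):
    """Expected number of doublets, rounded to the nearest whole cell."""
    return round(n_cells * rate_percent / 100)


rate = interpolate_rate(5500)
n_expected = expected_doublets(5500, rate)
print(rate, n_expected)
```

    Clamping at the table edges is a design choice of this sketch; extrapolating beyond the published range would be an additional assumption.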

    Clustering parameters and annotations. After merging the datasets and batch correction, the dimensions reflecting 95% of the total variance were input into Seurat’s FindNeighbors function with a k.param of 70. Louvain clustering was then conducted using Seurat’s FindClusters with a resolution of 0.7. The resulting 19 clusters were annotated based on the expression of canonical genes and the results of differential gene expression (Wilcoxon Rank Sum test) analysis. One cluster expressing lymphatic and epithelial markers was omitted from later analysis as it only contained 2 cells suspected to be doublets. To better understand the epithelial populations, we reclustered 6 epithelial populations and reapplied harmony batch correction. For this subset, FindNeighbors was run with a k.param of 50 and FindClusters with a resolution of 0.7. The resulting 9 clusters within the epithelial subset were further annotated using differential expression analysis and canonical markers.

    Pseudotime analysis. Potential of heat diffusion for affinity-based transition embedding (PHATE) is a dimensional reduction method that more accurately visualizes continual progressions found in biological data [35]. A modified version of Seurat (v4.1.1) was developed to include the ‘RunPHATE’ function for converting a Seurat Object to a PHATE embedding. This was built on the phateR package (v.1.0.7) (https://github.com/scottgigante/seurat/tree/patch/add-PHATE-again). In addition to PHATE, pseudotime values were calculated with Monocle3 (v.1.2.7), which computes trajectories with an origin set by the user [36, 55–57]. The origin was set to be a progenitor cell state confirmed with lineage tracing experiments.

    35. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019). doi:10.1038/s41587-019-0336-3
    36. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019). doi:10.1038/s41586-019-0969-x
    55. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32, 381–386 (2014). doi:10.1038/nbt.2859
    56. Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nat Methods 14, 309–315 (2017). doi:10.1038/nmeth.4150
    57. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14, 979–982 (2017). doi:10.1038/nmeth.4402

  5. transcript counts_GBM T cells_scRNAseq.xlsx

    • figshare.com
    xlsx
    Updated Mar 15, 2022
    Cite
    Tessa Gargett (2022). transcript counts_GBM T cells_scRNAseq.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.19119698.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tessa Gargett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw transcript counts of GBM T cells from 3 patient GBM tissue samples (BT20, BT23 and BT26). Using R version 3.6.3 and Seurat version 3.1.5, cells with (i) fewer than 200 genes, (ii) gene numbers outside ±2 standard deviations from the mean, or (iii) a mitochondrial gene fraction greater than 10% were excluded. Remaining cells were log normalised by total expression and scaled to 10,000 transcripts/cell with the NormalizeData function in Seurat. Using the FindIntegrationAnchors (dims = 1:30, k.filter = 200) and IntegrateData (dims = 1:30) functions, cells from different libraries were combined by assessing the pairwise correspondence between a set of representative genes (anchors). The FindVariableFeatures function was used to identify variable genes, returning 2000 features using vst as the selection method. The data were then scaled and principal component analysis applied using the ScaleData and RunPCA functions respectively. Cells were clustered based on gene expression profiles using the FindNeighbors and FindClusters functions with resolution set to 0.5.
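
    The three exclusion rules above (fewer than 200 genes, gene numbers outside ±2 standard deviations from the mean, mitochondrial fraction above 10%) can be sketched as follows. This is a minimal Python illustration, assuming the mean and sample standard deviation are computed over all cells before any filtering, which the description does not specify; the filter_cells helper and the cell records are hypothetical.

```python
from statistics import mean, stdev


def filter_cells(cells, min_genes=200, n_sd=2.0, max_pct_mito=10.0):
    """Return ids of cells that pass all three exclusion rules."""
    gene_counts = [cell["n_genes"] for cell in cells]
    center = mean(gene_counts)
    spread = stdev(gene_counts)  # sample standard deviation (assumption)
    lower = center - n_sd * spread
    upper = center + n_sd * spread
    return [
        cell["id"]
        for cell in cells
        if cell["n_genes"] >= min_genes
        and lower <= cell["n_genes"] <= upper
        and cell["pct_mito"] <= max_pct_mito
    ]


# Made-up cells: c4 has too much mitochondrial signal, c9 has too few
# genes, and c10 lies more than 2 SD above the mean gene count.
example_cells = [
    {"id": "c1", "n_genes": 1000, "pct_mito": 2.0},
    {"id": "c2", "n_genes": 1100, "pct_mito": 3.0},
    {"id": "c3", "n_genes": 900, "pct_mito": 4.0},
    {"id": "c4", "n_genes": 1050, "pct_mito": 15.0},
    {"id": "c5", "n_genes": 950, "pct_mito": 1.0},
    {"id": "c6", "n_genes": 1000, "pct_mito": 2.0},
    {"id": "c7", "n_genes": 1020, "pct_mito": 5.0},
    {"id": "c8", "n_genes": 980, "pct_mito": 3.0},
    {"id": "c9", "n_genes": 150, "pct_mito": 2.0},
    {"id": "c10", "n_genes": 5000, "pct_mito": 2.0},
]
kept_ids = filter_cells(example_cells)
print(kept_ids)
```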

  6. Data from: Transcriptomic analysis of skeletal muscle regeneration across...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    zip
    Updated Nov 27, 2024
    Cite
    Lauren Walter; Benjamin Cosgrove (2024). Transcriptomic analysis of skeletal muscle regeneration across mouse lifespan identifies altered stem cell states [Dataset]. http://doi.org/10.5061/dryad.kkwh70sbv
    Dataset provided by
    Cornell University
    Authors
    Lauren Walter; Benjamin Cosgrove
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Skeletal muscle regeneration relies on the orchestrated interaction of myogenic and non-myogenic cells with spatial and temporal coordination. The regenerative capacity of skeletal muscle declines with aging due to alterations in myogenic stem/progenitor cell states and functions, non-myogenic cell contributions, and systemic changes, all of which accrue with age. A holistic network-level view of the cell-intrinsic and -extrinsic changes influencing muscle stem/progenitor cell contributions to muscle regeneration across the lifespan remains poorly resolved. To provide a comprehensive atlas of regenerative muscle cell states across mouse lifespan, we collected a compendium of 273,923 single-cell transcriptomes from hindlimb muscles of young, old, and geriatric (4-7, 20, and 26 months old, respectively) mice at six closely sampled time-points following myotoxin injury. We identified 29 muscle-resident cell types, eight of which exhibited accelerated or delayed dynamics in their abundances between age groups, including T and NK cells and multiple macrophage subtypes, suggesting that the age-related decline in muscle repair may arise from temporal miscoordination of the inflammatory response. We performed a pseudotime analysis of myogenic cells across the regeneration timespan and found age-specific myogenic stem/progenitor cell trajectories in old and geriatric muscles. Given the critical role that cellular senescence plays in limiting cell contributions in aged tissues, we built a series of tools to bioinformatically identify senescence in these single-cell data and assess their ability to identify senescence within key myogenic stages. 
By comparing single-cell senescence scores to co-expression of hallmark senescence genes Cdkn2a and Cdkn1a, we found that an experimentally derived gene list from a muscle foreign body response (FBR) fibrosis model accurately (receiver operating characteristic curve AUC = 0.82-0.86) identified senescent-like myogenic cells across mouse ages, injury time points, and cell-cycle states, in a manner comparable to curated gene lists. Further, this scoring approach in both single-cell and spatial transcriptomic datasets pinpointed transitory senescent-like subsets within the myogenic stem/progenitor cell trajectory that are associated with stalled MuSC self-renewal states across all ages of mice. This new resource on mouse skeletal muscle aging provides a comprehensive portrait of the changing cellular states and interactions underlying skeletal muscle regeneration across the mouse lifespan.

Methods Mouse muscle injury and single-cell isolation. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols (approval # 2014-0085), and experiments were performed in compliance with its institutional guidelines. Mice were maintained at 70-73°F on a 14/10-h light/dark cycle with humidity mainly at 40%. Muscle injury was induced in young (4-7 months old [mo]), old (20 mo), and geriatric (26 mo) C57BL/6J mice (Jackson Laboratory # 000664; NIA Aged Rodent Colonies) by injecting both tibialis anterior (TA) muscles with 10 µl of notexin (10 µg/ml; Latoxan, France). The mice were sacrificed, and TA muscles were collected at 0, 1, 2, 3.5, 5, and 7 days post-injury (dpi) with n = 3-4 biological replicates per sample. Each TA was processed independently to generate single-cell suspensions. At each time point, the young and old samples are biological replicates of TA muscles from distinct mice, and the geriatric samples are biological replicates of two TA muscles from each of two mice. A mixture of male and female mice was used.
See Supplemental Table 1 for additional details. Muscles were digested with 8 mg/ml Collagenase D (Roche, Basel, Switzerland) and 10 U/ml Dispase II (Roche, Basel, Switzerland) and then manually dissociated to generate cell suspensions. Myofiber debris was removed by filtering the cell suspensions through a 100 µm and then a 40 µm filter (Corning Cellgro # 431752 and # 431750). After filtration, erythrocytes were removed by incubating the cell suspension in an erythrocyte lysis buffer (IBI Scientific # 89135-030).

Single-cell RNA-sequencing library preparation. After digestion, the single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. A hemocytometer was used to manually count the cells to determine the concentration of the suspension. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, Pleasanton, CA) following the manufacturer’s protocol (10x Genomics: Resolving Biology to Advance Human Health, 2020). Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes with <5% doublet rate. Libraries were sequenced on the NextSeq 500 (Illumina, San Diego, CA) (Illumina | Sequencing and array-based solutions for genetic research, 2020). The sequencing data were aligned to the mouse reference genome (mm10) using CellRanger v5.0.0 (10x Genomics) (10x Genomics: Resolving Biology to Advance Human Health, 2020).

Preprocessing single-cell RNA-sequencing data. Downstream analysis of the gene expression matrix was carried out in R (v3.6.1). First, the ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX) (Young and Behjati, 2020).
Samples were then preprocessed using the standard Seurat (v3.2.3) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat) (Stuart et al., 2019). Cells with fewer than 200 genes, fewer than 750 UMIs, or more than 25% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0.3) was used to identify putative doublets in each dataset (McGinnis, Murrow, and Gartner, 2019). The estimated doublet rate was 5% according to the 10x Chromium handbook. The putative doublets were removed from each dataset. Next, the datasets were merged and then batch-corrected with Harmony (github.com/immunogenomics/harmony) (v1.0) (Korsunsky et al., 2019). Seurat was then used to process the integrated data. Dimensions accounting for 95% of the total variance were used to generate SNN graphs (FindNeighbors), and SNN clustering was performed (FindClusters). A clustering resolution of 0.8 was used, resulting in 24 initial clusters.

Cell type annotation in single-cell RNA-sequencing data. Cell types were determined by expression of canonical genes. Each of the 24 initial clusters received a unique cell type annotation. The nine myeloid clusters were challenging to distinguish from one another, so these clusters were subset out (Subset) and re-clustered using a resolution of 0.5 (FindNeighbors, FindClusters), resulting in 15 initial clusters. More specific myeloid cell type annotations were assigned based on the expression of canonical myeloid genes. This did not help to clarify the monocyte and macrophage annotations, but it did help to identify more specific dendritic cell and T cell subtypes. These more specific annotations were transferred from the myeloid subset back to the complete integrated object based on the cell barcode.

Analysis of cell type dynamics.
We generated a table with the number of cells from each sample (n = 65) in each cell type annotation (n = 29). We removed the erythrocytes from this analysis because they are not a native cell type in skeletal muscle. Next, for each sample, we calculated the percent of cells in each cell type annotation. The mean and standard deviation were calculated from each age and time point for every cell type. The solid line is the mean percentage of the given cell type, the ribbon is the standard deviation around the mean, and the points are the values from individual replicates. We evaluated whether there was a significant difference in the cell type dynamics over all six time points using non-linear modeling. The dynamics for each cell type were fit to a non-linear equation (e.g., quadratic, cubic, quartic), both independent of and dependent on age. The type of equation used for each cell type was selected based on the confidence interval and significance (p < 0.05) of the leading coefficient. If the leading coefficient was significantly different from zero, it was concluded that the leading coefficient was needed. If the leading coefficient was not significantly different from zero, it was concluded that the leading coefficient was not needed, and the degree of the equation was reduced by one. No modeling equation went below the second degree. The null hypothesis predicted that the coefficients of the non-linear equation were the same across the age groups, while the alternative hypothesis predicted that the coefficients of the non-linear equation were different across the age groups. We conducted a one-way ANOVA to test whether the alternative hypothesis fit the data significantly better than the null hypothesis, and we used FDR as the multiple comparison correction (using the ANOVA and p.adjust(method = "fdr") functions in R, respectively).

T cell exhaustion scoring.
We grouped the three T cell populations (this includes Cd3e+ cycling and non-cycling T cells and Cd4+ T cells) and z-scored all genes. The T cell exhaustion score was calculated using a transfer-learning method developed by Cherry et al 2023 and a T cell exhaustion gene list from Bengsch et al 2018 (Bengsch et al., 2018; Cherry et al., 2023). The Mann-Whitney U-test was performed on the T cell exhaustion score between ages. Senescence scoring. We tested two senescence-scoring methods along with fourteen senescence gene lists (Supplemental Table 2) to identify senescent-like cells within the scRNA-seq dataset. The Two-way Senescence Score (Sen Score) was calculated using a transfer-learning method developed by Cherry et al 2023 (Cherry et al., 2023). With this


Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34

Data from: Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration

Explore at:
Available download formats: zip
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License

https://spdx.org/licenses/CC0-1.0.html

Description

Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
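The quality filter described above (at least 750 features, at least 1000 transcripts, and at most 30% mitochondrial transcripts per cell) can be illustrated with a minimal sketch. This is written in Python with NumPy for illustration only; the authors performed this step in R with Seurat, and the function name `qc_keep_mask`, its arguments, and the dense cells-by-genes matrix layout are assumptions introduced here, not part of the published workflow.

```python
import numpy as np

def qc_keep_mask(counts, is_mito, min_features=750,
                 min_transcripts=1000, max_mito_frac=0.30):
    """Return a boolean mask of cells that pass the quality filter.

    counts  : cells x genes array of raw UMI counts (hypothetical layout)
    is_mito : boolean array, True for mitochondrial genes
    Default thresholds follow the values stated in the text.
    """
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)

    n_features = (counts > 0).sum(axis=1)       # genes detected per cell
    n_transcripts = counts.sum(axis=1)          # total UMIs per cell
    mito = counts[:, is_mito].sum(axis=1)       # mitochondrial UMIs per cell

    # Fraction of transcripts from mitochondrial genes; cells with zero
    # transcripts get a fraction of 0 and fail the other thresholds anyway.
    mito_frac = np.divide(mito, n_transcripts,
                          out=np.zeros(len(n_transcripts), dtype=float),
                          where=n_transcripts > 0)

    # A cell is removed if it fails ANY criterion, so it is kept only
    # if it passes ALL of them.
    return ((n_features >= min_features)
            & (n_transcripts >= min_transcripts)
            & (mito_frac <= max_mito_frac))
```

Applying the mask as `counts[qc_keep_mask(counts, is_mito)]` would retain only the passing cells. The same function covers other threshold sets by changing the keyword arguments.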

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
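Selecting the "dimensions accounting for 95% of the total variance" can be sketched as a cumulative-variance cutoff over the per-component standard deviations. The Python sketch below is illustrative only: the helper name `n_dims_for_variance` is hypothetical, and it assumes that "total variance" means the variance summed over the components that were actually computed, which the text does not state explicitly.

```python
import numpy as np

def n_dims_for_variance(stdev, target=0.95):
    """Smallest number of leading components whose cumulative share of
    variance reaches `target`.

    stdev : per-component standard deviations, in decreasing order
            (assumed to come from a PCA-like reduction)
    """
    var = np.asarray(stdev, dtype=float) ** 2
    cum_frac = np.cumsum(var) / var.sum()
    # searchsorted returns the first index where cum_frac >= target;
    # add 1 to convert the index into a count of components.
    n = int(np.searchsorted(cum_frac, target)) + 1
    # Guard against floating-point round-off when target is close to 1.
    return min(n, len(var))
```

For example, `n_dims_for_variance([3, 2, 1, 1])` considers variances 9, 4, 1, and 1 (cumulative shares of roughly 0.60, 0.87, 0.93, and 1.00) and therefore keeps all four components at the 95% level.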

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
