50 datasets found
  1. Data, R code and output Seurat Objects for single cell RNA-seq analysis of...

    • figshare.com
    application/gzip
    Updated May 31, 2023
    Yunshun Chen; Gordon Smyth (2023). Data, R code and output Seurat Objects for single cell RNA-seq analysis of human breast tissues [Dataset]. http://doi.org/10.6084/m9.figshare.17058077.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yunshun Chen; Gordon Smyth
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all the Seurat objects that were used to generate the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. Details of each object are listed in a table in Chen et al. 2021.

  2. Data from: Large-scale integration of single-cell transcriptomic data...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Dec 14, 2021
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
    Dataset provided by
    Cornell University
    Authors
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

    Methods

    Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from the Jackson Laboratory (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

    Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

    Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10⁶ cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted and loaded onto the Chromium Single Cell A Chip to target a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75-cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

    Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5-month-old) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and fresh-frozen in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). The optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E-stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji (ImageJ). cDNA libraries were sequenced on an Illumina NextSeq 500 using 150-cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually, and spots covering the tissue were selected using Loupe Browser v4.0.0 (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline (10x Genomics) to generate a feature-by-spot-barcode expression matrix.

    Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

    Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1,000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Expected doublet rates were estimated by fitting a linear regression to the expected doublet rates published in the 10x Chromium handbook and evaluating it at the total number of cells remaining after quality filtering. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed a noisy cluster and re-integrated the data using each of the three batch-correction tools.
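    A minimal Python sketch of the per-cell quality-control thresholds described above (at least 750 detected features, at least 1,000 transcripts, and at most 30% of counts from mitochondrial genes). It is not the authors' Seurat code: the dense cells × genes count matrix, the `mt-` prefix used to flag mouse mitochondrial genes, and the function name `qc_mask` are assumptions for illustration, and "transcripts" is read here as total UMI counts per cell.

```python
import numpy as np


def qc_mask(counts, gene_names, min_features=750, min_counts=1000,
            max_pct_mt=30.0, mt_prefix="mt-"):
    """Return a boolean mask of cells (rows) that pass all three QC thresholds.

    counts: cells x genes array of UMI counts (assumed dense layout).
    gene_names: gene symbols, one per column of `counts`.
    """
    counts = np.asarray(counts, dtype=float)
    # Number of genes detected (count > 0) in each cell.
    n_features = (counts > 0).sum(axis=1)
    # Total UMI count per cell.
    n_counts = counts.sum(axis=1)
    # Flag mitochondrial genes by name prefix (case-insensitive: mt- or MT-).
    prefix = mt_prefix.lower()
    is_mt = np.array([g.lower().startswith(prefix) for g in gene_names], dtype=bool)
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Percentage of each cell's counts from mitochondrial genes; 0 for empty cells.
    pct_mt = np.divide(100.0 * mt_counts, n_counts,
                       out=np.zeros_like(n_counts), where=n_counts > 0)
    # A cell is kept only if it passes every criterion.
    return ((n_features >= min_features)
            & (n_counts >= min_counts)
            & (pct_mt <= max_pct_mt))
```

    For example, `counts[qc_mask(counts, genes)]` keeps only the passing cells; in a Seurat workflow the same thresholds would normally be applied to the object's per-cell metadata instead.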

    Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
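    The "dimensions accounting for 95% of the total variance" rule can be sketched as choosing the smallest number of leading dimensions whose cumulative share of variance reaches 95%. This assumes per-dimension variances ordered as in the embedding (for a PCA-based embedding, the squared standard deviation of each PC) and treats the total as the sum over the computed dimensions; the function name `dims_for_variance` is hypothetical.

```python
import numpy as np


def dims_for_variance(dim_variances, threshold=0.95):
    """Smallest number of leading dimensions whose cumulative variance
    reaches `threshold` (a fraction of the summed variance)."""
    v = np.asarray(dim_variances, dtype=float)
    # Cumulative fraction of variance explained by the first 1, 2, ... dimensions.
    frac = np.cumsum(v) / v.sum()
    # Index of the first dimension at which the cumulative fraction >= threshold.
    k = int(np.searchsorted(frac, threshold, side="left")) + 1
    # Guard against floating-point round-off leaving frac[-1] just below 1.0.
    return min(k, v.size)


# Example: per-PC standard deviations, squared to obtain variances.
stdevs = np.array([7.0, 5.0, 3.0, 2.0, 1.0])
print(dims_for_variance(stdevs ** 2))  # prints 4
```

    With these toy values the first three dimensions explain 83/88 ≈ 94.3% of the variance, so a fourth is needed to pass 95%.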

    Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

    Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored, and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified by differential expression analysis in Seurat (FindAllMarkers, Wilcoxon rank-sum test; results in Sup. Data 2). The resulting genes were filtered by average log2 fold change (avg_logFC > 1) and by the fraction of cells within the cluster expressing each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each cell type, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using …
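    The marker filtering and reference-profile steps described above can be sketched in Python as follows. This is not the authors' code: the record fields `gene`, `avg_logFC` and `pct_in_cluster` are hypothetical stand-ins for the columns of a FindAllMarkers table, and flagging mitochondrial and ribosomal protein genes by the `mt-`, `Rps` and `Rpl` prefixes is a simplification (the prefix test would also catch non-ribosomal genes such as Rps6ka1).

```python
import numpy as np

# Prefixes used to flag mitochondrial (mt-) and ribosomal protein (Rps*/Rpl*) genes;
# a simplification, since e.g. the kinase gene Rps6ka1 also matches "rps".
EXCLUDED_PREFIXES = ("mt-", "rps", "rpl")


def select_reference_genes(markers, min_logfc=1.0, min_pct=0.5):
    """Return unique marker genes passing both thresholds, in first-seen order.

    markers: iterable of dicts with hypothetical keys 'gene', 'avg_logFC' and
    'pct_in_cluster' (one record per gene-cluster pair).
    """
    selected, seen = [], set()
    for rec in markers:
        gene = rec["gene"]
        # Keep only avg_logFC > min_logfc and pct_in_cluster > min_pct.
        if rec["avg_logFC"] <= min_logfc or rec["pct_in_cluster"] <= min_pct:
            continue
        # Drop mitochondrial/ribosomal genes and genes already selected.
        if gene.lower().startswith(EXCLUDED_PREFIXES) or gene in seen:
            continue
        seen.add(gene)
        selected.append(gene)
    return selected


def mean_profiles(counts, gene_names, cell_labels, genes):
    """Mean raw counts per cell type over `genes`.

    counts: cells x genes raw count matrix; returns {label: 1-D array}.
    """
    counts = np.asarray(counts, dtype=float)
    col = {g: i for i, g in enumerate(gene_names)}
    idx = [col[g] for g in genes]
    labels = np.asarray(cell_labels)
    return {lab: counts[labels == lab][:, idx].mean(axis=0)
            for lab in sorted(set(cell_labels))}
```

    The resulting per-cell-type vectors correspond to the reference expression profile that the text describes passing to BayesPrism alongside the raw spot counts.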

  3. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Dataset provided by
    figshare
    Authors
    Sneddon Lab UCSF
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Scripts used for analysis of the V1 and V2 datasets:
    • seurat_v1.R: initialize a Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA, clustering, and tSNE visualization. Used for v1 datasets.
    • merge_seurat.R: merge two or more Seurat objects into one Seurat object and perform linear regression to remove batch effects between the separate objects. Used for v1 datasets.
    • subcluster_seurat_v1.R: subcluster clusters of interest from a Seurat object; determine variable genes, perform regression and PCA. Used for v1 datasets.
    • seurat_v2.R: initialize a Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA. Used for v2 datasets.
    • clustering_markers_v2.R: clustering and tSNE visualization for v2 datasets.
    • subcluster_seurat_v2.R: subcluster clusters of interest from a Seurat object; determine variable genes, perform regression and PCA. Used for v2 datasets.
    • seurat_object_analysis_v1_and_v2.R: downstream analysis and plotting functions for Seurat objects created by seurat_v1.R or seurat_v2.R.
    • merge_clusters.R: merge clusters that do not meet the gene threshold. Used for both v1 and v2 datasets.
    • prepare_for_monocle_v1.R: subcluster cells of interest and perform linear regression, but not scaling, so that the normalized, regressed values can be input into Monocle with monocle_seurat_input_v1.R.
    • monocle_seurat_input_v1.R: Monocle script using Seurat batch-corrected values as input for the v1 merged time-course datasets.
    • monocle_lineage_trace.R: Monocle script using nUMI as input for the v2 lineage-traced dataset.
    • monocle_object_analysis.R: downstream analysis for Monocle objects (BEAM and plotting).
    • CCA_merging_v2.R: merge v2 endocrine datasets with canonical correlation analysis and determine the number of CCs to include in downstream analysis.
    • CCA_alignment_v2.R: downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.

  4. Transcription start site analysis for heterogenous CD4+ T cells using 5′...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Apr 22, 2024
    Akiko Oguchi; Yasuhiro Murakawa (2024). Transcription start site analysis for heterogenous CD4+ T cells using 5′ scRNA-seq [Dataset]. http://doi.org/10.5061/dryad.gtht76hv9
    Dataset provided by
    RIKEN Center for Integrative Medical Sciences
    Authors
    Akiko Oguchi; Yasuhiro Murakawa
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    These datasets were generated by ReapTEC (read-level pre-filtering and transcribed enhancer call) from 5′ single-cell RNA-seq data on heterogeneous human CD4+ T cells. By taking advantage of a unique “cap signature” derived from the 5′ end of a transcript, ReapTEC simultaneously profiles gene expression and enhancer activity at nucleotide resolution using 5′-end single-cell RNA sequencing (5′ scRNA-seq). Details of the ReapTEC pipeline are described at https://github.com/MurakawaLab/ReapTEC.

  5. ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1

    • figshare.com
    application/gzip
    Updated Jun 29, 2023
    Massimo Andreatta; Santiago Carmona (2023). ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1 [Dataset]. http://doi.org/10.6084/m9.figshare.12478571.v2
    Dataset provided by
    figshare
    Authors
    Massimo Andreatta; Santiago Carmona
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space as the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.

    To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from ArrayExpress. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.

    Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non-T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3. For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of ties, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat 3, providing the anchor set determined by STACAS and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.

    Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction="umap", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. To have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat 3, using respectively the prcomp function from the base R package "stats" and the "umap" R package (https://github.com/tkonopka/umap).
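    The gene prioritization rule used for integration can be sketched as a ranking over the pooled per-dataset variable genes. Because 800 genes exceed the 600 selected per dataset, this sketch reads the rule as ranking the union by the number of datasets in which each gene is variable, breaking ties by the size of the largest such dataset. That tie-break interpretation, the final alphabetical tie-break, and the name `prioritize_variable_genes` are assumptions, not the STACAS implementation.

```python
from collections import defaultdict


def prioritize_variable_genes(hvg_lists, dataset_sizes, n_genes=800):
    """Rank pooled variable genes and return the top `n_genes`.

    hvg_lists: {dataset name: list of that dataset's variable genes}.
    dataset_sizes: {dataset name: number of cells}.
    Ranking: more datasets first; ties broken by the size of the largest
    dataset in which the gene is variable, then alphabetically (assumption).
    """
    n_datasets = defaultdict(int)
    largest = defaultdict(int)
    for name, genes in hvg_lists.items():
        for gene in set(genes):
            n_datasets[gene] += 1
            largest[gene] = max(largest[gene], dataset_sizes[name])
    ranked = sorted(n_datasets, key=lambda g: (-n_datasets[g], -largest[g], g))
    return ranked[:n_genes]


hvg = {"A": ["Cd8a", "Gzmb", "Tox"], "B": ["Cd8a", "Tox", "Il7r"], "C": ["Cd8a", "Ccr7"]}
sizes = {"A": 5000, "B": 1000, "C": 3000}
print(prioritize_variable_genes(hvg, sizes, n_genes=4))  # prints ['Cd8a', 'Tox', 'Gzmb', 'Ccr7']
```

    In the toy example, Gzmb, Ccr7 and Il7r are each variable in one dataset, so they are ordered by the sizes of those datasets (5,000, 3,000 and 1,000 cells), and Il7r falls outside the top four.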

  6. Processed Seurat objects for GeneTrajectory inference (Gene Trajectory...

    • figshare.com
    application/gzip
    Updated Feb 19, 2024
    Rihao Qu; Peggy Myung (2024). Processed Seurat objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25243225.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rihao Qu; Peggy Myung
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):

    Human myeloid dataset analysis. Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly variable gene selection and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four subclusters of myeloid cells were identified by Louvain clustering with a resolution of 0.3. The Wilcoxon rank-sum test was used to find cluster-specific gene markers for cell type annotation. For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel, with each bandwidth determined by the distance to the k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates on the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.

    Mouse embryo skin data analysis. We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wildtype and the Wls mutant were pooled for analyses. After standard normalization, highly variable gene selection and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2. For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the cells' coordinates on the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wildtype and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
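    Two of the steps above, the detection-rate gene filter and the local-adaptive Gaussian kernel, can be sketched in Python. The kernel below uses the common self-tuning form W_ij = exp(-d_ij² / (σ_i σ_j)), where σ_i is the distance from point i to its k-th nearest neighbour; whether GeneTrajectory normalizes its kernel exactly this way is an assumption, so its repository remains the reference implementation. Function names are hypothetical, and duplicate points (σ_i = 0) are not handled.

```python
import numpy as np


def detection_filter(counts, min_frac=0.005, max_frac=0.75):
    """Boolean mask over genes (columns) detected in min_frac..max_frac of cells."""
    frac = (np.asarray(counts) > 0).mean(axis=0)
    return (frac >= min_frac) & (frac <= max_frac)


def local_adaptive_gaussian(dist, k=5):
    """Affinity W_ij = exp(-d_ij**2 / (sigma_i * sigma_j)).

    dist: symmetric n x n distance matrix with a zero diagonal (n > k).
    sigma_i is the distance from point i to its k-th nearest neighbour.
    """
    d = np.asarray(dist, dtype=float)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    sigma = np.sort(d, axis=1)[:, k]
    return np.exp(-d ** 2 / np.outer(sigma, sigma))


# Toy example: six points on a line.
x = np.arange(6.0)
D = np.abs(x[:, None] - x[None, :])
W = local_adaptive_gaussian(D, k=2)
```

    The defaults mirror the myeloid settings (0.5% to 75% detection, k = 5 for the gene graph); the mouse skin analysis would use `min_frac=0.01, max_frac=0.5`. In the full method the distances would be gene-gene Wasserstein distances rather than the toy line distances used here.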

  7. Data from: Single cell multiomic analysis identifies key genes...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated Jul 2, 2024
    + more versions
    Abhinav Kaushik; Kari Nadeau (2024). Single cell multiomic analysis identifies key genes differentially expressed in innate lymphoid cells from COVID-19 patients [Dataset]. http://doi.org/10.5061/dryad.8931zcrz4
    Dataset provided by
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    Authors
    Abhinav Kaushik; Kari Nadeau
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Innate lymphoid cells (ILCs) are enriched at mucosal surfaces, where they respond rapidly to environmental stimuli and contribute to both tissue inflammation and healing. To gain insight into the role of ILCs in the pathology of and recovery from COVID-19 infection, we employed a multi-omic approach consisting of AbSeq and targeted mRNA sequencing to probe, respectively, the surface marker expression and the transcriptional profile and heterogeneity of ILCs in peripheral blood of patients with COVID-19 compared with healthy controls. We found that the frequency of ILC1 and ILC2 cells was significantly increased in COVID-19 patients. Moreover, all ILC subsets displayed a significantly higher frequency of CD69-expressing cells, indicating a heightened state of activation. ILC2s from COVID-19 patients had the highest number of significantly differentially expressed (DE) genes. The most notable genes DE in COVID-19 vs healthy participants included a) genes associated with responses to virus infections and b) genes that support ILC self-proliferation, activation and homeostasis. In addition, differential gene regulatory network analysis revealed ILC-specific regulons and their interactions driving the differential gene expression in each ILC subset. Overall, this study provides mechanistic insights into the characteristics of ILC subsets activated during COVID-19 infection.

    Methods

    Study participants, blood draws and processing. Participants were recruited as described previously from adults who had a positive SARS-CoV-2 RT-PCR test at Stanford Health Care (NCT04373148). COVID-19 samples were collected between May and December 2020. The cohort used in this study consisted of asymptomatic (n=2), mild (n=17), and moderate (n=3) COVID-19 infections, some of whom developed long-term COVID-19 (n=15). The clinical case severities at the time of diagnosis were defined as asymptomatic, mild or moderate according to the guidelines released by NIH.
    Long-term (LT) COVID was defined as symptoms occurring 30 or more days after infection, consistent with CDC guidelines. Some participants in our study continued to have LT COVID symptoms 90 days after diagnosis (n=12). The exclusion criterion for the COVID sample study was an NIH severity diagnosis of severe or critical at the time of the positive COVID test. Samples selected for this study were obtained within 76 days of the positive COVID-19 PCR test date. Healthy controls were selected from samples collected before 2020. Informed consent was obtained from all participants. All protocols were approved by the Stanford Administrative Panel on Human Subjects in Medical Research. Peripheral blood was drawn by venipuncture and, using validated and published procedures, peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-based density gradient centrifugation, frozen in aliquots and stored in liquid nitrogen at -80°C until thawing. A summary of participant demographics is presented in Supp. Table 1.
    ILC enrichment and single-cell captures for AbSeq and targeted mRNA-seq. Participant PBMCs were thawed, and each sample was stained with Sample Tag (BD #633781) at room temperature for 20 minutes. Samples were combined into healthy control or COVID-19 tubes. Cells were surface stained with a panel of fluorochrome-conjugated antibodies (Supp. Table 2) in buffer (PBS with 0.25% BSA and 1 mM EDTA) for 20 minutes at room temperature prior to immunomagnetic negative selection for ILCs. Following ILC enrichment using the EasySep human Pan-ILC enrichment kit (StemCell Technologies #17975), cells from healthy and COVID-19 recovered participants were counted and normalized before combining. ILCs were sorted using a BD FACSAria at the Stanford FACS facility prior to incubation with AbSeq oligo-linked mAbs (Supp. Table 3). Sorted cells were processed by the Stanford Human Immune Monitoring Center (HIMC) using the BD Rhapsody platform. Libraries were prepared using the BD Immune Response Targeting Panel (BD Kit #633750) with the addition of custom gene panel reagents (Supp. Table 4) and sequenced on an Illumina NovaSeq 6000 at the Stanford Genomics Sequencing Center (SGSC). ILCs were identified as Lineage− (CD3−, CD14−, CD34−, CD19−), NKG2A−, CD45+ cells, further defined as CD127+CD161+, and divided into the subsets ILC1 (CD117−CRTH2−), ILC2 (CRTH2+) and ILCp (CD117+CRTH2−) (Supp. Fig. 1).

    Computational data analysis. The above multi-modal setup allowed paired measurements of cellular transcriptome and cell surface protein abundance. The ILC1, ILC2 and ILCp cells were manually gated based on the abundance profiles of CD127, CD117, CD161 and CRTH2 (Supp. Fig. 1). Before the integrative analysis, the complete multi-modal single-cell dataset containing ILC subsets was converted into a single Seurat object. All subsequent protein-level and gene-level analyses were performed using the multimodal data analysis pipeline of the Seurat R package, version 4.0.
The normalized and scaled protein abundance profile was used for estimating the integrated harmony dimensions using runHarmony function in Seurat R package (reduction= ‘apca’ and group.by.vars = ‘batch’) . The batch corrected harmony embeddings were then used for computing the Uniform Manifold Approximation and Projection (UMAP) dimensions to visualize the clusters of ILC subsets. Differential marker analysis of surface proteins, between two groups of cells (COVID-19 and Healthy cohort), from abseq panels was computed with normalized and scaled expression values using FindMarkers function from Seurat R package (test.use=’wilcox’). Similarly, differential gene expression was performed on normalized and scaled gene expression values from between two groups of cells (COVID-19 and Healthy cohort) using the FindMarkers function from Seurat R package (test.use=’MAST’ and latent.vars=’batch’). Genes with log-fold change > 0.5 and adjusted p-value < 0.05 (method: Benjamini-Hochberg) (were considered as significant for further evaluation. The resulting adjusted p-values box-plots were plotted using ggplot2 R package (version 3.4.2) after computing the number of cells expressing a given protein or gene in each sample. Pathway enrichment analysis of DE genes was performed using web-server metascape (version 3.5). The AUCells score and gene regulatory network analysis was performed using pySCENIC pipeline (version 0.12.1). Gene regulatory network was reconstructed using GRNBoost2 algorithm and the list of TFs in humans (genome version: hg38) were obtained from cisTarget database. (https://resources.aertslab.org/cistarget). Cellular enrichment (aka AUCell) analysis that measures the activity of TF or gene signatures across all single cells was performed using aucell function in pySCENIC python library. The ggplot2 R package (version 3.4.2) was used for boxplot visualization. The differential gene co-expression analysis was performed using scSFMnet R package. 
Circular plots were generated using the R package circlize (version 0.4.15).
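The significance rule above combines a Benjamini-Hochberg adjustment with a fold-change cut-off. The following Python sketch shows both steps. The adjustment follows the definition used by R's `p.adjust(method = "BH")`. The text does not say whether the 0.5 cut-off applies to the absolute log-fold change, so the sketch applies it literally (`fc > 0.5`). The function names are illustrative.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(genes: Sequence[str], log_fc: Sequence[float], pvals: Sequence[float],
                fc_cut: float = 0.5, padj_cut: float = 0.05) -> List[str]:
    """Genes with log-fold change > fc_cut and BH-adjusted p < padj_cut."""
    padj = bh_adjust(pvals)
    return [g for g, fc, q in zip(genes, log_fc, padj) if fc > fc_cut and q < padj_cut]
```

For example, the p-values `[0.01, 0.04, 0.03, 0.2]` adjust to approximately `[0.04, 0.0533, 0.0533, 0.2]`. Only the first gene can then pass the 0.05 threshold.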

  8. Systematic reconstruction of molecular pathway signatures using scalable...

    • zenodo.org
    bin, pdf, txt, zip
    Updated Feb 27, 2025
    Longda Jiang; Carol Dalgarno; Efthymia Papalexi; Isabella Mascio; Hans-Hermann Wessels; Huiyoung Yun; Nika Iremadze; Gila Lithwick-Yanai; Doron Lipson; Rahul Satija (2025). Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens [Dataset]. http://doi.org/10.5281/zenodo.14518762
    Available download formats: pdf, bin, zip, txt
    Dataset updated
    Feb 27, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Longda Jiang; Carol Dalgarno; Efthymia Papalexi; Isabella Mascio; Hans-Hermann Wessels; Huiyoung Yun; Nika Iremadze; Gila Lithwick-Yanai; Doron Lipson; Rahul Satija
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains Seurat objects, differential expression analysis results, and pathway gene lists for the manuscript "Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens".
    List of files:

    1. Seurat_object_IFNB_Perturb_seq.rds: Seurat object of the Perturb-seq data for Interferon-beta pathway
    2. Seurat_object_IFNG_Perturb_seq.rds: Seurat object of the Perturb-seq data for Interferon-gamma pathway
    3. Seurat_object_TNFA_Perturb_seq.rds: Seurat object of the Perturb-seq data for TNF-alpha pathway
    4. Seurat_object_TGFB1_Perturb_seq.rds: Seurat object of the Perturb-seq data for TGF-beta1 pathway
    5. Seurat_object_INS_Perturb_seq.rds: Seurat object of the Perturb-seq data for insulin pathway
    6. Pathway_genelist.rds: The pathway gene lists from MultiCCA analysis
    7. Pathway_Exclusive_genelist.rds: The pathway exclusive gene lists generated from Pathway_genelist.rds
    8. HClust_Pathway_celltype_specific_genelist.rds: The cell-line-specific pathway gene lists from hierarchical clustering analysis, done independently on each cell line
    9. DE_results_all_pathway.zip: The DE test results for all regulators, cell lines, and pathways (from the Mixscale weighted DE test)
    10. Bulk_RNAseq_Seurat_object_IFNG_and_TGFB_stim.rds: Seurat object for the bulk RNA-seq data for interferon-gamma and TGF-beta stimulation experiments
    11. Parse_Guide_Capture_Protocol.pdf: The guide RNA capture protocol developed for Parse Evercode Whole Transcriptome kit
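File 7 is described as exclusive gene lists derived from the pathway gene lists in file 6. The following Python sketch shows one plausible reading of "exclusive": genes that appear in one pathway's list and in no other. That definition is an assumption, since the description does not spell out how the exclusive lists were built. The gene symbols in the usage example are a toy input, not data from this repository.

```python
from typing import Dict, Set


def exclusive_genes(pathway_genes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each pathway, keep only the genes found in no other pathway's list."""
    exclusive: Dict[str, Set[str]] = {}
    for name, genes in pathway_genes.items():
        # Union of every *other* pathway's genes (empty if there is only one pathway).
        others = set().union(*(g for n, g in pathway_genes.items() if n != name))
        exclusive[name] = set(genes) - others
    return exclusive


# Toy example with made-up membership:
toy = {"IFNB": {"ISG15", "IRF7", "STAT1"}, "IFNG": {"STAT1", "CIITA"}, "TNFA": {"NFKBIA"}}
print(exclusive_genes(toy)["IFNB"])  # STAT1 is shared with IFNG, so it is dropped
```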

  9. Lung ECs scRNA-seq: Gene Expression and Metadata

    • zenodo.org
    application/gzip
    Updated Oct 29, 2024
    Eric Engelbrecht (2024). Lung ECs scRNA-seq: Gene Expression and Metadata [Dataset]. http://doi.org/10.5281/zenodo.14004479
    Available download formats: application/gzip
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eric Engelbrecht
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Normalized gene expression and cell metadata derived from a Seurat object.

  10. Data from: Single cell RNA-seq analysis reveals that prenatal arsenic...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jun 1, 2020
    Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow (2020). Single cell RNA-seq analysis reveals that prenatal arsenic exposure results in long-term, adverse effects on immune gene expression in response to Influenza A infection [Dataset]. http://doi.org/10.5061/dryad.vt4b8gtp6
    Available download formats: zip
    Dataset updated
    Jun 1, 2020
    Dataset provided by
    Dartmouth–Hitchcock Medical Center
    Dartmouth College
    Authors
    Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.

    Methods

    Whole lung homogenate preparation for single-cell RNA sequencing (scRNA-seq)

    Lungs were perfused with PBS via the right ventricle, harvested, and mechanically dissociated, then strained through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoietic origin by magnetic separation with anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads-in-emulsion, and scRNA-seq libraries were prepared with Single Cell 3' Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a NextSeq 500 sequencer (Illumina) across 3 sequencing runs. Demultiplexing and barcode processing of the raw sequencing data were performed with Cell Ranger v3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to the mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular identifier (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database under accession GSE142047.

    Preprocessing of single cell RNA sequencing (scRNA-seq) data

    Count matrices produced by Cell Ranger were analyzed in the R statistical environment (version 3.6.1). Preliminary visualization and quality analysis were conducted with scran (v1.14.3; Lun et al., 2016) and scater (v1.14.1; McCarthy et al., 2017) to identify thresholds for cell-quality and feature filtering. Sample matrices were imported into Seurat (v3.1.1; Stuart et al., 2019), and the percentages of mitochondrial, hemoglobin, and influenza A viral transcripts were calculated per cell. The following cells were excluded from further analysis:

    • fewer than 1,000 UMIs (low quality) or more than 20,000 UMIs (doublets)
    • fewer than 300 features (low quality)
    • more than 10% of reads mapped to mitochondrial genes (dying cells)
    • more than 1% of reads mapped to hemoglobin genes (red blood cells)

    After filtering, total cells per sample ranged from 1,895 to 2,482, and the number of cells did not differ significantly between arsenic-exposed and control samples. Data were then normalized with SCTransform (Hafemeister et al., 2019), and variable features were identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat v3 (Stuart et al., 2019). These anchors were used to integrate the samples into a shared space for further comparison. This approach identifies cell populations shared between samples, even when technical or biological differences are present, while still allowing for populations unique to individual samples.
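The cell-filtering thresholds above can be expressed as a single predicate. The following Python sketch is illustrative only: the authors applied these cut-offs in R with Seurat, and the `CellQC` record and its field names are hypothetical. Boundary values (exactly 1,000 UMIs, exactly 10% mitochondrial reads, and so on) are kept, because the text removes only cells strictly outside the limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellQC:
    """Hypothetical per-cell QC summary."""
    barcode: str
    n_umi: int         # total UMI count
    n_features: int    # number of genes detected
    pct_mito: float    # % of reads on mitochondrial genes
    pct_hb: float      # % of reads on hemoglobin genes


def passes_qc(c: CellQC) -> bool:
    """Keep cells inside all four thresholds described in the methods."""
    return (1000 <= c.n_umi <= 20000   # < 1,000: low quality; > 20,000: likely doublet
            and c.n_features >= 300    # < 300 features: low quality
            and c.pct_mito <= 10.0     # > 10% mitochondrial: dying cell
            and c.pct_hb <= 1.0)       # > 1% hemoglobin: red blood cell


def filter_cells(cells: List[CellQC]) -> List[CellQC]:
    return [c for c in cells if passes_qc(c)]
```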

    Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq

    Principal components were computed from the integrated dataset and used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two dimensions. A shared-nearest-neighbor (SNN) graph was constructed with default parameters, and clusters were identified with the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2,310 cells. Cluster marker genes were identified with the findMarkers function in scran. To label individual cells with cell-type identities, we used the SingleR package (v. 3.1.1). It compares the gene expression profile of each cell with expression data from curated, FACS-sorted leukocyte samples in the ImmGen compendium (Aran et al., 2019; Heng et al., 2008). We manually updated the ImmGen reference annotation with 263 sample-group labels for fine-grained analysis and 25 CD45+ cell-type identities, based on the markers used to sort ImmGen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2. Cells that were not labeled confidently after label pruning were assigned "Unknown".
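To show what the SNN graph step computes, the following minimal Python sketch builds a shared-nearest-neighbor graph from PC coordinates. It follows our understanding of Seurat's approach: edge weights are the Jaccard overlap of two cells' k-nearest-neighbor sets (each set includes the cell itself), and edges with weight at or below a pruning cut-off are dropped. We believe Seurat's default cut-off is 1/15, but treat that value as an assumption. The brute-force neighbor search is for illustration only and does not scale to real datasets.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def knn_sets(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Indices of the k nearest points to each point (the point itself included)."""
    sets = []
    for p in points:
        nearest = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        sets.append(set(nearest[:k]))
    return sets


def snn_graph(points: Sequence[Sequence[float]], k: int = 3,
              prune: float = 1 / 15) -> Dict[Tuple[int, int], float]:
    """Jaccard-weighted SNN edges (i < j); weak edges (weight <= prune) are removed."""
    nbrs = knn_sets(points, k)
    edges: Dict[Tuple[int, int], float] = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared == 0:
                continue
            weight = shared / len(nbrs[i] | nbrs[j])  # Jaccard index
            if weight > prune:
                edges[(i, j)] = weight
    return edges
```

A community-detection algorithm such as SLM then partitions this weighted graph into clusters. The resolution parameter controls how many clusters result.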

    Differential gene expression by immune cells

    For differential gene expression within individual cell types, raw counts from the cells of each cell type were pooled per sample to create a pseudo-bulk count table for each cell type. Differential expression analysis was performed only on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets. This can make genes appear to be expressed in cell types that do not actually express them. We therefore used the method of Young and Behjati (Young et al., 2018) to estimate the ambient contribution to each gene and identified genes in each cell type that were estimated to be >25% ambient-derived. These genes were excluded from analysis in a cell-type-specific manner. Genes expressed in fewer than 5% of cells were also excluded. Differential expression analysis was then performed in limma (limma-voom with quality weights), following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < 0.05 and log2FC > 1.
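The pseudo-bulk step above sums raw counts per (cell type, sample), keeps only cell types with more than 10 cells in every sample, and drops genes detected in fewer than 5% of that cell type's cells. The following Python sketch reproduces that logic on an in-memory list of cells. The input layout is hypothetical, and the ambient-RNA exclusion is omitted because it depends on the separate Young and Behjati estimate.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Cell = Tuple[str, str, Dict[str, int]]  # (sample, cell_type, {gene: raw_count}); hypothetical layout


def pseudobulk(cells: List[Cell], min_cells: int = 10,
               min_detect: float = 0.05) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return {cell_type: {sample: {gene: summed_count}}} for eligible cell types."""
    samples = {s for s, _, _ in cells}
    n_cells: DefaultDict[Tuple[str, str], int] = defaultdict(int)
    sums: DefaultDict[Tuple[str, str], DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    detected: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    total_by_type: DefaultDict[str, int] = defaultdict(int)

    for sample, ctype, counts in cells:
        n_cells[(ctype, sample)] += 1
        total_by_type[ctype] += 1
        for gene, c in counts.items():
            sums[(ctype, sample)][gene] += c      # pool raw counts per sample
            if c > 0:
                detected[ctype][gene] += 1        # cells in which the gene is detected

    result: Dict[str, Dict[str, Dict[str, int]]] = {}
    for ctype, n_total in total_by_type.items():
        # Require more than min_cells cells of this type in *every* sample.
        if any(n_cells[(ctype, s)] <= min_cells for s in samples):
            continue
        keep = sorted(g for g, d in detected[ctype].items() if d / n_total >= min_detect)
        result[ctype] = {s: {g: sums[(ctype, s)].get(g, 0) for g in keep} for s in samples}
    return result
```

The resulting per-cell-type tables have one column per sample, which is the shape limma-voom expects for a bulk-style comparison.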

    Analysis of arsenic effect on immune cell gene expression by scRNA-seq.

    Sample-wide effects of arsenic on gene expression were identified by pooling raw counts from all cells in each sample into a count table for pseudo-bulk gene expression analysis. Genes with fewer than 20 counts in any sample, or fewer than 60 total counts, were excluded. Differential expression analysis was performed with limma-voom as described above.
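The gene filter for the sample-wide pseudo-bulk table can be written as a single condition. In the following Python sketch, "fewer than 20 counts in any sample" is read as excluding a gene whenever at least one sample falls below 20 counts. That reading is our interpretation of the text. The `{gene: [count per sample]}` layout is hypothetical.

```python
from typing import Dict, List


def filter_genes(counts: Dict[str, List[int]], min_per_sample: int = 20,
                 min_total: int = 60) -> Dict[str, List[int]]:
    """Keep genes with at least min_per_sample counts in every sample and at least min_total overall."""
    return {
        gene: per_sample
        for gene, per_sample in counts.items()
        if min(per_sample) >= min_per_sample and sum(per_sample) >= min_total
    }
```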

  11. Skin sc-RNASeq from seven body sites (face, scalp, axilla, palmoplantar,...

    • plus.figshare.com
    bin
    Updated Mar 11, 2025
    Lam C Tsoi; Rachael Bogle; Johann Gudjonsson; Meri Oliva; Bridget Riley-Gillis (2025). Skin sc-RNASeq from seven body sites (face, scalp, axilla, palmoplantar, arm, leg, and back) [Dataset]. http://doi.org/10.25452/figshare.plus.25696620.v2
    Available download formats: bin
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    Figshare+
    Authors
    Lam C Tsoi; Rachael Bogle; Johann Gudjonsson; Meri Oliva; Bridget Riley-Gillis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This scRNA-seq dataset comprises disease-unaffected epidermal samples from 96 skin biopsies: 18 from published datasets (GSE173706, GSE249279) and 78 newly generated ones. Biopsy sample and protocol details, and curated cell-type signature genes, are available in the scRNASeq_source_info_FigShare spreadsheet of this dataset. Processed Seurat objects are provided here. Raw data are available in SRA (PRJNA1054546). Biopsies came from seven body sites (face, scalp, axilla, palmoplantar, arm, leg, and back). Each biopsy was separated into epidermis and dermis before being dissociated. Cells were then enriched for structural fractions (keratinocytes, fibroblasts, and endothelial cells) and immune cells (myeloid and lymphoid cells) to upsample rare cell types. In total, across body sites, 274,834 cells were profiled, including 96,194 keratinocytes. Seurat v3.0 was used to normalize, scale, and reduce the dimensionality of the data. Cells with fewer than 200 or more than 5,000 detected genes were filtered out. Cells whose mitochondrial gene content exceeded the permitted 0.05 quantile were removed. Ambient RNA was removed with the R package SoupX v1.6.2, and doublets were removed with scDblFinder v1.12.0. Principal components (PCs) were computed from the top 2,000 variable genes, and Uniform Manifold Approximation and Projection (UMAP) was applied to the top 30 PCs. Batch-effect correction was performed with harmony v1.0, using donor as the batch variable. After batch correction, cells were clustered by shared-nearest-neighbor, modularity-optimization-based clustering. Cluster marker genes were identified with FindAllMarkers. Each cluster's cell type was assigned by comparing its marker genes with the curated cell-type signature genes.

    Differential expression by keratinocyte subtype was performed with the Seurat (v4.3.0) FindMarkers function, comparing each keratinocyte subtype with the non-keratinocyte clusters. The log fold-change of average expression between a keratinocyte-subtype cluster and the remaining clusters is used as the keratinocyte-subtype gene expression statistic.
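The keratinocyte-subtype statistic is a log fold-change of average expression between one cluster and the rest. The following Python sketch computes that kind of statistic from normalized (non-log) expression values using a pseudocount of 1. This approximates, and does not reproduce, FindMarkers' `avg_log2FC`. To our understanding, Seurat's exact computation depends on the version and on the assay slot; for log-normalized data it back-transforms the values with `expm1` before averaging. The function names are illustrative.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def avg_log2fc(expr_in: Sequence[float], expr_out: Sequence[float],
               pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression inside vs. outside a cluster."""
    return math.log2(mean(expr_in) + pseudocount) - math.log2(mean(expr_out) + pseudocount)


def subtype_fc_table(cluster_expr: Dict[str, List[float]],
                     rest_expr: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-gene statistic for one keratinocyte subtype vs. the remaining cells."""
    return {gene: avg_log2fc(cluster_expr[gene], rest_expr[gene]) for gene in cluster_expr}
```

With this definition, a gene averaging 7 in the subtype and 1 elsewhere scores log2(8) - log2(2) = 2.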

  12. Data and Code from: Dysregulation of zebrin-II cell subtypes is a shared...

    • zenodo.org
    bin
    Updated Nov 6, 2024
    Luke C. Bartelt; Craig B. Lowe; Albert R. La Spada (2024). Data and Code from: Dysregulation of zebrin-II cell subtypes is a shared feature across polyglutamine ataxia mouse models and human patients [Dataset]. http://doi.org/10.5281/zenodo.13905956
    Available download formats: bin
    Dataset updated
    Nov 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Luke C. Bartelt; Craig B. Lowe; Albert R. La Spada
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 6, 2024
    Description

    Abstract

    Spinocerebellar ataxia type 7 (SCA7) is a genetic neurodegenerative disorder caused by a CAG-polyglutamine repeat expansion. Purkinje cells (PCs) are central to the pathology of ataxias, but their low abundance in the cerebellum leaves their transcriptomes underrepresented in sequencing assays. To address this issue, we developed a PC enrichment protocol and sequenced individual nuclei from mice and patients with SCA7. Single-nucleus RNA sequencing in SCA7-266Q mice revealed dysregulation of cell identity genes affecting glia and PCs. Specifically, genes marking zebrin-II PC subtypes accounted for the highest proportion of DEGs in symptomatic SCA7-266Q mice. These transcriptomic changes in SCA7-266Q mice were associated with more inhibitory synapses, as quantified by immunohistochemistry, and with reduced PC spiking in acute brain slices. Dysregulation of zebrin-II cell subtypes was the predominant signal in PCs of SCA7-266Q mice and was associated with the loss of zebrin-II striping in the cerebellum at motor symptom onset. We further demonstrated zebrin-II stripe degradation in additional mouse models of polyglutamine ataxia and observed decreased zebrin-II expression in the cerebellum of patients with SCA7. Our results suggest that a breakdown of zebrin subtype regulation is a shared pathological feature of polyglutamine ataxias.

    Data and Code Availability

    Here you will find data and code associated with our manuscript "Dysregulation of zebrin-II cell subtypes is a shared feature across polyglutamine ataxia mouse models and human patients", Bartelt et al., Sci. Transl. Med. 16, eadn5449 (2024).

    The data file labeled "HuCb_filtered.rds" is a processed and annotated single-nucleus RNA-seq Seurat object, containing the gene-level count data for the multiplexed snRNA-seq experiment performed on post-mortem human cerebellar tissues from patients with SCA7 and unaffected controls. Data obtained from WT and SCA7-266Q mice as described in our paper can be accessed in the NIH Gene Expression Omnibus under accession number GSE269430.

    There are three code files, numbered 00 through 02, containing the snRNA-seq analysis code applied to both the mouse and human datasets. The files are meant to be run in order. They take the user from CellRanger output to filtered and annotated Seurat objects, and they cover the subclustering analysis and our pseudobulk DESeq2 differential expression approach. In some places the user may need to modify the code depending on their computer system, their version of R or Seurat, and whether they are processing the 5-week, 8-week, or human dataset; these places are marked with comments in the code.

    • The first file, 00_Preprocessing_MULTIseq, starts from the CellRanger filtered_feature_barcode_matrix output and extracts cell barcodes. It then uses the MULTIseq deMULTIplex software to match cell barcodes to oligo barcodes from the MULTIseq fastq files and annotates the Seurat file with metadata. Cell type identification and annotation also take place in this file. Note: the deMULTIplex step will likely need to be run on a high-performance computing cluster.
    • The second file, 01_Seurat_Analysis, uses the filtered and annotated Seurat file to calculate useful QC metrics, investigate disease signals, and perform cell type subclustering analyses.
    • The third file, 02_Pseudobulk_DEseq2, contains custom analysis code that extracts raw counts for each cell type and each animal from the Seurat file. It then uses the DESeq2 package to calculate DEGs, taking into account biological replicates and raw read count differences between control and SCA7 animals.
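The first script matches each cell to a MULTIseq sample barcode. The following Python sketch illustrates the general idea of that assignment: call the dominant barcode, and flag cells with too few reads or two comparably strong barcodes. This is a simplified stand-in written for illustration. It is *not* the deMULTIplex classification algorithm, which uses its own thresholding procedure. The `min_count` and `min_ratio` cut-offs are hypothetical.

```python
from typing import Dict


def assign_sample(barcode_counts: Dict[str, int], min_count: int = 10,
                  min_ratio: float = 2.0) -> str:
    """Call a cell's sample from its oligo-barcode read counts (simplified rule).

    Returns the top barcode if it has at least min_count reads and at least
    min_ratio times the reads of the runner-up. Otherwise returns
    "Negative" (too few reads) or "Doublet" (two barcodes of similar strength).
    """
    ranked = sorted(barcode_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_count:
        return "Negative"
    second = ranked[1][1] if len(ranked) > 1 else 0
    if ranked[0][1] < min_ratio * second:
        return "Doublet"
    return ranked[0][0]
```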
  13. Spatial Transcriptomics of chicken pectoralis major muscle

    • agdatacommons.nal.usda.gov
    bin
    Updated Mar 11, 2025
    University of Delaware (2025). Spatial Transcriptomics of chicken pectoralis major muscle [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Spatial_Transcriptomics_of_chicken_pectoralis_major_muscle/25078415
    Available download formats: bin
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    National Center for Biotechnology Information
    Authors
    University of Delaware
    License

    https://rightsstatements.org/vocab/UND/1.0/

    Description

    This study aims to use spatial transcriptomics to characterize the cell-type-specific expression profiles associated with the microscopic features of Wooden Breast (WB) myopathy. A 1 cm³ muscle sample was dissected from the cranial part of the right pectoralis major muscle of each of three randomly sampled broiler chickens at 23 days post-hatch. Samples were processed with Visium Spatial Gene Expression kits (10X Genomics), imaged at high resolution, and sequenced on an Illumina NextSeq 2000 system. WB classification was based on the histopathologic features identified. Sequence reads were aligned to the chicken reference genome (Galgal6) and mapped to the histological images. Unsupervised k-means clustering and Seurat integrative analysis distinguished histologic features and their specific gene expression patterns, including lipid-laden macrophages (LLM), unaffected myofibers, myositis, and vasculature. In particular, LLM showed reprogramming of lipid metabolism, with up-regulated lipid transporters and genes in the peroxisome proliferator-activated receptor pathway, possibly through P. Moreover, overexpression of fatty acid binding protein 5 could enhance fatty acid uptake in adjacent veins. In myositic regions, increased expression of cathepsins may support muscle homeostasis and repair by mediating lysosomal activity and apoptosis. A better knowledge of the interactions between cell types at early stages of WB is essential for a comprehensive understanding of the condition.
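Unsupervised k-means clustering assigns each Visium spot to one of k groups, usually based on reduced-dimension expression coordinates. The following Python sketch implements plain Lloyd's algorithm to show the idea. The description does not say which k-means implementation, input space, or value of k was used, so everything here is an assumption. The toy points stand in for spot coordinates.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: return a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]  # random initial centers
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


spots = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(spots, 2))  # the two well-separated groups receive different labels
```

Lloyd's algorithm can stop in a local optimum, so in practice it is usually restarted from several random initializations and the lowest within-cluster sum of squares is kept.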

  14. Robject files for tissues processed by Seurat

    • figshare.com
    application/gzip
    Updated May 30, 2023
    Tabula Muris Consortium (2023). Robject files for tissues processed by Seurat [Dataset]. http://doi.org/10.6084/m9.figshare.5821263.v3
    Available download formats: application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tabula Muris Consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each tissue's gene expression profile was processed by experts to annotate clusters of cells with biological functions. These are the R objects created with Seurat to normalize and cluster the single-cell RNA-seq expression data.

    Update 2018-03-27: Updated to the resubmitted Robj.
    Update 2018-09-20: Updated to the accepted Robj.

  15. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    Updated Nov 20, 2023
    Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Stoop, Allart
    Hsu, Jonathan
    Description

    Table of Contents

    1. Main Description
    2. File Descriptions
    3. Linked Files
    4. Installation and Instructions

    1. Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required to run the script:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

    File Descriptions

    The code can be downloaded and opened in RStudio. "marengo_code_for_paper_jan_2023.R" contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). "all_res_deg_for_heat_updated_march2023.txt" contains the unfiltered results of the DGE analysis, which were also used to create the DGE heatmap and the volcano plots. "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in Figure 5F.

    Linked Files

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The raw FASTQ files and the aligned files were deposited in GEO, and the "Rdata" or "Rds" file was deposited in Zenodo. The linked datasets are described below:

    Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed FASTQ text files. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file required to run the code. Submission type: Restricted Access. To gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (version 2022.12.0+353) to open and run the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd("path/to/your/files")  # hypothetical path: replace with the folder that holds the files listed above

    4. Open the file "Install_Packages.R" and run it in your R IDE. This script attempts to install all the necessary packages and their dependencies, so that the code in "marengo_code_for_paper_jan_2023.R" can be run.
    5. Once "Install_Packages.R" has run successfully, restart RStudio or your IDE of choice.
    6. Open "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Run the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.
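Before running the scripts, it can help to confirm that all five files from step 3 are in the working directory. The following small pre-flight check is written in Python to match the other sketches in this listing. It is not part of the repository, and the function name is hypothetical. In R, the equivalent check is `file.exists()` applied to the same file names.

```python
from pathlib import Path
from typing import List

# The files the instructions say must sit in the working directory.
REQUIRED_FILES = [
    "marengo_code_for_paper_jan_2023.R",
    "Install_Packages.R",
    "Marengo_newID_March242023.rds",
    "genes_for_heatmap_fig5F.xlsx",
    "all_res_deg_for_heat_updated_march2023.txt",
]


def missing_files(directory: str) -> List[str]:
    """Return the required files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty result means the working directory is ready for step 4.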
  16. Data for Cell-type-specific alternative splicing in the cerebral cortex of a...

    • zenodo.org
    application/gzip
    Updated Aug 12, 2024
    Emma F. Jones; Timothy C. Howton; Tabea M. Soelter; Anthony B. Crumley; Brittany N. Lasseigne (2024). Data for Cell-type-specific alternative splicing in the cerebral cortex of a Schinzel-Giedion Syndrome patient variant mouse model [Dataset]. http://doi.org/10.5281/zenodo.12535061
    Available download formats: application/gzip
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emma F. Jones; Timothy C. Howton; Tabea M. Soelter; Anthony B. Crumley; Brittany N. Lasseigne
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    data.tar.gz contains all files from the data directory (except for sam outputs from STAR) associated with the 230926_EJ_Setbp1_AlternativeSplicing GitHub project and includes the following files:

    ./marvel: - This directory contains rds and Rdata objects that were created using the MARVEL R package

    cell_type_goresults.rds - Gene Ontology (GO) results, split by cell type

    marvel_04_split_counts.Rdata - This R data includes all environment objects from MARVEL script 04, and is used for downstream plotting

    normalized_sj_expression.Rds - This object is the normalized splice junction expression

    Setbp1_marvel_aligned.rds - Final prepared MARVEL object before any SJU analyses have been run

    significant_tables.RData - For those who do not want to load multiple massive files, this includes all significant SJU results for each cell type

    sj_usage_cell_type.rds - This data object has splice junction usage calculated for each cell type

    sj_usage_condition.rds - This data object has splice junction usage calculated for each cell type and also split by condition

    ./seurat: - This directory contains all intermediate and final Seurat single-cell gene expression objects

    annotated_brain_samples.rds - This is the final iteration of the processing in Seurat for a final annotated object. Please use this object for any Seurat or single-cell gene expression analyses.

    clustered_brain_samples.rds - This is the clustered Seurat object, before cell type annotation based on canonical markers.

    filtered_brain_samples_pca.rds - This is the filtered Seurat object, before clustering but after PCA.

    filtered_brain_samples.rds - This is the filtered Seurat object, before PCA.

    integrated_brain_samples.rds - This is the integrated Seurat object, before other steps.

    ./star: - All files in the STAR directory are outputs from STARsolo, as described in our methods. Each output directory contains the same files, so only one example is included here for brevity. Intermediate SAM files were removed to optimize space.

    J1/ - This directory contains outputs for brain sample J1

    J13/ - This directory contains outputs for brain sample J13

    J15/ - This directory contains outputs for brain sample J15

    J2/ - This directory contains outputs for brain sample J2

    J3/ - This directory contains outputs for brain sample J3

    J4/ - This directory contains outputs for brain sample J4

    K1/ - This directory contains outputs for kidney sample K1

    K2/ - This directory contains outputs for kidney sample K2

    K3/ - This directory contains outputs for kidney sample K3

    K4/ - This directory contains outputs for kidney sample K4

    K5/ - This directory contains outputs for kidney sample K5

    K6/ - This directory contains outputs for kidney sample K6

    ./star/genome: - This directory contains outputs from running STAR genomeGenerate. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

    chrLength.txt

    chrNameLength.txt

    chrName.txt

    chrStart.txt

    exonGeTrInfo.tab

    exonInfo.tab

    geneInfo.tab

    Genome

    genomeParameters.txt

    Log.out

    SA

    SAindex

    sjdbInfo.txt

    sjdbList.fromGTF.out.tab

    sjdbList.out.tab

    transcriptInfo.tab

    ./star/J1: - This is the top-level STAR output directory for sample J1. It contains logs, basic QC, and gene and splice junction counts. For more information about the STAR pipeline and its outputs, please refer to the STAR documentation https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

    Log.final.out

    Log.out

    Log.progress.out

    SJ.out.tab

    Solo.out/

    _STARgenome/
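SJ.out.tab is STAR's collapsed splice-junction table for the whole sample (not per cell). A minimal Python sketch for reading it, assuming the nine tab-separated columns documented in the STAR manual; `Junction` and `parse_sj_out_tab` are hypothetical helpers written for illustration, not part of STAR or of this project's code:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Code tables for columns 4 (strand) and 5 (intron motif) of SJ.out.tab,
# as described in the STAR manual.
STRAND = {0: "undefined", 1: "+", 2: "-"}
MOTIF = {
    0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
    4: "CT/GC", 5: "AT/AC", 6: "GT/AT",
}


@dataclass
class Junction:
    chrom: str
    intron_start: int  # first base of the intron (1-based)
    intron_end: int    # last base of the intron (1-based)
    strand: str
    motif: str
    annotated: bool
    unique_reads: int  # uniquely mapping reads crossing the junction
    multi_reads: int   # multi-mapping reads crossing the junction
    max_overhang: int  # maximum spliced alignment overhang


def parse_sj_out_tab(lines: Iterable[str]) -> Iterator[Junction]:
    """Yield one Junction per complete line of an SJ.out.tab file."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        yield Junction(
            chrom=fields[0],
            intron_start=int(fields[1]),
            intron_end=int(fields[2]),
            strand=STRAND[int(fields[3])],
            motif=MOTIF[int(fields[4])],
            annotated=fields[5] == "1",
            unique_reads=int(fields[6]),
            multi_reads=int(fields[7]),
            max_overhang=int(fields[8]),
        )
```

For example, `[j for j in parse_sj_out_tab(open("star/J1/SJ.out.tab")) if j.unique_reads >= 5]` keeps junctions supported by at least five uniquely mapping reads; the threshold is arbitrary and only illustrative.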

    ./star/J1/Solo.out: - This directory contains the outputs used for downstream analysis

    Barcodes.stats

    GeneFull_Ex50pAS/

    SJ/

    ./star/J1/Solo.out/GeneFull_Ex50pAS: - This directory contains the filtered and raw barcodes, features, and matrix files for gene expression (including introns)

    Features.stats

    filtered/

    raw/

    Summary.csv

    UMIperCellSorted.txt

    ./star/J1/Solo.out/GeneFull_Ex50pAS/filtered: - This directory contains the filtered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages)

    barcodes.tsv.gz - This file contains filtered cell barcodes

    features.tsv.gz - This file contains filtered features (genes)

    matrix.mtx.gz - This file contains the filtered cell by gene expression count matrix
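These three files follow the common 10x-style layout: a MatrixMarket coordinate file plus barcode and feature lists, so they can also be read without Seurat. A minimal stdlib-only Python sketch, assuming features are the matrix rows and barcodes the columns (the usual Cell Ranger/STARsolo orientation, despite the "cell by gene" wording above) and that the feature ID sits in the first column of features.tsv.gz; `read_solo_matrix` and its helpers are hypothetical names:

```python
import gzip
import os
from typing import Dict, Iterable, List, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def read_first_column(lines: Iterable[str]) -> List[str]:
    """Return the first tab-separated field of every non-empty line."""
    return [line.rstrip("\n").split("\t")[0] for line in lines if line.strip()]


def read_mtx_coordinate(lines: Iterable[str]) -> Tuple[int, int, Dict[Tuple[int, int], float]]:
    """Parse a MatrixMarket coordinate file into (n_rows, n_cols, entries).

    entries maps 0-based (row, col) to the stored value; lines starting
    with '%' (the %%MatrixMarket header and comments) are skipped.
    """
    dims = None
    entries: Dict[Tuple[int, int], float] = {}
    n_data_lines = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        parts = line.split()
        if dims is None:
            dims = tuple(int(p) for p in parts[:3])  # rows, cols, non-zeros
            continue
        row, col = int(parts[0]) - 1, int(parts[1]) - 1  # file is 1-based
        entries[(row, col)] = float(parts[2])
        n_data_lines += 1
    if dims is None or n_data_lines != dims[2]:
        raise ValueError("missing size line or wrong number of entries")
    return dims[0], dims[1], entries


def read_solo_matrix(directory: str, suffix: str = ".gz"):
    """Load barcodes, features and counts from a filtered/ (or raw/) directory."""
    with open_text(os.path.join(directory, "barcodes.tsv" + suffix)) as fh:
        barcodes = read_first_column(fh)
    with open_text(os.path.join(directory, "features.tsv" + suffix)) as fh:
        features = read_first_column(fh)
    with open_text(os.path.join(directory, "matrix.mtx" + suffix)) as fh:
        n_rows, n_cols, entries = read_mtx_coordinate(fh)
    # Assumed orientation: rows = features, columns = barcodes.
    if (n_rows, n_cols) != (len(features), len(barcodes)):
        raise ValueError("matrix shape does not match feature/barcode lists")
    return barcodes, features, entries
```

For the uncompressed raw/ directory, `read_solo_matrix(path, suffix="")` reads the same three files; the shape check raises an error if the orientation assumption does not hold.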

    ./star/J1/Solo.out/GeneFull_Ex50pAS/raw: - This directory contains the unfiltered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages). Files are the same as previously described for filtered.

    barcodes.tsv

    features.tsv

    matrix.mtx

    ./star/J1/Solo.out/SJ: - This directory contains the QC and raw barcodes, features, and matrix files for splice junction expression

    Features.stats

    raw/

    Summary.csv

    ./star/J1/Solo.out/SJ/raw: - This directory contains the raw barcodes, features, and matrix files for splice junction expression

    barcodes.tsv - This file contains the raw (unfiltered) cell barcodes

    features.tsv - This file contains the features (splice junctions)

    matrix.mtx - This file contains the raw splice junction count matrix across cell barcodes
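Only a raw/ splice-junction matrix is listed for SJ, so its columns presumably cover more barcodes than the cells called in GeneFull_Ex50pAS/filtered/barcodes.tsv.gz. One plausible way to restrict the SJ counts to those cells is sketched below in Python. It assumes the SJ matrix has already been parsed into a dict keyed by 0-based (feature_row, barcode_col), with columns ordered as in SJ/raw/barcodes.tsv, and that both outputs use identical barcode strings; `subset_to_cells` is a hypothetical helper, not part of STARsolo or MARVEL:

```python
from typing import Dict, Iterable, List, Tuple

Entries = Dict[Tuple[int, int], float]


def subset_to_cells(entries: Entries, raw_barcodes: List[str],
                    keep_barcodes: Iterable[str]) -> Tuple[List[str], Entries]:
    """Keep only matrix columns whose barcode is in keep_barcodes.

    entries: 0-based (feature_row, barcode_col) -> count, columns ordered
    as raw_barcodes. Returns (kept_barcodes, new_entries); kept columns are
    renumbered 0..k-1 in their original raw order, feature rows unchanged.
    """
    keep = set(keep_barcodes)
    new_col: Dict[int, int] = {}
    kept: List[str] = []
    for old_col, barcode in enumerate(raw_barcodes):
        if barcode in keep:
            new_col[old_col] = len(kept)
            kept.append(barcode)
    # Drop entries in discarded columns and renumber the rest.
    new_entries = {
        (row, new_col[col]): value
        for (row, col), value in entries.items()
        if col in new_col
    }
    return kept, new_entries
```

Comparing `set(filtered_barcodes) - set(kept)` afterwards is a cheap check that the barcode formats of the two outputs really match.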

    ./star/J1/_STARgenome: - This directory contains the STARgenome created and used by STAR for this sample. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

    exonGeTrInfo.tab

    exonInfo.tab

    geneInfo.tab

    sjdbInfo.txt

    sjdbList.fromGTF.out.tab

    sjdbList.out.tab

    transcriptInfo.tab

  17.

    scRNA data from: Organization of the human Intestine at single cell...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 24, 2023
    Winston Becker (2023). scRNA data from: Organization of the human Intestine at single cell resolution [Dataset]. http://doi.org/10.5061/dryad.8pk0p2ns8
    Available download formats: zip
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Stanford University
    Authors
    Winston Becker
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    The human adult intestinal system is a complex organ that is approximately 9 meters long and performs a variety of complex functions, including digestion, nutrient absorption, and immune surveillance. We performed snRNA-seq on 8 regions of the human intestine (duodenum, proximal jejunum, mid-jejunum, ileum, ascending colon, transverse colon, descending colon, and sigmoid colon) from 9 donors (B001, B004, B005, B006, B008, B009, B010, B011, and B012). In the corresponding paper, we find that cell compositions differ dramatically across regions of the intestine and demonstrate the complexity of epithelial subtypes. We map gene regulatory differences in these cells suggestive of a regulatory differentiation cascade, and associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation, and organization in the human intestine, and serve as an important reference map for understanding human biology and disease.

    Methods: For a detailed description of each of the steps used to obtain these data, see the materials and methods in the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were flash frozen. Nuclei were isolated from each sample, and the resulting nuclei were processed with either 10x scRNA-seq, using Chromium Next GEM Single Cell 3’ Reagent Kits v3.1 (10x Genomics, 1000121) or Chromium Next GEM Chip G Single Cell Kits (10x Genomics, 1000120), or 10x multiome sequencing, using Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Kits (10x Genomics, 1000283). Initial processing of the snRNA-seq data was done with the Cell Ranger pipeline (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) by first running cellranger mkfastq to demultiplex the BCL files and then running cellranger count. Since nuclear RNA was sequenced, data were aligned to a pre-mRNA reference. Initial processing of the multiome data, including alignment and generation of fragment files and expression matrices, was performed with the Cell Ranger ARC pipeline. The raw expression matrices from these pipelines are included here. Downstream processing was performed in R using the Seurat package.
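Nuclei contain a large fraction of unspliced pre-mRNA, so many snRNA-seq reads fall in introns and would not be counted against a standard exon-based reference. One recipe that 10x Genomics documented for older Cell Ranger versions builds a pre-mRNA reference by relabelling each GTF `transcript` record as an `exon`, so the whole transcript span, introns included, counts as exonic; the resulting GTF is then passed to `cellranger mkref`. The manuscript's exact reference-building steps are not given here, so the Python sketch below (hypothetical function `transcripts_as_exons`) only illustrates that recipe:

```python
from typing import Iterable, Iterator


def transcripts_as_exons(gtf_lines: Iterable[str]) -> Iterator[str]:
    """Yield a pre-mRNA GTF: every 'transcript' record relabelled as 'exon'.

    All other records (genes, original exons, comment lines) are dropped, so
    each transcript is represented by a single exon that spans its introns.
    """
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Column 3 of a GTF record is the feature type.
        if len(fields) >= 9 and fields[2] == "transcript":
            fields[2] = "exon"
            yield "\t".join(fields) + "\n"
```

Usage, with hypothetical file names: `open("genes.premrna.gtf", "w").writelines(transcripts_as_exons(open("genes.gtf")))`.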

  18.

    Data from: Disease specific alterations in the olfactory mucosa of patients...

    • ega-archive.org
    Updated Feb 9, 2022
    (2022). Disease specific alterations in the olfactory mucosa of patients with Alzheimer’s disease [Dataset]. https://www.ega-archive.org/datasets/EGAD00001008560
    Dataset updated
    Feb 9, 2022
    License

    https://ega-archive.org/dacs/EGAC00001002527

    Description

    The samples AD_Library_1, AD_Library_2 and Control_Library were run on a Chromium Chip B with the Chromium Single Cell 3′ Library & Gel Bead Kit v3 (10x Genomics, CA, USA). The 3′ gene expression libraries were sequenced at an approximate depth of 50,000 reads per cell on NovaSeq 6000 S1 (Illumina, San Diego, CA, USA) flow cells. Cell Ranger v3.0.2 was used to process the raw base call files: FASTQ files and raw gene-barcode matrices were generated, and reads were aligned to the human genome GRCh37 (hg19). The samples were integrated in R v4.0.3, and the resulting Seurat objects (two from AD samples and one from control samples) were analyzed with the Seurat package v4.0.3 for downstream analysis, cell clustering and differential expression.

  19.

    Dawnn benchmarking dataset: Heart cells processing and label simulation

    • rdr.ucl.ac.uk
    txt
    Updated May 4, 2023
    George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Heart cells processing and label simulation [Dataset]. http://doi.org/10.5522/04/22601260.v1
    Available download formats: txt
    Dataset updated
    May 4, 2023
    Dataset provided by
    University College London
    Authors
    George Hall; Sergi Castellano Hereza
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This project is a collection of files that allows users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNA-seq datasets and is available as an R package. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on single-cell RNA-seq of heart cells.

    FILES:

    Input data: dataset from "Integrated multi-omic characterization of congenital heart disease", Nature 608, pp. 181-191 (2022).

    heart_barcodes.tsv.gz - Cell barcode list

    heart_genes.tsv.gz - Gene list

    heart_expression_matrix.mtx.gz - Cell-by-gene expression matrix

    Data processing code

    process_heart_cells.R Generates benchmarking dataset from input data. (Reads heart_barcodes.tsv.gz, heart_genes.tsv.gz, and heart_expression_matrix.mtx.gz; Runs the standard Seurat pipeline; Saves the resulting Seurat dataset as heart_tissue_cells.RDS and the resulting cell labels as benchmark_dataset_heart_data_type_labels.csv)

    Resulting datasets

    heart_tissue_cells.RDS - Seurat dataset generated by process_heart_cells.R.

    benchmark_dataset_heart_data_type_labels.csv - Cell labels generated by process_heart_cells.R.

  20.

    Data from: Extraocular muscle stem cells exhibit distinct cellular...

    • data.niaid.nih.gov
    • dataone.org
    zip
    Updated Jan 25, 2024
    Daniela Di Girolamo; Maria Benavente-Diaz; Melania Murolo; Alexandre Grimaldi; Priscilla Thomas Lopes; Brendan Evano; Mao Kuriki; Stamatia Gioftsidi; Vincent Laville; Jean-Yves Tinevez; Gaëlle Letort; Sebastien Mella; Shahragim Tajbakhsh; Glenda Comai (2024). Extraocular muscle stem cells exhibit distinct cellular properties associated with non-muscle molecular signatures [Dataset]. http://doi.org/10.5061/dryad.b8gtht7k0
    Available download formats: zip
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Institut Pasteur
    Délégation Ile-de-France Ouest et Nord
    Authors
    Daniela Di Girolamo; Maria Benavente-Diaz; Melania Murolo; Alexandre Grimaldi; Priscilla Thomas Lopes; Brendan Evano; Mao Kuriki; Stamatia Gioftsidi; Vincent Laville; Jean-Yves Tinevez; Gaëlle Letort; Sebastien Mella; Shahragim Tajbakhsh; Glenda Comai
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    The muscle stem cell (MuSC) population is recognized as functionally heterogeneous. Cranial muscle stem cells, which originate from head mesoderm, can have greater proliferative capacity in culture and higher regenerative potential in transplantation assays than those in the limb. The basis of such functional differences in phenotypic outputs remains unresolved, as a comprehensive understanding of the underlying mechanisms is lacking. We addressed this issue using a combination of clonal analysis, live imaging, and scRNA-seq, identifying critical biological features that distinguish extraocular (EOM) and limb (Tibialis anterior, TA) MuSC populations. Time-lapse studies using a Myogenin^tdTomato reporter showed that the increased proliferation capacity of EOM MuSCs is accompanied by a differentiation delay in vitro. Unexpectedly, in vitro activated EOM MuSCs expressed a large array of distinct extracellular matrix (ECM) components, growth factors, and signaling molecules that are typically associated with mesenchymal non-muscle cells. These unique features are regulated by a specific set of transcription factors that constitute a coregulating module. This transcription factor network, which includes Foxc1 as one of the major players, appears to be hardwired to EOM identity: it is present in quiescent adult MuSCs and in their activated counterparts during growth, and is retained upon passaging in vitro. These findings provide insights into how high-performing MuSCs regulate myogenic commitment by actively remodeling their local environment.

    Methods

    scRNA-seq data generation: MuSCs were isolated on a BD FACSAria III based on GFP fluorescence and cell viability from Tg:Pax7-nGFP mice (Sambasivan et al., 2009). Quiescent MuSCs were manually counted using a hemocytometer and immediately processed for scRNA-seq. For activated samples, MuSCs were cultured in vitro for four days as described above. Activated MuSCs were subsequently trypsinized and washed in DMEM/F12 with 2% FBS. Live cells were re-sorted, manually counted using a hemocytometer and processed for scRNA-seq. Prior to scRNA-seq, RNA integrity was assessed using an Agilent Bioanalyzer 2100 to validate the isolation protocol (RIN > 8 was considered acceptable). 10X Genomics Chromium microfluidic chips were loaded with around 9000 cells, and cDNA libraries were generated following the manufacturer's protocol. Concentrations and fragment sizes were determined using an Agilent Bioanalyzer and an Invitrogen Qubit. cDNA libraries were sequenced using NextSeq 500 and High Output v2.5 (75 cycles) kits. Count matrices were subsequently generated with the 10X Genomics Cell Ranger pipeline. Following normalisation and quality control, we obtained an average of 5792 ± 1415 cells per condition. Seurat preprocessing: scRNA-seq datasets were processed using Seurat (https://satijalab.org/seurat/) (Butler et al., 2018). Cells with a mitochondrial gene fraction above 10% were discarded. 4000-5000 genes were detected on average across all 4 datasets. Dimensionality reduction and UMAPs were generated following the Seurat workflow. The top 100 DEGs were determined using the Seurat "FindAllMarkers" function with default parameters. When processed independently (scVelo), the datasets were first regressed on cell cycle genes, mitochondrial fraction, number of genes and number of UMIs following the dedicated Seurat vignette, and doublets were removed using DoubletFinder v3 (McGinnis et al., 2019).
A "StressIndex" score was generated for each cell from a previously reported list of stress genes (Machado et al., 2021) using the Seurat "AddModuleScore" function. 94 of the 98 genes were detected in the combined datasets. UMAPs were generated (1) after StressIndex regression and (2) after complete removal of the detected stress genes from the gene expression matrix before normalization. In both cases, the overall aspect of the UMAP did not change significantly (Figure S5). Although unmeasured confounding effects of cell stress following isolation cannot be ruled out, we reasoned that our datasets did not show a significant effect of stress with respect to the conclusions of our study. Matrisome analysis: After subsetting for the features of the Matrisome database (Naba et al., 2015) present in our single-cell dataset, the matrisome score was calculated by assessing the overall expression of its constituents using the "AddModuleScore" function from Seurat (Butler et al., 2018).
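Two of the steps above, the 10% mitochondrial-fraction filter and the gene-set scores, can be illustrated in plain Python. The sketch assumes per-cell counts stored as dictionaries keyed by gene symbol, mouse mitochondrial genes carrying the `mt-` prefix, and already-normalized expression for the score. `module_score` is a simplified re-implementation of the idea behind Seurat's AddModuleScore (mean expression of the gene set minus that of control genes drawn from the same average-expression bins); it is not Seurat's code, and its defaults were only chosen to resemble Seurat's `nbin`/`ctrl` arguments. All function names are hypothetical:

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def mito_fraction(counts: Dict[str, int], prefix: str = "mt-") -> float:
    """Fraction of a cell's counts from genes whose symbol starts with prefix."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith(prefix))
    return mito / total


def drop_high_mito(cells: Dict[str, Dict[str, int]], max_fraction: float = 0.10):
    """Keep cells with at most max_fraction mitochondrial counts (10% in the text)."""
    return {cid: c for cid, c in cells.items() if mito_fraction(c) <= max_fraction}


def module_score(expr: Dict[str, Sequence[float]], gene_set: List[str],
                 n_bins: int = 24, n_ctrl: int = 100, seed: int = 1) -> List[float]:
    """Simplified module score per cell: mean(gene set) - mean(control genes).

    expr maps gene -> normalized expression values, one per cell, in the
    same cell order for every gene.
    """
    features = [g for g in gene_set if g in expr]
    if not features:
        raise ValueError("none of the genes in gene_set are present")
    # Rank genes by average expression and split them into equal-sized bins.
    ranked = sorted(expr, key=lambda g: mean(expr[g]))
    bin_of = {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}
    members: Dict[int, List[str]] = {}
    for gene, b in bin_of.items():
        members.setdefault(b, []).append(gene)
    # For each gene-set member, sample control genes from its own bin.
    rng = random.Random(seed)
    controls = set()
    for gene in features:
        pool = members[bin_of[gene]]
        controls.update(rng.sample(pool, min(n_ctrl, len(pool))))
    n_cells = len(expr[features[0]])
    return [
        mean(expr[g][j] for g in features) - mean(expr[g][j] for g in controls)
        for j in range(n_cells)
    ]
```

Matching controls by expression bin is what keeps highly expressed gene sets from scoring high simply because of their overall abundance.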

    RNA velocity and driver genes: scVelo was used to calculate RNA velocities (Bergen et al., 2020). Unspliced and spliced transcript matrices were generated using the velocyto (La Manno et al., 2018) command-line tool. Seurat-generated filtering, annotations and cell embeddings (UMAP, tSNE, PCA) were then added to the output objects. These datasets were then processed following the scVelo online guide and documentation. Velocity was calculated based on the dynamical model (using scv.tl.recover_dynamics(adata) and scv.tl.velocity(adata, mode='dynamical')), and differential kinetics calculations were added to the model (using scv.tl.velocity(adata, diff_kinetics=True)). Specific driver genes were identified by determining the top likelihood genes in the selected cluster. The lists of the top 100 drivers for EOM and TA progenitors are given in Suppl Tables 10 and 11. Gene regulatory network inference and transcription factor modules: Gene regulatory networks were inferred using pySCENIC (Aibar et al., 2017; Van de Sande et al., 2020). This algorithm groups sets of correlated genes into regulons (i.e. a transcription factor and its targets) based on binding motifs and co-expression patterns. The top 35 regulons for each cluster were determined using the scanpy "scanpy.tl.rank_genes_groups" function (method='t-test'). Note that this function can yield fewer than 35 results depending on the cluster. The UMAP and heatmap were generated using the regulon AUC (Area Under the Curve) matrix, which reflects the activity level of each regulon in a given cell. Visualizations were performed using scanpy (Wolf et al., 2018). The output list of regulons and their targets was subsequently used to create a transcription factor network. To do so, only target genes that are themselves regulons were kept. This results in a visual representation where each node is an active transcription factor and each edge is an inferred regulation between two transcription factors.
When placed in a force-directed environment, these nodes aggregate based on the number of shared edges. This operation greatly reduced the number of genes involved, while highlighting co-regulating transcriptional modules. Visualization of this network was performed in a force-directed graph using Gephi “Force-Atlas2” algorithm (https://gephi.org/).
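The pruning step described above (keep only targets that are themselves regulons, so every node is a transcription factor) can be sketched in Python. The sketch assumes the regulons are available as a mapping from TF symbol to target genes, with any suffix such as "(+)" already removed from regulon names; it drops self-regulation edges, which is our assumption rather than something the text states, and serializes a Source/Target edge list of the kind Gephi can import as an edge table. `tf_network` and `to_edge_csv` are hypothetical helpers:

```python
import csv
import io
from typing import Dict, Iterable, Set, Tuple


def tf_network(regulons: Dict[str, Iterable[str]]) -> Set[Tuple[str, str]]:
    """Directed TF -> TF edges: keep only targets that are regulons themselves."""
    tfs = set(regulons)
    edges = set()
    for tf, targets in regulons.items():
        for target in targets:
            if target in tfs and target != tf:  # self-loops dropped (assumption)
                edges.add((tf, target))
    return edges


def to_edge_csv(edges: Set[Tuple[str, str]]) -> str:
    """Serialize edges as a Source,Target CSV table, sorted for stable output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(sorted(edges))
    return buf.getvalue()
```

TFs whose targets contain no other TF, like `TF_C` in a toy input such as `{"TF_A": ["TF_B"], "TF_B": ["TF_A"], "TF_C": ["Gene3"]}`, produce no edges and therefore vanish from the network, which is how this pruning shrinks the gene set.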
