Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset folders from "TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses". If using the processed data or TISSUE algorithm, please cite: https://doi.org/10.1101/2023.04.25.538326.
The dataset directories are compressed in tar gzip format. The top level contains one folder per dataset, and each folder holds the relevant data files, which include:
Spatial_count.txt --- a tab-delimited file containing spatial transcriptomics counts matrix
scRNA_count.txt --- a tab-delimited file containing RNAseq counts matrix
Locations.txt --- a tab-delimited file containing the (x,y) spatial coordinates of cells in the spatial transcriptomics data
Metadata.txt --- for some datasets, this is a comma-separated file containing the metadata table for the spatial transcriptomics data
These files are formatted and organized to be read into AnnData objects using the native loading functions in the TISSUE package (https://github.com/sunericd/TISSUE). Some folders will also have additional accessory files such as gene lists corresponding to some experiments present in our manuscript and/or adjacency matrix objects.
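As a rough illustration of the folder layout described above, the sketch below reads the three core files using only the Python standard library. It is not the TISSUE loader. The function names are hypothetical, and it assumes that each counts file has a header row of gene names followed by one numeric row per cell, and that Locations.txt has a header row followed by one x, y pair per cell.

```python
import csv
from pathlib import Path


def read_tab_matrix(path):
    """Read a tab-delimited matrix: a header row of column names, then numeric rows."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


def load_dataset_folder(folder):
    """Load one dataset folder (assumed layout) and check that cells and coordinates agree."""
    folder = Path(folder)
    genes_spatial, spatial = read_tab_matrix(folder / "Spatial_count.txt")
    genes_rna, rna = read_tab_matrix(folder / "scRNA_count.txt")
    _, locations = read_tab_matrix(folder / "Locations.txt")
    if len(spatial) != len(locations):
        raise ValueError("Spatial_count.txt and Locations.txt disagree on the number of cells")
    # Genes measured in both modalities, in a stable order.
    shared = sorted(set(genes_spatial) & set(genes_rna))
    return {
        "spatial": spatial,
        "rna": rna,
        "locations": locations,
        "genes_spatial": genes_spatial,
        "genes_rna": genes_rna,
        "shared_genes": shared,
    }
```

For real analyses, the loading functions in the TISSUE package are the intended route; this sketch is only meant to make the file roles concrete.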
Also included are the two simulated spatial transcriptomics datasets that we generated using SRTsim.
The SVZ folders contain our processed MERFISH spatial transcriptomics dataset on the adult mouse subventricular zone. Refer to the SVZFullFinal folder for the full dataset with TISSUE-informed cell labels. All other folders are processed data accessed from publicly available sources. The identity of numbered folders can be found in the Data Availability statement of the benchmarking paper from which they were retrieved: https://doi.org/10.1038/s41592-022-01480-9
"svz_merfish_data.zip" includes the raw MERFISH dataset on the adult mouse subventricular zone.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We acquired 10x Visium spatial transcriptomics (ST) data from 9 patients with invasive adenocarcinomas [1–5] to explore the role of the tumour microenvironment (TME) on intratumor heterogeneity (ITH) and drug response in breast cancer. By leveraging a new version of Beyondcell [6], a tool for identifying tumour cell subpopulations with distinct drug response patterns, we predicted sensitivity to over 1,200 drugs while accounting for the spatial context and interaction between the tumour and TME compartments. Moreover, we also used Beyondcell to compute spot-wise functional enrichment scores and identify niche-specific biological functions.
Here, you can find:
In signatures folder:
SSc breast: Collection of gene signatures used to predict sensitivity to > 1,200 drugs derived from breast cancer cell lines.
Functional signatures: Collection of gene signatures used to compute enrichment in different biological pathways.
In visium folder:
Visium objects: Processed ST Seurat objects with deconvoluted spots, SCTransform-normalised counts, and clonal composition predicted with SCEVAN [7]. These objects, together with the signatures, were used to compute the Beyondcell objects.
In single-cell folder:
Single-cell objects: Raw and filtered merged single-cell RNA-seq (scRNA-seq) Seurat objects with unnormalised counts used as a reference for spot deconvolution.
In beyondcell folder:
Beyondcell sensitivity objects with prediction scores for all drug response signatures in SSc breast.
Beyondcell functional objects with enrichment scores for all functional signatures.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Spatial Transcriptomics (ST) data matched with Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging (MALDI-MSI) data. It is complementary to the other dataset contained in the same project. Files with the same identifiers in the two datasets originate from the very same tissue section and can be combined in a multimodal ST-MSI object. For more information about the dataset, please see our manuscript posted on bioRxiv (https://doi.org/10.1101/2023.01.26.525195).
This dataset includes ST data from 19 tissue sections, including human post-mortem and mouse samples. The spatial transcriptomics data were generated using the Visium protocol (10x Genomics). The murine tissue sections come from three different mice unilaterally injected with 6-OHDA, a neurotoxin that, when injected into the brain, can selectively destroy dopaminergic neurons. We used this mouse model to show the applicability of the technology that we developed, named Spatial Multimodal Analysis (SMA). Using SMA on these mouse brain tissue sections, we were able to detect both dopamine with MALDI-MSI and the corresponding gene expression with ST.
The dataset also includes one human post-mortem striatum sample that was placed on one Visium slide across the four capture areas. This sample was analyzed with a different ST protocol named RRST (Mirzazadeh, R., Andrusivova, Z., Larsson, L. et al. Spatially resolved transcriptomic profiling of degraded and challenging fresh frozen samples. Nat Commun 14, 509 (2023). https://doi.org/10.1038/s41467-023-36071-5), where probes capturing the whole transcriptome are first hybridized in the tissue section and then spatially detected.
Each tissue section in the dataset has been given a unique identifier composed of the Visium array ID and the capture area ID of the Visium slide that the tissue section was placed on.
This unique identifier is included in the file names of all the files relating to the same tissue section, including the MALDI-MSI files published in the other dataset included in this project. In this dataset you will find the following files for each tissue section:
- raw files: the read one fastq files (containing the pattern *R1*fastq.gz in the file name), the read two fastq files (containing the pattern *R2*fastq.gz in the file name) and the raw microscope images (containing the pattern Spot.jpg in the file name). These are the only files needed to run the Space Ranger pipeline, which is freely available (please see the 10x Genomics website for information on how to install and run Space Ranger);
- processed data files, of three types: a) Space Ranger outputs that were used to produce the figures in our publication; b) manual annotation tables in csv format produced using Loupe Browser 6 (file names ending in _RegionLoupe.csv, _filter.csv, _dopamine.csv, _lesion.csv or _region.csv); c) json files that we used as input for Space Ranger in the cases where the automatic tissue detection included in the pipeline failed to recognize the tissue or the fiducials. Using these processed files, the user can reproduce the figures of our publication without having to restart from the raw data files.
The MALDI-MSI analyses preceding ST were performed with different matrices in different tissue sections. We used 1) 9-aminoacridine (9-AA) for detection of metabolites in negative ionization mode, 2) 2,5-dihydroxybenzoic acid (DHB) for detection of metabolites in positive ionization mode, 3) 4-(anthracen-9-yl)-2-fluoro-1-ethylpyridin-1-ium iodide (FMP-10), which charge-tags molecules with phenolic hydroxyls and/or primary amines, including neurotransmitters. The matrix sprayed on each tissue section, along with other information about the samples, is listed in the metadata table.
We also used three types of control samples:
- standard Visium: samples processed with standard Visium (i.e. no matrix spraying, no MALDI-MSI, protocol as recommended by 10x Genomics with no exceptions)
- internal controls (iCTRL): samples neither sprayed with any matrix nor processed with MALDI-MSI, but located on the same Visium slide where other samples were processed with MALDI-MSI
- FMP-10-iCTRL: sample sprayed with FMP-10, and then processed as an iCTRL.
This and other information is provided in the metadata table.
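The file-name conventions described above lend themselves to a small helper that sorts a download into per-section groups. The sketch below is illustrative only: the function names are hypothetical, the section identifiers must be supplied by the user (for example from the metadata table), and the `*R2*fastq.gz` pattern for read two is assumed by analogy with the read one pattern.

```python
import fnmatch
from collections import defaultdict

# Patterns follow the description of the raw files; the R2 pattern is an assumption.
PATTERNS = {
    "read1": "*R1*fastq.gz",
    "read2": "*R2*fastq.gz",
    "image": "*Spot.jpg",
}


def classify(filename):
    """Return 'read1', 'read2', 'image', or 'other' for a single file name."""
    for kind, pattern in PATTERNS.items():
        if fnmatch.fnmatchcase(filename, pattern):
            return kind
    return "other"


def group_by_section(filenames, section_ids):
    """Group file names by the section identifier that appears in each name."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        for section in section_ids:
            if section in name:
                groups[section][classify(name)].append(name)
    return {section: dict(kinds) for section, kinds in groups.items()}
```

A section whose group lacks a `read1`, `read2` or `image` entry is probably an incomplete download and would not be ready for Space Ranger.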
https://ega-archive.org/dacs/EGAC00001000581
FASTQ files from spatial transcriptomics of 8 breast cancer (BC) sections. Sample preparation: frozen BC samples were chosen based on tissue structure and RNA quality (RIN > 8). The “Visium Spatial Tissue Optimization Slide and Reagent Kit” (10X Genomics; #PN-1000193) was then used to optimize permeabilization conditions for BC tissues. Briefly, sections were fixed, stained and then permeabilized at different time points to capture mRNA, and reverse transcription was performed to generate fluorescently labeled cDNA. The permeabilization time that resulted in the highest fluorescence signal with the lowest background diffusion was chosen. The best permeabilization time for BC tissue was 18 min. Cryostat sections 10 μm thick were cut and placed on Visium Spatial Gene Expression slides (10X Genomics, PN-1000184). The slide was incubated for 1 min at 37°C, then fixed with methanol for 30 min at -20°C, followed by Hematoxylin and Eosin (H&E) staining, and images were taken under a high-resolution microscope. After imaging, the coverslip was detached by holding the slide in water and the slide was mounted in a plastic slide cassette. The spatial gene expression process, including tissue permeabilization, second strand synthesis and cDNA amplification, was performed according to the manufacturer’s instructions (10X Genomics; #CG000239). cDNA quality was next assessed using the Agilent High Sensitivity DNA Kit (Agilent, #5067-4626). The spatial gene libraries were constructed using the Visium Spatial Library Construction Kit (10X Genomics, PN-1000184).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title: Datasets for high resolution spatial mapping for SC2Spa
Author(s): Linbu Liao
Categories: Bioinformatic methods development
Item type: dataset
Keyword(s): Spatial transcriptomics, Spatial inference
File description:
AdataMH1.h5ad is the processed mouse hippocampus spatial transcriptomics data file of puck_200115_08 from the Slide-seqV2 paper[1].
AMB_HC.h5ad is a processed mouse hippocampus scRNA-seq data file[2]. The datasets are saved in AnnData format.
HC1_transfer_to_AMB.csv includes the predicted location of the scRNA-seq data (AMB_HC.h5ad). The columns "ClosestSC" and "Dis2ClosestBead" are used in the cell communication analysis tutorial.
ssHippo_RCTD.csv is the annotation for the AdataMH1.h5ad file by RCTD[3].
WDs_T2.csv includes the Wasserstein distance of genes between the scRNA-seq dataset[2] and the mouse hippocampus Slide-seqV2[1] dataset.
SI_T2_WD.h5 is the trained model for mapping mouse hippocampus cells to space. The model is trained using genes selected according to Wasserstein distance.
SI_T2.h5 is the spatial inference model trained on all genes shared between the two datasets.
T2_stat.csv is a summary of SI_T2.h5. It contains each gene's contribution to location prediction and the Pearson correlation between predicted and true gene expression.
AdataMH2.h5ad is the processed mouse hippocampus spatial transcriptomics data file of puck_191204_01 from the Slide-seqV2 paper[1].
slideSeq_Puck190926_03_RCTD.csv is the annotation file for AdataEmbryo1.h5ad.
AdataEmbryo1.h5ad is the preprocessed file of puck_190926_03 (a mouse embryo Slide-seqV2[1] dataset).
C2L.zip is the cell2location[4] data used in the analysis of the SC2Spa manuscript.
Cell2Location_ST.h5ad is the processed Visium adult mouse brain spatial transcriptomics data ST8059048 from the Cell2Location paper[4].
Cell2Location_snRNAseq.h5ad is the processed snRNA-seq adult mouse brain data of 5705STDY8058280, 5705STDY8058281, 5705STDY8058282, 5705STDY8058283, 5705STDY8058284 and 5705STDY8058285 from the Cell2Location paper[4]. Samples 80, 81 and 82 are from mouse 1 (female). Samples 83, 84 and 85 are from mouse 2 (male).
ModelCell2Location_snRNAseq2ST_SC2Spa_WD.h5 is the pretrained SC2Spa model for mapping snRNAseq to ST (genes with a Wasserstein distance greater than 0.1 were used for training).
WDs_snRNAseq2Visium.csv has the Wasserstein distance information of genes between the snRNA-seq data[4] and the Visium mouse brain data[4].
The datasets were used for the spatial mapping of SC2Spa.
CV_code.zip contains the benchmarking code along with scripts for cross-validation and cross-dataset validation.
Repositories:
The GitHub website of SC2Spa: https://github.com/linbuliao/SC2Spa
The GitHub repository for SC2Spa analysis: https://github.com/linbuliao/SC2Spa_Notebooks
Documentation:
The Read the Docs website of SC2Spa: https://sc2spa.readthedocs.io/en/latest/
References:
[1] Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella DJ, Arlotta P, Macosko EZ, Chen F: Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nature Biotechnology 2020.
[2] Saunders A, Macosko EZ, Wysoker A, Goldman M, Krienen FM, de Rivera H, Bien E, Baum M, Bortolin L, Wang SY, et al: Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain. Cell 2018, 174:1015-+.
[3] Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, Irizarry RA: Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 2021.
[4] Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, Elmentaite R, Lomakin A, Kedlian V, Gayoso A, et al: Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol 2022, 40:661-671.
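The gene-selection step mentioned above (keeping genes by their Wasserstein distance between two datasets, with 0.1 as the cut-off) can be illustrated with a minimal sketch. This is not the SC2Spa implementation: the function names are hypothetical, and since the scaling applied before computing distances is not stated here, the inputs are assumed to be already on comparable scales (for example, each gene scaled to [0, 1]).

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First-order Wasserstein distance between two 1D empirical distributions.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F is the
    empirical cumulative distribution function of each sample.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def select_genes(expr_a, expr_b, threshold=0.1):
    """Return shared genes whose distance between datasets exceeds the threshold.

    expr_a and expr_b map gene name -> list of expression values (one per cell).
    The direction of the comparison (greater than) follows the file description.
    """
    shared = sorted(set(expr_a) & set(expr_b))
    return [g for g in shared if wasserstein_1d(expr_a[g], expr_b[g]) > threshold]
```

Where SciPy is available, `scipy.stats.wasserstein_distance` computes the same quantity for unweighted samples.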
The samples in this dataset come from a study of breast cancer intratumoral heterogeneity using spatial transcriptomic data and computational pathology. The dataset contains 14 samples from 3 patients (one with triple-negative breast cancer and two with HER2-positive breast cancer). Multiple regions of each tumor were collected for analysis. Each sample is one tumor region from one of the patients.
Libraries for spatial transcriptomics were prepared using Visium spatial gene expression kits (10x genomics). Sequencing was performed using the Illumina NovaSeq 6000 platform at the National Genomics Infrastructure, SciLifeLab in Solna, Sweden.
The dataset contains 28 fastq files, compressed with gzip, from paired-end RNA sequencing (10X Visium spatial transcriptomics). The metadata are described in the SND_metadata.xlsx file. The md5sum.txt file is provided for validating data integrity. The total size of the dataset is approximately 300 GB.
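The integrity check mentioned above can be run with the `md5sum -c` command-line tool, or with a short script such as the sketch below. It assumes that md5sum.txt uses the conventional `md5sum` layout (one `<digest>  <file name>` pair per line) and that the listed files sit next to the manifest; neither detail is confirmed by the dataset description, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest_path):
    """Return the names of files whose MD5 digest does not match the manifest."""
    manifest_path = Path(manifest_path)
    mismatches = []
    for line in manifest_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        if md5_of(manifest_path.parent / name) != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty list from `verify_checksums` means every listed file matched its recorded digest.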
https://ega-archive.org/dacs/EGAC00001003452
Visium spatial transcriptomics (10X Genomics) performed on 4 CCA samples. Each sample has two paired-end sequencing runs: the first (I1 & I2) are a pair reading indexes; the second (R1 & R2) are a pair reading inserts, with R1 additionally reading 10X barcodes. For histology images, please contact authors.
Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"
Structure
ST_berghei_liver
contains data generated during stpipeline analysis and imaging on the 2k-array Spatial Transcriptomics platform, as well as the data needed for, and produced by, the hepaquery analysis. These samples include 38 sections in total, of which 8 are from mice (n=4) infected with sporozoites for 12h, 5 from control mice (n=3) at 12h, 7 from mice (n=4) infected with sporozoites for 24h and 4 from control mice (n=3) at 24h, as well as 8 samples from mice (n=2) infected with sporozoites for 38h and control mice (n=2) at 38h.
STUtiility_mus_pb_ST.RDS is a Seurat object generated with the STUtility package from the ST data of the 38 liver sections stored in ST_berghei_liver
visium_berghei_liver
contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total: 1 section infected with sporozoites for 12h, 1 control section at 12h, 1 section infected with sporozoites for 24h, 1 control section at 24h, 2 sporozoite-infected sections at 38h and 2 control sections at 38h.
V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection
V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection
V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection
se_visium.RDS is a Seurat object generated with the STUtility package from the Visium data of the liver sections stored in visium_berghei_liver
snSeq_berghei_liver
contains single-nucleus sequencing data generated with the cellranger pipeline. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.
cellranger_cnt_out contains feature count matrix information from cell ranger output
final_merged_curated_annotations_270623.RDS is a Seurat object generated from the data stored in snSeq_berghei_liver.tar.gz
raw images.zip contains raw images for supplementary figures 20-22
adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The thymus is a critical organ for T cell development and immune system function, but it undergoes significant structural and functional changes during aging, a process known as thymic involution. To investigate the spatial and cellular changes associated with thymic aging, we performed single-cell spatial transcriptomics on thymic tissues from young (4-week-old) and aged (52-week-old) C57BL/6J mice, with three biological replicates per age group. This dataset provides a detailed spatial and cellular map of the thymus at single-cell resolution, capturing changes in cell types, abundances, and spatial organization during aging. The results offer valuable insights into the cellular and spatial heterogeneity of the thymus and provide a resource for understanding immune system aging and potential therapeutic strategies.
Supplementary data supporting the SpatialOne: End-to-End Analysis of Spatial Transcriptomics at Scale publication
To showcase the capabilities of SpatialOne, two human lung cancer formalin-fixed, paraffin-embedded (FFPE) samples are analyzed. These samples were prepared following the CG000495 protocol (Figure 1b), sequenced with the 10x Visium CytAssist, and processed using 10x SpaceRanger version 2. We also present the analysis of two adult mouse samples profiled using 10x Visium (one fresh frozen brain tissue section processed using SpaceRanger v2 and one FFPE kidney sample processed using SpaceRanger v1), and of 75 internal samples.
For the human lung cancer samples, single-cell data from the Lung Cancer Atlas (Salcher et al., 2022) is used as reference. This dataset is filtered to include only Chromium-generated data. For the mouse samples, the GSE107585 single-cell dataset serves as reference. In the human lung cancer datasets, a pathologist annotated regions of interest corresponding to tumors, blood vessels, and alveolar regions.
Changelog:
Added a README file describing the zip content.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains annotated sub-cellular localised spatial measurements from the Visium, Xenium and CosMx platforms. Specifically, it includes the datasets analysed in the publication by Bhuva et al., 2023, titled "Library size confounds biology in spatial transcriptomics data". Raw transcript detections are provided. The data is best accessed through the accompanying SubcellularSpatialData R/Bioconductor package. The region files used to annotate individual transcript detections are provided as GeoJSON files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NanoString GeoMx Digital Spatial Profiler data from 12 paired tumor resections of 6 IDH-mutant astrocytoma patients. All samples had an IDH-R132H mutation. All first resections were WHO 2016 grade II or III and all second resections were WHO 2016 grade IV. The NanoString Cancer Transcriptome Atlas panel was used to measure RNA expression levels of ~1800 genes in 72 regions of interest. The ROIs folder contains all images of the regions of interest used in this study, separated by tumor pair. For information on methods, see the associated publication.
https://www.kuleuven.be/rdm/en/rdr/custom-kuleuven
This folder contains the fastq files generated during the Grand Challenge project using 10X Genomics Visium on head and neck squamous cell carcinoma samples. It contains 4 fastq files (R1 and R2 for each of the two sequencing lanes) per patient. For each patient, 2 samples (biopsy and resection) were collected, and the two samples of one patient (HNI40020) were analyzed twice.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 5: RMarkdown on how to use the RegionNeighbours function.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains spatial transcriptomics data related to the Wu et al. 2021 study "A single-cell and spatially resolved atlas of human breast cancers". It includes processed count matrices, brightfield H&E images (plain and annotated) and metadata (containing clinical information and spot pathological details) for 6 primary breast cancers profiled using the Visium assay (10X Genomics). If you use this dataset in your research, please consider citing the above study.
The contents of the files are:
raw_count_matrices.tar.gz - spaceranger processed raw count matrices.
spatial.tar.gz - spaceranger processed spatial files (images, scalefactors, aligned fiducials, position lists)
filtered_count_matrices.tar.gz - filtered count matrices.
metadata.tar.gz - metadata for tissues and spots of filtered count matrices, including clinical subtype and pathological annotation of each spot.
images.pdf - pdf detailing the H&E and annotation images.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent breakthroughs in spatial transcriptomics technologies have enhanced our understanding of diverse cellular identities, spatial organizations, and functions. Yet existing spatial transcriptomics tools are still limited in either transcriptomic coverage or spatial resolution, hindering unbiased, hypothesis-free transcriptomic analyses at high spatial resolution. Here we develop Reverse-padlock Amplicon Encoding FISH (RAEFISH), an image-based spatial transcriptomics method with whole-genome coverage and single-molecule resolution in intact tissues. We demonstrate spatial profiling of 23,000 human or 22,000 mouse transcripts in single cells and tissue sections. Our analyses reveal transcript-specific subcellular localization, cell-type-specific and cell-type-invariant zonation-dependent transcriptomes, and gene programs underlying preferential cell-cell interactions. Finally, we further develop our technology for direct spatial readout of gRNAs in an image-based high-content CRISPR screen. Overall, these developments provide the research community with a broadly applicable technology that enables high-coverage, high-resolution spatial profiling of both long and short, native and engineered RNA species in many biomedical contexts.
Single-cell RNA sequencing (scRNA-seq) has advanced our understanding of cell types and their heterogeneity within the human liver, but the spatial organization at single-cell resolution has not yet been described. Here we apply multiplexed error robust fluorescent in situ hybridization (MERFISH) to map the zonal distribution of hepatocytes, resolve subsets of macrophage and mesenchymal populations, and investigate the relationship between hepatocyte ploidy and gene expression within the healthy human liver. We next integrated spatial information from MERFISH with the more complete transcriptome produced by single-nucleus RNA sequencing (snRNA-seq), revealing zonally enriched receptor-ligand interactions. Finally, analysis of fibrotic liver samples identified two hepatocyte populations that are not restricted to zonal distribution and expand with injury. Together these spatial maps of the healthy and fibrotic liver provide a deeper understanding of the cellular and spatial remodeling t...
Two measurement modalities were used to generate these data: multiplexed error robust fluorescence in situ hybridization (MERFISH) and single-nucleus RNA sequencing (snRNAseq).
# MERFISH and snRNAseq data from Watson, Paul et al
This README file contains information on the data deposited for the manuscript "Spatial transcriptomics of healthy and fibrotic human liver at single-cell resolution" by Watson, Paul and colleagues.
Multiple anndata structures are provided as h5ad files for the different datasets. These anndata structures were generated with the scanpy pipeline (v1.8.1) and can be loaded in Python with the associated tools. They include: (1) adata_healthy_merfish.h5ad (2) adata_healthy_diseased_merfish.h5ad (3) adata_healthy_merfish_nucseq.h5ad (4) adata_healthy_nucseq.h5ad
Each anndata frame contains distinctive values for the respective data set as follows:
(1) adata_healthy_merfish.h5ad This structure contains data from healthy patient samples which were imaged with MERFISH. Raw data is stored in the adata.raw.X while adata.X is normalized by the total counts per cell, scaled to a uniform value, and then converted to logarithm...
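To make the normalisation described above concrete, here is a minimal pure-Python sketch of total-count normalisation followed by a log transform. The description is truncated, so the exact transform and scale used by the authors are not known: the `target_sum` value and the use of log(1 + x) are assumptions for illustration, and the function name is hypothetical.

```python
import math


def normalize_log(counts, target_sum=1e4):
    """Scale each cell to the same total count, then apply log(1 + x).

    counts is a list of cells, each a list of raw counts per gene.
    target_sum (the uniform per-cell total) is an assumed value.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with no counts stay at zero rather than dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized
```

In scanpy, the analogous steps are usually `scanpy.pp.normalize_total` followed by `scanpy.pp.log1p`; whether those exact calls and settings produced `adata.X` in these files is not stated here.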
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data for training the GAN (Inversion) model and reproducing the results reported in the paper
The Xenium platform was used for the spatial transcriptomic analysis of human DRG neurons. 100 marker genes were selected as the customized probe panel and hybridized to fresh frozen hDRG sections. Manual segmentation of each neuron soma was performed based on expression of the pan-neuronal marker gene PGP9.5, the satellite glia cell marker FAB7B, and the corresponding H&E staining. In total, 1340 neurons were identified (excluding 75 regions of interest with poor or unclear neuronal soma morphology in H&E staining) and clustered into 16 groups. The 16 clusters were assigned to different cell types based on marker gene expression.
In the study presented here, four dorsal root ganglia tissues from two healthy donors were used for Xenium spatial transcriptomics analysis. A 100-gene panel (including 87 neuronal genes from our single-soma sequencing dataset and 13 non-neuronal cell marker genes) was selected to perform spatial transcriptomics. The spatial distribution of these genes in neurons and non-neuronal cells was successfully profiled and quantified.
# Spatial transcriptomic analysis of human dorsal root ganglia neurons
This dataset is associated with Yu & Nagi 2024 (https://doi.org/10.1038/s41593-024-01794-1). It contains human dorsal root ganglia (DRG) 10x Xenium spatial transcriptomics raw data. In total, four DRG tissue sections from two healthy donors were used for Xenium spatial transcriptomics analysis. A 100-gene panel (including 87 neuronal genes from our single-soma sequencing dataset and 13 non-neuronal cell marker genes) was selected to perform spatial transcriptomics. The spatial distribution of these genes in neurons and non-neuronal cells was successfully profiled and quantified.
Overview: The .rar file contains all of the 10x Xenium spatial transcriptomics raw data used for the analysis and plots in the associated manuscript. Each .rar file contains the following contents.
https://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline (10x Genomics) to generate a feature-by-spot-barcode expression matrix.
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
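The three quality filters described above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the text, not the code used for the study (which ran in Seurat). The `CellQC` record, the function names, and the example cells are hypothetical. The sketch also assumes that a cell sitting exactly on a threshold is kept, since the text removes only cells "fewer than" or "more than" the stated values.

```python
from dataclasses import dataclass

# Thresholds quoted in the Methods text.
MIN_FEATURES = 750       # minimum number of detected genes
MIN_TRANSCRIPTS = 1000   # minimum number of unique transcripts (UMIs)
MAX_PCT_MITO = 30.0      # maximum percent of transcripts from mitochondrial genes


@dataclass
class CellQC:
    """Hypothetical per-cell QC record (not a Seurat structure)."""
    barcode: str
    n_features: int
    n_transcripts: int
    pct_mito: float


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all three filters.

    Assumption: values exactly equal to a threshold are retained.
    """
    return (
        cell.n_features >= MIN_FEATURES
        and cell.n_transcripts >= MIN_TRANSCRIPTS
        and cell.pct_mito <= MAX_PCT_MITO
    )


def filter_cells(cells):
    """Keep only the cells that pass quality control, preserving order."""
    return [c for c in cells if passes_qc(c)]


# Made-up example cells, for illustration only.
example_cells = [
    CellQC("cell_a", 1200, 3500, 4.2),   # passes all filters
    CellQC("cell_b", 600, 2000, 3.0),    # too few features
    CellQC("cell_c", 900, 950, 5.0),     # too few transcripts
    CellQC("cell_d", 1500, 4000, 42.0),  # too much mitochondrial signal
]

if __name__ == "__main__":
    kept = filter_cells(example_cells)
    print([c.barcode for c in kept])
```

Expressing the filter as one predicate per cell applies all three criteria together, so a cell failing any single criterion is removed.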
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
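As a minimal sketch of how the "dimensions accounting for 95% of the total variance" could be chosen, the Python function below takes the per-component standard deviations of an embedding and returns the smallest number of leading components whose cumulative variance reaches the target fraction. It assumes that "total variance" means the variance summed over the components that were computed, and that the components are sorted from largest to smallest. The function name and the example values are hypothetical and are not taken from the study.

```python
def dims_for_variance(stdevs, target=0.95):
    """Return the number of leading components needed to reach `target`.

    stdevs: per-component standard deviations, sorted in decreasing order.
    target: fraction of the summed variance to capture (0 < target <= 1).
    """
    if not stdevs:
        raise ValueError("stdevs must be non-empty")
    if not 0 < target <= 1:
        raise ValueError("target must be in the interval (0, 1]")

    variances = [s * s for s in stdevs]  # variance = stdev squared
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")

    cumulative = 0.0
    for n_dims, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return n_dims
    # Floating-point safety net: fall back to using every component.
    return len(variances)


if __name__ == "__main__":
    # Made-up standard deviations: variances are 9, 4, 1, 1 (total 15).
    example_stdevs = [3.0, 2.0, 1.0, 1.0]
    print(dims_for_variance(example_stdevs, target=0.95))
    print(dims_for_variance(example_stdevs, target=0.85))
```

With the made-up values above, the first two components capture 13/15 of the summed variance and the first three capture 14/15, so a 95% target requires all four components while an 85% target requires two.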
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using