The dataset contains an integrated, annotated Seurat v4 object. One can load the dataset into the R environment using the code below:
seurat_obj <- readRDS('PATH/TO/DOWNLOAD/seurat.rds')
The object has three assays: (I) RNA, (II) SCT and (III) integrated.
https://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods
Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
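The quality thresholds and the doublet-rate fit described above can be sketched in base R. This is a minimal illustration, not the authors' script: the `qc` table and its values are made up, the column names merely follow Seurat's metadata conventions, and the `ref` table holds placeholder numbers standing in for the expected doublet rates published in the 10x Chromium handbook.

```r
# Toy per-cell QC table (hypothetical values for illustration only).
qc <- data.frame(
  cell         = paste0("cell", 1:6),
  nFeature_RNA = c(1200, 700, 900, 2500, 800, 3000),
  nCount_RNA   = c(3000, 1500, 950, 8000, 1200, 9000),
  percent.mt   = c(5, 10, 12, 35, 30, 2),
  stringsAsFactors = FALSE
)

# Thresholds from the text: drop cells with fewer than 750 features,
# fewer than 1000 transcripts, or more than 30% mitochondrial transcripts.
keep <- qc$nFeature_RNA >= 750 & qc$nCount_RNA >= 1000 & qc$percent.mt <= 30
qc_filtered <- qc[keep, ]

# Placeholder reference table: recovered cells vs. expected doublet rate.
# Replace with the values from the handbook before real use.
ref <- data.frame(
  cells = c(1000, 2000, 4000, 6000, 8000, 10000),
  rate  = c(0.008, 0.016, 0.031, 0.046, 0.061, 0.076)
)

# Linear regression of doublet rate on cell number, then a prediction
# for a dataset with 5000 cells after quality filtering.
fit <- lm(rate ~ cells, data = ref)
est_rate <- predict(fit, newdata = data.frame(cells = 5000))
```

The estimated rate would then typically be multiplied by the number of cells to give the expected doublet count passed to the doublet caller.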
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
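Selecting the dimensions that account for 95% of the total variance can be sketched as follows. This is a base R illustration on random data using `prcomp`, under the assumption that variance is measured across the computed components; the variable names are hypothetical, and with a Seurat object the same cumulative sum would be taken over the standard deviations stored for the computed reduction.

```r
set.seed(1)

# Toy matrix: 200 observations x 10 features (random values).
x <- matrix(rnorm(200 * 10), nrow = 200)

# Principal component analysis on centered, scaled data.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Fraction of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Smallest number of leading components whose cumulative
# variance reaches the 95% cutoff.
n_dims <- which(cumsum(var_explained) >= 0.95)[1]
dims_to_use <- seq_len(n_dims)
```

`dims_to_use` is the kind of index vector one would pass as the dimensions argument when building the neighbor graph.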
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RDS files containing processed Seurat objects for multiome analysis of neuroblastoma cell lines. File names reflect the cell line.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Preprocessed and annotated scRNA-seq Seurat object of the PBMC5K dataset of human PBMCs from 10x Genomics.
MIT License: https://opensource.org/licenses/MIT
data.tar.gz contains all files from the data directory (except for sam outputs from STAR) associated with the 230926_EJ_Setbp1_AlternativeSplicing GitHub project and includes the following files:
./marvel: - This directory contains rds and Rdata objects that were created using the MARVEL R package
cell_type_goresults.rds - This is the go results split by cell type
marvel_04_split_counts.Rdata - This R data includes all environment objects from MARVEL script 04, and is used for downstream plotting
normalized_sj_expression.Rds - This object is the normalized splice junction expression
Setbp1_marvel_aligned.rds - Final prepared MARVEL object before any SJU analyses have been run
significant_tables.RData - For those who do not want to load multiple massive files, this includes all significant SJU results for each cell type
sj_usage_cell_type.rds - This data object has splice junction usage calculated for each cell type
sj_usage_condition.rds - This data object has splice junction usage calculated for each cell type and also split by condition
./seurat: - This directory contains all intermediate and final Seurat single-cell gene expression objects
annotated_brain_samples.rds - This is the final iteration of the processing in Seurat for a final annotated object. Please use this object for any Seurat or single-cell gene expression analyses.
clustered_brain_samples.rds - This is the clustered Seurat object, before cell type annotation based on canonical markers.
filtered_brain_samples_pca.rds - This is the filtered Seurat object, before clustering but after PCA.
filtered_brain_samples.rds - This is the filtered Seurat object, before PCA.
integrated_brain_samples.rds - This is the integrated Seurat object, before other steps.
./star: - All files in the STAR directory are outputs from STARsolo, as described in our methods. Each output directory contains the same files, so only one example is included here for brevity. Intermediate SAM files were removed to optimize space.
J1/ - This directory contains outputs for brain sample J1
J13/ - This directory contains outputs for brain sample J13
J15/ - This directory contains outputs for brain sample J15
J2/ - This directory contains outputs for brain sample J2
J3/ - This directory contains outputs for brain sample J3
J4/ - This directory contains outputs for brain sample J4
K1/ - This directory contains outputs for kidney sample K1
K2/ - This directory contains outputs for kidney sample K2
K3/ - This directory contains outputs for kidney sample K3
K4/ - This directory contains outputs for kidney sample K4
K5/ - This directory contains outputs for kidney sample K5
K6/ - This directory contains outputs for kidney sample K6
./star/genome: - This directory contains outputs from running STAR genomeGenerate. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
chrLength.txt
chrNameLength.txt
chrName.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
Log.out
SA
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab
./star/J1: - This is the head STAR directory for sample J1. It contains logs, basic QC, and gene and splice junction counts. For more information about the STAR pipeline and its outputs, please refer to the STAR documentation https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
Log.final.out
Log.out
Log.progress.out
SJ.out.tab
Solo.out/
STARgenome/
./star/J1/Solo.out: - This directory contains the outputs used for downstream analysis
Barcodes.stats
GeneFull_Ex50pAS/
SJ/
./star/J1/Solo.out/GeneFull_Ex50pAS: - This directory contains the filtered and raw barcodes, features, and matrix files for gene expression (including introns)
Features.stats
filtered/
raw/
Summary.csv
UMIperCellSorted.txt
./star/J1/Solo.out/GeneFull_Ex50pAS/filtered: - This directory contains the filtered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages)
barcodes.tsv.gz - This file contains filtered cell barcodes
features.tsv.gz - This file contains filtered features (genes)
matrix.mtx.gz - This file contains the filtered cell by gene expression count matrix
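As a rough sketch of how the three files fit together, the block below assembles a labeled sparse count matrix with the Matrix package. It is self-contained: it first writes a tiny, made-up demo directory to a temporary location (the directory name, barcodes, and gene names are all hypothetical), then reads it back. The three-column layout of features.tsv (gene ID, gene name, feature type) is an assumption based on the 10x-style convention, and the demo uses uncompressed files for simplicity.

```r
library(Matrix)

# Build a tiny demo directory that mimics the filtered/ layout.
demo_dir <- file.path(tempdir(), "filtered_demo")
dir.create(demo_dir, showWarnings = FALSE)

toy <- sparseMatrix(i = c(1, 2, 3), j = c(1, 1, 2), x = c(5, 1, 2),
                    dims = c(3, 2))
writeMM(toy, file.path(demo_dir, "matrix.mtx"))
writeLines(c("AAAC-1", "AAAG-1"), file.path(demo_dir, "barcodes.tsv"))
write.table(
  data.frame(id = c("ENSMUSG1", "ENSMUSG2", "ENSMUSG3"),
             name = c("GeneA", "GeneB", "GeneC"),
             type = "Gene Expression"),
  file.path(demo_dir, "features.tsv"),
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)

# Read the three files back and attach row (gene) and column (cell) names.
counts   <- readMM(file.path(demo_dir, "matrix.mtx"))
barcodes <- readLines(file.path(demo_dir, "barcodes.tsv"))
features <- read.delim(file.path(demo_dir, "features.tsv"),
                       header = FALSE, stringsAsFactors = FALSE)
dimnames(counts) <- list(features$V2, barcodes)

# Column-compressed storage is the usual input format for downstream tools.
counts <- as(counts, "CsparseMatrix")
```

A matrix in this form is what single-cell packages generally expect as the counts input when creating an analysis object.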
./star/J1/Solo.out/GeneFull_Ex50pAS/raw: - This directory contains the unfiltered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages). Files are the same as previously described for filtered.
barcodes.tsv
features.tsv
matrix.mtx
./star/J1/Solo.out/SJ: - This directory contains the QC and raw barcodes, features, and matrix files for splice junction expression
Features.stats
raw/
Summary.csv
./star/J1/Solo.out/SJ/raw: - This directory contains the raw barcodes, features, and matrix files for splice junction expression
barcodes.tsv - This file contains raw (unfiltered) cell barcodes
features.tsv - This file contains raw features (splice junctions)
matrix.mtx - This file contains the raw cell by splice junction count matrix
./star/J1/_STARgenome: - This directory contains the STARgenome created and used by STAR for this sample. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on single-cell RNAseq of heart cells.
FILES:
Input data - Dataset from: "Integrated multi-omic characterization of congenital heart disease". Nature 608 pp. 181-191 (2022).
heart_barcodes.tsv.gz - Cell barcode list
heart_genes.tsv.gz - Gene list
heart_expression_matrix.mtx.gz - Cell-by-gene expression matrix
Data processing code
process_heart_cells.R - Generates benchmarking dataset from input data. (Reads heart_barcodes.tsv.gz, heart_genes.tsv.gz, and heart_expression_matrix.mtx.gz; Runs the standard Seurat pipeline; Saves the resulting Seurat dataset as heart_tissue_cells.RDS and the resulting cell labels as benchmark_dataset_heart_data_type_labels.csv)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Summary. 10 primary GBM and 8 recurrent GBM samples (14/18 matched) profiled using single-nucleus RNA sequencing (sci-RNA-seq3 protocol).
Data Format. Data is provided as a preprocessed dataset, stored in a Seurat object.
Sample processing, sci-RNA-seq3 library generation, and sequencing. Snap-frozen patient pGBM and rGBM tissues were chopped with a razor blade or scissors before nucleus isolation. Nuclei extraction and fixation were performed as previously described (Cao 2019), except for the use of a modified CST lysis buffer [50] plus 1% of SUPERase-In RNase Inhibitor (Invitrogen, #AM2696). Lysis time and washing steps were further optimized based on human GBM tissue. Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-seq3 libraries were generated as previously described [49] using three-level combinatorial indexing. The final libraries were sequenced on Illumina NovaSeq as follows: read 1: 34bp, read 2: >=69bp, index 1: 10bp, index 2: 10bp.
Demultiplexing and read alignments. Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline. After barcodes and unique molecular identifiers (UMIs) were extracted from read 1 of the FASTQ files, read alignment was performed using the STAR short-read aligner (v2.5.2b) with the human genome (hg19) and Gencode v24 gene annotations. After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes × N nuclei.
Filtering, normalization, integration, and dimensional reduction. Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate within 3 median absolute deviations of the median.
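The three filtering criteria can be sketched in base R as below. The values and column names are hypothetical, and one detail is an assumption: the text does not say whether the median absolute deviation is the raw value or the version scaled by 1.4826 (the default of R's `mad`), so the raw value (`constant = 1`) is used here.

```r
# Toy per-nucleus QC table (made-up values for illustration only).
qc <- data.frame(
  nucleus        = paste0("n", 1:6),
  n_genes        = c(1500, 150, 3000, 9500, 2500, 4000),
  pct_mt         = c(5, 10, 65, 3, 8, 12),
  unmatched_rate = c(0.10, 0.12, 0.11, 0.13, 0.09, 0.45),
  stringsAsFactors = FALSE
)

# (i) 200 to 9000 recovered genes, (ii) less than 60% mitochondrial content.
keep_genes <- qc$n_genes >= 200 & qc$n_genes <= 9000
keep_mt    <- qc$pct_mt < 60

# (iii) unmatched rate within 3 median absolute deviations of the median.
med      <- median(qc$unmatched_rate)
dev      <- mad(qc$unmatched_rate, constant = 1)  # raw MAD (assumption)
keep_mad <- abs(qc$unmatched_rate - med) <= 3 * dev

qc_filtered <- qc[keep_genes & keep_mt & keep_mad, ]
```

Switching to the scaled MAD only requires dropping the `constant = 1` argument, which widens the accepted interval.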
To normalize the count matrix, we adopted the modeling framework previously described and implemented in sctransform (R Package, version 0.3.2). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance stabilized biological signal for downstream analysis. Data from each patient were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding, RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 0.1, metric = cosine).
Contact. Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the accompanying data set for the paper entitled ‘T cell receptor-centric approach to streamline multimodal single-cell data analysis.’, which is currently available as a preprint (https://www.biorxiv.org/content/10.1101/2023.09.27.559702v2). Details on the origin of the datasets, and processing steps can be found there.
The purpose of this atlas, in both its full and down-sampled versions, is to aid the interpretation of other T cell-based datasets. This can be done by adding either the down-sampled object, which contains up to 500 cells per annotation model, or all 12 datasets to your new sample. The atlas aims to improve the capacity to identify TCR-specific signatures by ensuring a well-covered background, which will improve the robustness of the FindMarkers function in the Seurat package.
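Capping a dataset at a fixed number of cells per annotation, as in the down-sampled atlas, can be sketched in base R. This is an illustration of the general idea only, not the procedure used to build the atlas; the `meta` table, the annotation labels, and the group sizes are hypothetical.

```r
set.seed(42)

# Toy cell metadata with three annotation groups of unequal size.
meta <- data.frame(
  cell       = paste0("cell", 1:1300),
  annotation = rep(c("type_A", "type_B", "type_C"), times = c(900, 350, 50)),
  stringsAsFactors = FALSE
)

max_cells <- 500  # cap per annotation, as stated in the text

# For each annotation, keep all cells if at or below the cap,
# otherwise draw a random subset of max_cells cells.
keep_cells <- unlist(
  lapply(split(meta$cell, meta$annotation), function(ids) {
    if (length(ids) > max_cells) sample(ids, max_cells) else ids
  }),
  use.names = FALSE
)

meta_down <- meta[meta$cell %in% keep_cells, ]
```

The retained cell names could then be used to subset the full object before merging it with a new sample.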
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
scRNA-seq was performed using the 10x Genomics platform. The study was carried out on a patient diagnosed with acute megakaryoblastic leukemia (AMKL) combined with plasma cell neoplasm and treated with anti-CD38 therapy. Data were collected before and after treatment. The dataset includes two cellranger-processed matrices and the Seurat object produced during the analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the pre-processed and cleaned RDS file that contains the main sc-RNAseq data (10x platform) from inflamed oral mucosal (OM) and head-and-neck squamous cell carcinoma (HNSCC) tissues used in this publication (https://www.nature.com/articles/s41586-022-04718-w). Script for processing and generating figures is uploaded on Github: https://github.com/MairFlo/Tumor_vs_Inflamed.
Abstract:
Immunotherapies have achieved remarkable successes in the treatment of cancer, but major challenges remain [1,2]. An inherent weakness of current treatment approaches is that therapeutically targeted pathways are not restricted to tumours, but are also found in other tissue microenvironments, complicating treatment [3,4]. Despite great efforts to define inflammatory processes in the tumour microenvironment, the understanding of tumour-unique immune alterations is limited by a knowledge gap regarding the immune cell populations in inflamed human tissues. Here, in an effort to identify such tumour-enriched immune alterations, we used complementary single-cell analysis approaches to interrogate the immune infiltrate in human head and neck squamous cell carcinomas and site-matched non-malignant, inflamed tissues. Our analysis revealed a large overlap in the composition and phenotype of immune cells in tumour and inflamed tissues. Computational analysis identified tumour-enriched immune cell interactions, one of which yields a large population of regulatory T (Treg) cells that is highly enriched in the tumour and uniquely identified among all haematopoietically-derived cells in blood and tissue by co-expression of ICOS and IL-1 receptor type 1 (IL1R1). We provide evidence that these intratumoural IL1R1+ Treg cells had responded to antigen recently and demonstrate that they are clonally expanded with superior suppressive function compared with IL1R1− Treg cells. In addition to identifying extensive immunological congruence between inflamed tissues and tumours as well as tumour-specific changes with direct disease relevance, our work also provides a blueprint for extricating disease-specific changes from general inflammation-associated patterns.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains supplementary files from the paper: "scANANSE: gene regulatory network and motif analysis of single-cell clusters" as well as several accompanying datasets.
The supplementary files:
Install_Rstudio.pdf
AnanseScanpy_equivalent.pdf
The pre-processed Seurat object and the two pre-processed Scanpy objects can be used to run the scANANSE pipeline with:
preprocessed_PBMC.Rds
rna_PBMC.h5ad
atac_PBMC.h5ad
Additional raw data, used to construct the pre-processed objects, supplemented from:
https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5, https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz, https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz.tbi, https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seura
MIT License: https://opensource.org/licenses/MIT
data.tar.gz contains all files from the data directory associated with the 230313_TS_CCCinHumanAD GitHub project and includes the following:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This site provides access to datasets from the CPA-Perturb-seq manuscript Kowalski*, Wessels*, Linder* et al., including processed Perturb-seq datasets from HEK293FT and K562. We release these data as Seurat objects, where each object contains single-cell quantifications of gene expression (RNA assay), and in addition, quantifications of polyA site usage (polyA site assay). To explore these data, please install the PASTA (PolyA Site analysis using relative Transcript Abundance) package, which provides infrastructure and analytical tools to explore alternative polyadenylation at single-cell resolution. For each dataset, we also include a fragment file which enables visualization of read coverage plots across groups of cells.
The files include:
1. CPA_HEK293FT.Rds: Seurat object containing the HEK293 CPA-Perturb-seq dataset
2. CPA_HEK293FT_fragments.tsv.gz : Fragment file for the HEK293 dataset
3. CPA_HEK293FT_fragments.tsv.gz.tbi : Fragment file index for the HEK293 dataset
4. CPA_K562.Rds : Seurat object containing the K562 CPA-Perturb-seq dataset
5. CPA_K562_fragments.tsv.gz : Fragment file for the K562 dataset
6. CPA_K562_fragments.tsv.gz.tbi : Fragment file index for the K562 dataset
R code below:
library(PASTA)
hek <- readRDS("CPA_HEK293FT.Rds")
# remove fragment file information
Fragments(hek) <- NULL
# Update the path of the fragment file
Fragments(hek) <- CreateFragmentObject(path = "download/CPA_HEK293FT_fragments.tsv.gz", cells = Cells(hek))
# visualize polyA site usage
PolyACoveragePlot(hek, region ="7-26212195-26213351")
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated branching trajectories.
FILES:
Data processing code
adapted_traj_sim_milo_paper.R - Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form branching trajectories.
generate_test_data_branching_traj_sim_milo_paper.R - R code to assign simulated labels to datasets generated from adapted_traj_sim_milo_paper.R. Seurat objects saved as cells_sim_branching_traj_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_branching_traj.csv.
Resulting datasets
cells_sim_branching_traj_gex_seed_*.rds - Seurat objects generated by generate_test_data_branching_traj_sim_milo_paper.R.
benchmark_dataset_sim_branching_traj.csv - Cell labels generated by generate_test_data_branching_traj_sim_milo_paper.R.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the dataset supporting the EPI-Clone manuscript: scRNA-seq profiling of hematopoietic stem and progenitor cells (HSPCs) was performed with the 3' 10x Genomics profiling. Three experiments are included: two where HSCs were clonally labeled with the LARRY system, transplanted to recipient mouse and profiled 4-5 months later (post-transplant hematopoiesis), and one where HSPCs were profiled straight from an unperturbed mouse.
Dataset is a Seurat (v4) object with the following assays, reductions and metadata:
ASSAYS:
AB: Antibody expression data
RNA: RNA expression profiles
integrated: Integration of DNA methylation data performed across experimental batches with two batch correction methods: CCA (https://satijalab.org/seurat/reference/runcca) and harmony (https://portals.broadinstitute.org/harmony/articles/quickstart.html).
DIMENSIONALITY REDUCTION:
pca_cca: PCA performed on the integrated data (CCA integration)
umap_cca: UMAP computed on the integrated data (CCA integration)
umap_harmony: UMAP computed on the integrated data (Harmony integration)
METADATA:
Experiment: The experiment that the cell is from, values are "LARRY main experiment", "LARRY replicate" and "Native hematopoiesis"
ProcessingBatch: Experiments were processed in several batches.
CellType: Cell type annotation
LARRY: Error corrected LARRY barcode
percent.mt: percentage of mitochondrial DNA
nCount_RNA: Read count for the RNA modality
nFeature_RNA: Number of RNAs with at least one read
nCount_AB: Read count for the surface protein modality
nFeature_AB: Number of ABs with at least one read
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Abstract
Tumor evolution is one of the major mechanisms responsible for acquiring therapy-resistant and more aggressive cancer clones. Whether the tumor microenvironment through immune-mediated mechanisms might promote the development of more aggressive cancer types is crucial for the identification of additional therapeutic opportunities. Here, we identified a novel subset of tumor-associated neutrophils, defined as tumor-associated neutrophil precursors (PreNeu). These PreNeu are enriched in female highly proliferative hormone-dependent breast cancers and impair DNA repair capacity. Mechanistically, succinate secreted by tumor-associated PreNeu inhibits homologous recombination, promoting error-prone DNA repair through non-homologous end-joining regulated by PARP-1. Consequently, breast cancer cells acquire genomic instability promoting tumor editing and progression. Selective inhibition of these pathways induces increased tumor cell killing in vitro and in vivo. Tumor-associated PreNeu score correlates with copy number alterations in highly proliferative hormone-dependent tumors from breast cancer patients. Treatment with PARP-1 inhibitors counteracts the pro-tumoral effect of these neutrophils.
DESIGN:
Enclosed are raw count gene expression matrices from 3 single-cell sequencing experiments.
- Experiment 1: SMART-Seq2 RNA-sequencing of human mature neutrophils (MatNeu) and neutrophil precursors (PreNeu) derived from luminal B breast tumor biopsy (raw counts).
- Experiment 2: SMART-Seq2 RNA-sequencing of human mature neutrophils (MatNeu) and neutrophil precursors (PreNeu) sorted from luminal B breast tumor biopsies or derived from human cord blood mononuclear cells (vst-normalized gene expression values).
- Experiment 3: Single-cell RNA sequencing using BD Rhapsody on a highly proliferative ER+ breast biopsy (.RDS object).
METHODS:
Rhapsody single-cell data generation, processing and analysis
Sample processing: a fresh biopsy single-cell suspension was enriched for CD45+ cells using the CD45 MicroBeads, human (Cat. 130-045-801; Miltenyi) following the manufacturer’s protocol. Single cells were captured for sequencing using the Rhapsody HT Single-Cell Analysis system, and the library was generated using the BD Rhapsody WTA Amplification Kit (Cat. 633801, BD Biosciences) with 8 cycles of amplification and sequenced using an Illumina NextSeq2000 instrument with a P2 flow cell and XLEAP chemistry, obtaining around 8000 reads/cell.
Computational analysis: Tumor sample FASTQ files were aligned, and feature-barcode matrices were generated using the BD Rhapsody™ Sequence Analysis Pipeline on the Seven Bridges Genomics platform, with the GRCh38 genome assembly as the reference. The resulting data were analyzed using Seurat (v.5.1.0) [10.1038/nbt.4096, 10.1016/j.cell.2019.05.031, 10.1016/j.cell.2021.04.048] in R (v.4.4.2) [R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/]. Quality control measures were applied to each dataset to remove low-quality cells and multiplets. Metrics such as cell counts, UMI counts per cell, genes detected per cell, mitochondrial gene count ratio, and ribosomal gene count ratio were inspected. Data normalization and scaling were performed, regressing out potential confounding factors (UMI counts per cell, genes detected per cell, mitochondrial gene count ratio, and ribosomal gene count ratio). A total of 30 principal components (PCs) was selected for Uniform Manifold Approximation and Projection (UMAP), which was used for dimensionality reduction and cell clustering. Clustering at a resolution of 0.6 yielded 20 clusters. Marker genes were identified using a hurdle model designed for scRNA-seq data, implemented in the MAST statistical framework [10.1186/s13059-015-0844-5], with Bonferroni-adjusted p-values to correct for multiple testing. Markers were considered significant if they were expressed in at least 70% of cells in a cluster, had an adjusted p-value below 0.05 (pval_adj < 0.05), and displayed a log2 fold change above 1 (log2FC > 1). Major cellular populations were annotated based on marker genes, and the dataset was subset to retain only bona fide neutrophils. The subset cells underwent further graph-based clustering at a resolution of 0.6, resulting in 3 clusters. Marker-based annotation was applied for in-depth characterization.
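The marker significance criteria above can be illustrated with a minimal base R sketch. The column names follow the marker table returned by Seurat's FindAllMarkers (avg_log2FC, pct.1, p_val_adj); the gene names and values are invented, and reading "expressed in at least 70% of cells in a cluster" as pct.1 >= 0.7 is an assumption rather than the authors' documented code.

```r
# Toy marker table with invented values; columns mimic Seurat's FindAllMarkers output.
markers <- data.frame(
  gene       = c("GeneA", "GeneB", "GeneC", "GeneD"),
  cluster    = c("0", "0", "1", "1"),
  avg_log2FC = c(2.1, 0.8, 1.5, 3.0),
  pct.1      = c(0.85, 0.90, 0.60, 0.75),
  p_val_adj  = c(1e-10, 1e-4, 1e-6, 0.20),
  stringsAsFactors = FALSE
)

# Keep markers that pass all three thresholds stated in the text.
# Assumption: the 70% expression criterion corresponds to pct.1 >= 0.70.
significant <- subset(
  markers,
  pct.1 >= 0.70 & p_val_adj < 0.05 & avg_log2FC > 1
)

print(significant$gene)
```

In this toy table only GeneA passes: GeneB fails the fold-change cutoff, GeneC the expression fraction, and GeneD the adjusted p-value.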
Transcriptomic data underwent zero-preserving imputation using the ALRA method [10.1038/s41467-021-27729-z]. The Seurat AddModuleScore function was employed to compute and evaluate the enrichment of neutrophil-related signatures. Pathway enrichment analysis was conducted using Metascape [10.1038/s41467-019-09234-6], considering pathways with a Benjamini-Hochberg-adjusted p-value < 0.05, involving at least three differentially expressed genes (DEGs), and a minimum enrichment score of 1.5. Finally, trajectory analysis was performed using a custom script based on Monocle3 [10.1038/s41586-019-0969-x] (v.1.2.7). Cell spatial coordinates were imported from the Seurat object to create the CellDataSet object required by Monocle. Trajectories were constructed using the learn_graph function, and cells were ordered along pseudotime.
Single-cell SMART sequencing (SMART-Seq2)
Sample processing: Fresh biopsy single-cell suspensions or cord blood-derived IMCs were sorted as described above into 96-well PCR plates containing cell lysis buffer. Samples collected in cell lysis buffer were used for RNA-seq library preparation with the NEBNext Single Cell / Low Input RNA Library Prep Kit for Illumina (NEB, E6420S), following the manufacturer’s protocol for single cells, with the following parameters adjusted: 17 cycles for cDNA amplification PCR and 11 cycles for library enrichment PCR. Libraries were dual-indexed (NEBNext Dual Index Primers Set 1, NEB, E7600S) and sequenced on an Illumina NextSeq 500 instrument with 75-cycle reagents. Quality controls and read mapping to the reference genome were performed using the same criteria described for bulk RNA-seq, except that reads having the same start/end coordinates and identical nucleotide sequence were marked and deduplicated to avoid excessive bias due to PCR amplification. Differential expression analysis was performed in the R statistical environment using the DESeq2 pipeline (v1.28.1). Cells with fewer than 500K mapped reads were removed from the analysis. Library-size normalized data were transformed using the variance stabilizing transformation, and the batch effect between tumor-sorted PreNeu or LOX-1+ neutrophils and cord blood-derived PreNeu or LOX-1+ neutrophils was corrected using ComBat-seq. Cell types of origin (PreNeu/PMN-MDSC), which were independently identified in both conditions, were set as covariates to preserve biological signal.
Computational analysis: Marker genes specific for PreNeu were identified from the single-cell data. We selected all genes differentially expressed in PreNeu cells vs LOX-1+ neutrophils (FDR < 0.05). To identify genes robustly expressed in this setting and reduce the possibility of selecting significant genes expressed at low levels, we restricted the analysis to features according to their mean expression levels (baseMean > 50) and then focused on genes showing selective upregulation in PreNeu (log2FoldChange > 1). Filtered elements were used to generate a protein-protein interaction network through the STRING database. We determined marker genes by identifying a main subnetwork showing a higher degree of connectivity between nodes. We then selected 11 genes within this cluster based on their biological function related to the regulation of immune system processes.
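The three-step gene filter described above can be sketched in base R as follows. The column names (baseMean, log2FoldChange, padj) mirror a DESeq2 results table; the genes and values are invented, and treating the stated FDR as the padj column is an assumption.

```r
# Toy differential expression table with invented values;
# columns mimic a DESeq2 results table (PreNeu vs LOX-1+ neutrophils).
res <- data.frame(
  gene           = c("GeneA", "GeneB", "GeneC", "GeneD"),
  baseMean       = c(320, 12, 150, 900),
  log2FoldChange = c(2.4, 3.1, 0.6, 1.8),
  padj           = c(0.001, 0.01, 0.0001, NA),
  stringsAsFactors = FALSE
)

# Apply the thresholds from the text: FDR < 0.05, baseMean > 50, log2FoldChange > 1.
# DESeq2 can report NA adjusted p-values, so which() is used to drop those rows.
keep <- which(res$padj < 0.05 & res$baseMean > 50 & res$log2FoldChange > 1)
preneu_candidates <- res[keep, ]

print(preneu_candidates$gene)
```

The surviving genes would then be the input for the network step; that step is not sketched here because it depends on the STRING database.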
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We collected 40 tumor and adjacent normal tissue samples from 19 pathologically diagnosed NSCLC patients (10 LUAD and 9 LUSC) during surgical resections, rapidly digested the tissues to obtain single-cell suspensions, and constructed cDNA libraries from these samples within 24 hours using the 10x Genomics protocol. The libraries were sequenced on the Illumina NovaSeq 6000 platform. Raw gene expression matrices were generated using Cell Ranger (version 3.0.1), and the data were processed in R (version 3.6.0) using the Seurat R package (version 2.3.4).
This is the GitHub repository for the single-cell RNA sequencing data analysis for the human manuscript.

The following essential libraries are required for script execution:
- Seurat
- scRepertoire
- ggplot2
- dplyr
- ggridges
- ggrepel
- ComplexHeatmap

Linked File:
--------------------------------------
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset contains the raw FASTQ files as well as the aligned files that were deposited in GEO. Descriptions of the linked datasets are provided below:

1. Gene Expression Omnibus (GEO) ID: GSE229626
   - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
   - Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for the single-cell sequencing data.
   - Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

2. Sequence Read Archive (SRA) repository
   - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
   - Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files of sequencing reads.
   - Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
- Ensure you have R version 4.1.2 or higher for compatibility.
- Although it is not essential, you can use RStudio (Version 2022.12.0+353) for accessing and executing the code.

The following code can be used to set the working directory in R:
setwd(directory)

Steps:
1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
   - Human_code_April2023.R
   - Install_Packages.R
4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
5. Open the Human_code_April2023.R script and execute commands as necessary.
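As a rough illustration of how a matrix.mtx, barcodes.tsv, and genes.tsv trio fits together, the sketch below writes a tiny toy replicate to a temporary directory and reads it back with the Matrix package. The directory name, gene names, barcodes, and counts are all invented, and the two-column genes.tsv layout (gene ID, gene symbol) is an assumption about the deposited files. This is not the code from Human_code_April2023.R.

```r
library(Matrix)

# Write a toy replicate (hypothetical directory and values) so the example is self-contained.
data_dir <- file.path(tempdir(), "replicate_1")
dir.create(data_dir, showWarnings = FALSE)
toy <- sparseMatrix(i = c(1, 2, 3), j = c(1, 1, 2), x = c(5, 2, 7), dims = c(3, 2))
writeMM(toy, file.path(data_dir, "matrix.mtx"))
writeLines(c("ENSG01\tGeneA", "ENSG02\tGeneB", "ENSG03\tGeneC"),
           file.path(data_dir, "genes.tsv"))
writeLines(c("AAAC-1", "AAAG-1"), file.path(data_dir, "barcodes.tsv"))

# Read the three files back: rows are genes, columns are cell barcodes.
counts   <- readMM(file.path(data_dir, "matrix.mtx"))
genes    <- read.delim(file.path(data_dir, "genes.tsv"), header = FALSE,
                       stringsAsFactors = FALSE)
barcodes <- readLines(file.path(data_dir, "barcodes.tsv"))

# Assumption: the second column of genes.tsv holds the gene symbols.
rownames(counts) <- genes[[2]]
colnames(counts) <- barcodes

print(dim(counts))
```

With the real download, data_dir would point at a replicate folder from GSE229626 and the toy-writing lines would be dropped. Seurat's Read10X function can also read such a directory in a single call.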
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Richter transformation (RT) is a paradigmatic evolution of chronic lymphocytic leukemia (CLL) into a very aggressive large B cell lymphoma conferring a dismal prognosis. The mechanisms driving RT remain largely unknown. We characterized the whole genome, epigenome and transcriptome, combined with single-cell DNA/RNA-sequencing analyses and functional experiments, of 19 cases of CLL developing RT. Studying 54 longitudinal samples covering up to 19 years of disease course, we uncovered minute subclones carrying genomic, immunogenetic and transcriptomic features of RT cells already at CLL diagnosis, which were dormant for up to 19 years before transformation. We also identified new driver alterations, discovered a new mutational signature (SBS-RT), recognized an oxidative phosphorylation (OXPHOS)-high, B cell receptor (BCR)-low signaling transcriptional axis in RT and showed that OXPHOS inhibition reduces the proliferation of RT cells. These findings demonstrate the early seeding of subclones driving advanced stages of cancer evolution and uncover potential therapeutic targets for RT.
This repository contains the processed scRNA-seq data (expression matrices, Seurat objects, metadata) related to this publication.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the processed scRNA-seq data and code for single-cell unified polarization assessment. Please refer to the article "Scupa: Single-cell unified polarization assessment of immune cells using the single-cell foundation model" for the detailed data and method description.
*.rds: the processed scRNA-seq datasets with Universal Cell Embeddings saved as a Seurat object. The Universal Cell Embeddings are saved in the assay 'uce'.
notebooks.zip: Jupyter notebooks containing code for training models and applications to multiple datasets.
The dataset contains an integrated, annotated Seurat v4 object. One can load the dataset into the R environment using the code below:
seurat_obj <- readRDS('PATH/TO/DOWNLOAD/seurat.rds')
The object has three assays: (I) RNA, (II) SCT and (III) integrated.