Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.
MIT License https://opensource.org/licenses/MIT
Scripts used for analysis of V1 and V2 Datasets.
seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Seurat objects containing the raw and normalized data for:
Normal bone marrow (NBM) atlas: contains all cells obtained through segmentation after filtering and QC. Includes coarse and fine levels of annotation that were obtained through an iterative process of subclustering. Neighborhood analysis results are included as a metadata column. Also includes additional Osteo-MSC and Fibro-MSC cells that were manually annotated.
AML/NSM CODEX data: contains all cells after filtering for 3 diagnostic and 2 post-therapy AML samples as well as 3 negative staging marrow samples. Cell labels were derived through reciprocal principal component analysis (RPCA) reference mapping onto the normal bone marrow atlas. Neighborhood analysis was conducted separately for AML Diagnostic, AML Post-Therapy, and NSM samples. Neighborhoods were manually annotated for each set. The results of the neighborhood analysis were merged and included in the metadata of the Seurat object.
All normalized data is stored in the Seurat assay object. Markers that were not included in normalization and downstream analysis are included with raw values as a metadata column.
Full source code used to generate these objects can be found on GitHub: https://github.com/shovikb94/spatial-bonemarrow-atlas/tree/main
See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.7174914
Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"
Structure
ST_berghei_liver
contains data generated during stpipeline analysis and imaging on the 2k arrays Spatial Transcriptomics platform, as well as data necessary for and from hepaquery analysis. These samples include 38 sections in total, of which 8 are from mice (n=4) infected with sporozoites for 12h, 5 sections from control mice (n=3) at 12h, 7 sections from mice (n=4) infected with sporozoites for 24h and 4 sections from control mice (n=3) at 24h, as well as 8 samples of mice (n=2) infected with sporozoites for 38h and control mice (n=2) for 38h.
STUtiility_mus_pb_ST.RDS is the Seurat object generated with the STUtility package from the ST data of the 38 liver sections; the underlying data is stored in ST_berghei_liver
visium_berghei_liver
contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total, of which 1 was infected with sporozoites for 12h, 1 is a control section at 12h, 1 was infected with sporozoites for 24h and 1 is a control section at 24h, as well as 2 sporozoite-infected sections and 2 control sections at 38h.
V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection
V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection
V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection
se_visium.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in visium_berghei_liver
snSeq_berghei_liver
contains data generated with the cellranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.
cellranger_cnt_out contains feature count matrix information from cell ranger output
final_merged_curated_annotations_270623.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in snSeq_berghei_liver.tar.gz
raw images.zip contains raw images for supplementary figures 20-22
adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
In the context of the Human Cell Atlas, we have created a single-cell-driven taxonomy of cell types and states in human tonsils. This repository contains the Seurat objects derived from this effort. In particular, we have datasets for each modality (scRNA-seq, scATAC-seq, CITE-seq, spatial transcriptomics), as well as cell type-specific datasets. Most importantly, this is the input that we used to create the HCATonsilData package, which allows programmatic access to all these datasets within R.
Version 2 of this repository includes cells from 7 additional donors, which we used as a validation cohort for the cell types and states defined in the atlas. In addition, this version provides the Seurat object associated with the spatial transcriptomics data (10X Visium), as well as the fragments files for scATAC-seq and Multiome.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
scRNA data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223922 (Sur et al. 2023), see a detailed description of the study here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055256/
Data were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223922 to create an R Seurat object, which was then converted into an AnnData (h5ad) file so that it can be analysed with, e.g., the Python scanpy package.
If you use this data, please cite Sur et al. 2023.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):
Human myeloid dataset analysis
Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in (https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R). After standard normalization, highly-variable gene selection and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified based on Louvain clustering with a resolution of 0.3. A Wilcoxon rank-sum test was employed to find cluster-specific gene markers for cell type annotation.
For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel, where each bandwidth is determined by the distance to its k-nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5%–75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11,21,8) to extract three gene trajectories.
Mouse embryo skin data analysis
We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wildtype and the Wls mutant were pooled for analyses. After standard normalization, highly-variable gene selection and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.
For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1%–50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9,16,5) to sequentially extract three gene trajectories. To compare the differences between the wildtype and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
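The gene-retention step described above (keeping genes expressed by 0.5%–75% of cells for the myeloid data, or 1%–50% for the dermal data) can be sketched in a few lines. This is a minimal illustration, not the GeneTrajectory implementation: the function name, the toy count table, the "count > 0 means expressed" rule, and the inclusive bounds are all assumptions made for the example.

```python
def filter_genes_by_expression_fraction(counts, min_frac, max_frac):
    """Return genes whose fraction of expressing cells lies in [min_frac, max_frac].

    counts: dict mapping gene name -> list of per-cell counts (hypothetical layout).
    A cell is treated as expressing a gene when its count is > 0 (assumption).
    """
    kept = []
    for gene, values in counts.items():
        frac = sum(1 for v in values if v > 0) / len(values)
        if min_frac <= frac <= max_frac:  # inclusive bounds are an assumption
            kept.append(gene)
    return kept


# Toy data with 8 cells (made-up numbers, for illustration only).
toy_counts = {
    "GeneA": [0, 0, 3, 0, 0, 0, 0, 0],  # expressed in 12.5% of cells
    "GeneB": [1, 2, 1, 1, 4, 1, 0, 0],  # expressed in 75% of cells
    "GeneC": [0, 0, 0, 0, 0, 0, 0, 0],  # never expressed
    "GeneD": [0, 1, 0, 2, 0, 0, 1, 0],  # expressed in 37.5% of cells
}

dermal_like = filter_genes_by_expression_fraction(toy_counts, 0.01, 0.50)
myeloid_like = filter_genes_by_expression_fraction(toy_counts, 0.005, 0.75)
print(dermal_like)
print(myeloid_like)
```

Filtering on both ends removes genes that are too sparse to yield a stable distribution over cells as well as genes so ubiquitous that their distributions carry little contrast.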
https://spdx.org/licenses/CC0-1.0.html
The distal region of the uterine (Fallopian) tube is commonly associated with high-grade serous carcinoma (HGSC), the predominant and most aggressive form of ovarian or extra-uterine cancer. Specific cell states and lineage dynamics of the adult tubal epithelium (TE) remain insufficiently understood, hindering efforts to determine the cell of origin for HGSC. Here, we report a comprehensive census of cell types and states of the mouse uterine tube. We show that distal TE cells expressing the stem/progenitor cell marker Slc1a3 can differentiate into both secretory (Ovgp1+) and ciliated (Fam183b+) cells. Inactivation of Trp53 and Rb1, whose pathways are commonly altered in HGSC, leads to elimination of targeted Slc1a3+ cells by apoptosis, thereby preventing their malignant transformation. In contrast, pre-ciliated cells (Krt5+, Prom1+, Trp73+) remain cancer-prone and give rise to serous tubal intraepithelial carcinomas and overt HGSC. These findings identify transitional pre-ciliated cells as a previously unrecognized cancer-prone cell state and point to pre-ciliation mechanisms as novel diagnostic and therapeutic targets.
Methods
Single-cell RNA-sequencing library preparation
For TE single-cell expression and transcriptome analysis, we isolated TE from C57BL6 adult estrous female mice. In 3 independent experiments, a total of 62 uterine tubes were collected. Each uterine tube was placed in sterile PBS containing 100 IU ml⁻¹ of penicillin and 100 µg ml⁻¹ streptomycin (Corning, 30-002-Cl), and separated into distal and proximal regions. Tissues from the same region were combined in a 40 µl drop of the same PBS solution, cut open lengthwise, and minced into 1.5-2.5 mm pieces with 25G needles. Minced tissues were transferred with the help of a sterile wide-bore 200 µl pipette tip into a 1.8 ml cryovial containing 1.2 ml A-mTE-D1 (300 IU ml⁻¹ collagenase IV mixed with 100 IU ml⁻¹ hyaluronidase; Stem Cell Technologies, 07912, in DMEM Ham’s F12, Hyclone, SH30023.FS). Tissues were incubated with a loose cap for 1 h at 37°C in a 5% CO2 incubator. During the incubation, tubes were taken out 4 times and tissues suspended with a wide-bore 200 µl pipette tip. At the end of incubation, the tissue-cell suspension from each tube was transferred into 1 ml TrypLE (Invitrogen, 12604013) pre-warmed to 37°C and suspended 70 times with a 1000 µl pipette tip; 5 ml A-SM [DMEM Ham’s F12 containing 2% fetal bovine serum (FBS)] was added to the mix, and TE cells were pelleted by centrifugation at 300 × g for 10 minutes at 25°C. Pellets were then suspended in 1 ml A-mTE-D2 pre-warmed to 37°C (7 mg ml⁻¹ Dispase II, Worthington NPRO2, and 10 µg ml⁻¹ Deoxyribonuclease I, Stem Cell Technologies, 07900), and mixed 70 times with a 1000 µl pipette tip. 5 ml A-mTE-D2 was added, and samples were passed through a 40 µm cell strainer and pelleted by centrifugation at 300 × g for 7 minutes at +4°C. Pellets were suspended in 100 µl microbeads per 10⁷ total cells or fewer, and dead cells were removed with the Dead Cell Removal Kit (Miltenyi Biotec, 130-090-101) according to the manufacturer’s protocol. Pelleted live cell fractions were collected in 1.5 ml low-binding centrifuge tubes, kept on ice, and suspended in 50 µl ice-cold A-Ri-Buffer (5% FBS, 1% GlutaMAX-I, Invitrogen, 35050-079, 9 µM Y-27632, Millipore, 688000, and 100 IU ml⁻¹ penicillin and 100 μg ml⁻¹ streptomycin in DMEM Ham’s F12). Cell aliquots were stained with trypan blue for live and dead cell calculation. Live cell preparations with a target cell recovery of 5,000-6,000 were loaded on a Chromium controller (10X Genomics, Single Cell 3’ v2 chemistry) to perform single-cell partitioning and barcoding using the microfluidic platform device. After preparation of barcoded next-generation sequencing cDNA libraries, samples were sequenced on an Illumina NextSeq500 System.
Download and alignment of single-cell RNA sequencing data
For sequence alignment, a custom reference for mm39 was built using the cellranger (v6.1.2, 10x Genomics) mkref function. The mm39.fa soft-masked assembly sequence and the mm39.ncbiRefSeq.gtf (release 109) genome annotation last updated 2020-10-27 were used to form the custom reference. The raw sequencing reads were aligned to the custom reference and quantified using the cellranger count function.
Preprocessing and batch correction
All preprocessing and data analysis was conducted in R (v.4.1.1 (2021-08-10)). The cellranger count outputs were first modified with the autoEstCont and adjustCounts functions from SoupX (v.1.6.1) to output a corrected matrix with the ambient RNA signal (soup) removed (https://github.com/constantAmateur/SoupX). To preprocess the corrected matrices, the Seurat (v.4.1.1) NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, and RunUMAP functions were used to create a Seurat object for each sample (https://github.com/satijalab/seurat). The number of principal components used to construct a shared nearest-neighbor graph was chosen to account for 95% of the total variance. To detect possible doublets, we used the package DoubletFinder (v.2.0.3) with inputs specific to each Seurat object. DoubletFinder creates artificial doublets and calculates the proportion of artificial k nearest neighbors (pANN) for each cell from a merged dataset of the artificial and actual data. To maximize DoubletFinder’s predictive power, the mean-variance normalized bimodality coefficient (BCMVN) was used to determine the optimal pK value for each dataset. To establish a threshold for pANN values to distinguish between singlets and doublets, the estimated multiplet rates for each sample were calculated by interpolating between the target cell recovery values according to the 10x Chromium user manual. Homotypic doublets were identified using unannotated Seurat clusters in each dataset with the modelHomotypic function. After doublets were identified, all distal and proximal samples were merged separately. Cells with greater than 30% mitochondrial genes, cells with fewer than 750 nCount_RNA, and cells with fewer than 200 nFeature_RNA were removed from the merged datasets. To correct for any batch effects between sample runs, we used the harmony (v.0.1.0) integration method (github.com/immunogenomics/harmony).
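The cell-level QC thresholds stated above (more than 30% mitochondrial content, fewer than 750 counts, or fewer than 200 detected features lead to removal) can be written as a single predicate. The sketch below is in plain Python rather than the R/Seurat code the authors used; the `CellQC` record, its field names, and the example barcodes and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record mirroring Seurat metadata columns)."""
    barcode: str
    n_count_rna: int      # total counts, cf. nCount_RNA
    n_feature_rna: int    # detected genes, cf. nFeature_RNA
    percent_mt: float     # percentage of counts from mitochondrial genes


def passes_qc(cell, max_percent_mt=30.0, min_counts=750, min_features=200):
    """Keep a cell unless it exceeds the mito cutoff or falls below the count/feature minimums."""
    return (
        cell.percent_mt <= max_percent_mt
        and cell.n_count_rna >= min_counts
        and cell.n_feature_rna >= min_features
    )


# Made-up cells for illustration only.
cells = [
    CellQC("AAAC-1", n_count_rna=5200, n_feature_rna=1800, percent_mt=4.2),
    CellQC("AAAG-1", n_count_rna=600, n_feature_rna=350, percent_mt=8.0),    # too few counts
    CellQC("AACT-1", n_count_rna=3100, n_feature_rna=150, percent_mt=2.5),   # too few features
    CellQC("AAGG-1", n_count_rna=2900, n_feature_rna=900, percent_mt=41.0),  # high mito
]

kept_barcodes = [c.barcode for c in cells if passes_qc(c)]
print(kept_barcodes)
```

Cells sitting exactly on a threshold are kept here, which follows the wording "greater than" and "fewer than" in the text.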
Clustering parameters and annotations
After merging the datasets and batch correction, the dimensions reflecting 95% of the total variance were input into Seurat’s FindNeighbors function with a k.param of 70. Louvain clustering was then conducted using Seurat’s FindClusters with a resolution of 0.7. The resulting 19 clusters were annotated based on the expression of canonical genes and the results of differential gene expression (Wilcoxon Rank Sum test) analysis. One cluster expressing lymphatic and epithelial markers was omitted from later analysis as it only contained 2 cells suspected to be doublets. To better understand the epithelial populations, we reclustered 6 epithelial populations and reapplied harmony batch correction. For this reclustering, FindNeighbors was run with a k.param of 50 and FindClusters with a resolution of 0.7. The resulting 9 clusters within the epithelial subset were further annotated using differential expression analysis and canonical markers.
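Choosing "the dimensions reflecting 95% of the total variance" can be illustrated with a cumulative-variance rule over the per-component standard deviations. The sketch below assumes that "total variance" refers to the variance captured by the computed principal components; the source does not spell out the exact calculation, so both the function name and this interpretation are assumptions.

```python
def n_pcs_for_variance(stdevs, target=0.95):
    """Return the smallest number of leading PCs whose variance share reaches `target`.

    stdevs: per-component standard deviations, ordered from PC1 onward.
    The share is computed relative to the variance of the supplied PCs (assumption).
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    cumulative = 0.0
    for n_pcs, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return n_pcs
    return len(variances)


# Made-up standard deviations: variances are 16, 9, 4, 1, 1 (total 31).
example_stdevs = [4.0, 3.0, 2.0, 1.0, 1.0]
n_dims = n_pcs_for_variance(example_stdevs, target=0.95)
print(n_dims)
```

The resulting count would then be the number of leading dimensions passed on to the neighbor-graph step.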
Pseudotime analysis
Potential of heat diffusion for affinity-based transition embedding (PHATE) is a dimensionality reduction method designed to more accurately visualize continual progressions found in biological data [35]. A modified version of Seurat (v4.1.1) was developed to include the ‘RunPHATE’ function for converting a Seurat object to a PHATE embedding. This was built on the phateR package (v.1.0.7) (https://github.com/scottgigante/seurat/tree/patch/add-PHATE-again). In addition to PHATE, pseudotime values were calculated with Monocle3 (v.1.2.7), which computes trajectories with an origin set by the user [36, 55–57]. The origin was set to be a progenitor cell state confirmed with lineage tracing experiments.
35. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019). doi:10.1038/s41587-019-0336-3
36. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019). doi:10.1038/s41586-019-0969-x
55. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32, 381–386 (2014). doi:10.1038/nbt.2859
56. Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nat Methods 14, 309–315 (2017). doi:10.1038/nmeth.4150
57. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14, 979–982 (2017). doi:10.1038/nmeth.4402
CD8 T cell exhaustion is a major barrier limiting anti-tumor therapy. Though checkpoint blockade temporarily improves exhausted CD8 T cell (Tex) function, the underlying epigenetic landscape of Tex remains largely unchanged, preventing their durable “reinvigoration.” Whereas the transcription factor (TF) TOX has been identified as a critical initiator of Tex epigenetic programming, it remains unclear whether TOX plays an ongoing role in preserving Tex biology after cells commit to exhaustion. Here, we decoupled the role of TOX in the initiation versus maintenance of CD8 T cell exhaustion by temporally deleting TOX in established Tex. Induced TOX ablation in committed Tex resulted in apoptotic-driven loss of Tex, reduced expression of inhibitory receptors including PD-1, and a pronounced decrease in terminally differentiated subsets of Tex cells. Simultaneous gene expression and epigenetic profiling revealed a critical role for TOX in ensuring ongoing chromatin accessibility and transcri...
Cells from inducible-Cre (Rosa26CreERT2/+Toxfl/fl P14) mice where TOX was temporally deleted from mature populations of LCMV-specific T exhausted cells after establishment of chronic LCMV infection 5 days post infection were subjected to scRNA and scATACseq coassay; naive cells and WT cells were used as controls. Analysis pipeline developed by Josephine Giles and vignettes published by Satija and Stuart labs. Transcript count and peak accessibility matrices deposited in GSE255042, GSE255043. Seurat/Signac was used to process the scRNA and scATACseq coassay data. The processed Seurat/Signac object above was subsequently used for downstream RNA and ATAC analyses as described below: DEGs between TOX WT and iKO cells within each subset were identified using FindMarkers (Seurat, Signac), with a log2-fold-change threshold of 0, using the SCT assay. DACRs were identified using FindMarkers with the "LR" test, with a log2-fold-change threshold of 0.1, a min.pct of 0.05, and included the number of c...
# Continuous expression of TOX safeguards exhausted CD8 T cell epigenetic fate
https://doi.org/10.5061/dryad.8kprr4xx9
Seurat/Signac pipeline for multiomic scRNA-seq and scATAC-seq dataset, generated following inducible TOX deletion in LCMV-Cl13
Author
Yinghui Jane Huang
Purpose: Generate and process Seurat/Signac object for downstream analyses
Written: Nov 2021 through Oct 2022
Adapted from: Analysis pipeline developed by Josephine Giles and vignettes published by Satija and Stuart labs
Input dataset: Transcript count and peak accessibility matrices deposited in GSE255042, GSE255043
1) Create individual signac objects for each sample from the raw 10x cellranger output.
2) Merge individual objects to create one seurat object.
3) Add metadata to merged seurat object.
Following are the steps in the attached html file for analysis of the paired data (ATAC+RNA)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Abstract
scDynaBar is an innovative approach that combines CRISPR-Cas9 dynamic barcoding with single-cell sequencing to record temporal cellular events. Over a 4-week period, genetic barcodes accumulate mutations, which are then sequenced together with the transcriptome of each single cell. This enables the creation of a time-ordered record of cellular events, providing a unique perspective on biological dynamics.
In this study, we applied scDynaBar to track the transition from a pluripotent state to a two-cell (2C)-like state in mouse embryonic stem cells (mESCs). Our results demonstrate the transient nature of the 2C-like state. Additionally, we show consistent mutation rates across different cell types in a mouse gastruloid model, underscoring the robustness and versatility of the system across diverse biological contexts.
Overview
This repository contains data and scripts for analyzing single-cell RNA-seq and Bulk RNA-seq experiments. The aim is to process barcode sequences and associated metadata, create Seurat objects, and generate visualizations for results presented in the associated paper.
For more information, code and data updates visit our github repository --> https://github.com/socyol/scDynaBar.git
Data
barcode_sequences/ --> This directory contains CSV files with barcode sequence data
metadatas/ --> This directory contains metadata CSV files that provide details on: The number of reads/alleles (coverage) for each cell; Allele features such as mean diversity, mean length, and percentage of original sequences (% original sequences). For bulk experiments, this data is provided for each sample, including the system used (cas9 or BE3) and the spacer utilized (from a selection of 7).
seurat_objects/ --> Contains the Seurat objects created from processed single-cell data, specifically those that have passed quality control.
scripts/
#### Single-cells experiment analysis:
- sc_1_SeuratObject_analysis.R: This script processes the filtered_feature_bc_matrix to convert it into a Seurat object (matrix in GEO omnibus accession)
- sc_2_Barcode_sequences_analysis.R: This script processes the barcode sequence data and merges it with the Seurat objects to create the metadata (located in the metadata folder). This metadata is crucial for conducting analyses, viewing results, and generating figures.
#### Bulk analysis
- bulk_1_analysis.R: First step of the Bulk analysis. This script prepares the FASTQ data by organizing it into different folders to optimize and facilitate the subsequent analysis. This step ensures proper structuring and management of the data for the following phases.
- bulk_2_analysis.sh: Second step of the Bulk analysis. This Bash script automates the analysis process by creating jobs for each data file, which then execute the code provided in bulk_3_analysis.R. This parallelization helps to speed up the processing. At the end of this step, an output file similar to the barcode sequences for bulk data is generated, providing a comprehensive summary of the data processing.
- bulk_3_analysis.R: Third and final step of the Bulk analysis. This R script conducts a thorough analysis of the previously processed and organized data. It includes computations and data filtering.
#### PLOTS
- plots_1-bulk.R: Contains scripts for creating visualizations related to bulk data.
- plots_2-timecourse.R: Scripts for visualizations specific to time course analyses.
- plots_3-zscan4.R: Scripts for visualizations related to zscan4 experiment.
- plots_4-gastruloids.R: Scripts for visualizations related to gastruloid data.
- settings.R: This script includes all necessary libraries and custom functions created for this project.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The present study is based on the 10X scRNA-seq dataset published by the Allen Institute for Brain Science and publicly available at: https://portal.brain-map.org/atlases-and-data/RNA-seq/mouse-whole-cortex-and-hippocampus-10x. The cells from the hippocampus region were selected from the gene count expression matrix and pre-processed in R v3.6.1 according to the Seurat v3.1.5 standard pre-processing workflow for quality control, normalization, and analysis of scRNA-seq data. Here we make the final seurat object and other datasets further used in the code (https://github.com/eviho/10XHip2021_VihoEMG) available for download.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
R objects for the V1 Datasets. Created with R package Seurat.
E14_allcells_seur_ob.Rdata - E14.5 V1 Dataset. Includes all cells. Grouped by "ordered_manuscript" in @data.info slot. Corresponds to Fig. 1c.
E14_mesenchyme_seur_ob.Rdata - E14.5 V1 Dataset. Only mesenchymal cells. Grouped by "ordered_manuscript" in @data.info slot. Corresponds to Fig. 2a.
merged_mesenchyme_seur_ob.Rdata - E12.5, E14.5, E17.5 merged V1 Dataset. Only mesenchymal cells. Grouped by "ordered_manuscript" in @data.info slot. Corresponds to Fig. 3a.
E14_epithelial_seur_ob.Rdata - E14.5 V1 Dataset. Only epithelial cells. Grouped by "ordered" in @data.info slot. Corresponds to Fig. 4a.
E14_endocrine_seur_ob.Rdata - E14.5 V1 Dataset. Only endocrine cells. Grouped by "ordered_res1_5" in @data.info slot. Corresponds to Fig. 4f.
merged_epithelial_seur_ob.Rdata - E12.5, E14.5, E17.5 merged V1 Dataset. Only epithelial cells. Grouped by "ordered" in @data.info slot. Corresponds to Supplementary Fig. 5a.
BCMA-targeted CAR-T therapy has shown potent outcomes in treating multiple myeloma (MM), a disease characterized by malignant bone marrow (BM) plasma cells. However, the remodeling of the MM microenvironment after CAR-T therapy remains poorly understood. Here, we report the reconstitution of the MM microenvironment by obtaining single-cell transcriptomes for paired BM specimens (n = 14) from 7 MM patients before (i.e., baseline, "day −4") and after (i.e., "day 28") post-lymphodepleted BCMA CAR-T therapy. Our analysis revealed heterogeneity in driver gene expression among MM cells, even those harboring the same cytogenetic abnormalities. The best overall responses of patients over the 15-month follow-up are positively correlated with the abundance and targeted cytotoxic activity of CD8+ effector CAR-T cells on day 28 after CAR-T cell infusion. Additionally, favorable responses are associated with attenuated immunosuppression mediated by regulatory T cells (Tregs), enhanced CD8+ eff...
The collected paired BM specimens from MM patients before and after BCMA CAR-T therapy were isolated into single cell suspensions, and 3'-scRNA-seq (Chromium Single Cell 3′ v3 Libraries) analysis was performed on each sample. The analysis included 7 MM patients (P1-P7). We collected baseline BM aspirate specimens (P1_B, P2_B, P3_B, P4_B, P5_B, P6_B, and P7_B) from each patient before (i.e., baseline, "day −4") BCMA CAR-T cell infusion (i.e., "day 0"), and the BM aspirate specimens (P1_R, P2_R, P3_R, P4_R, P5_R, P6_R, and P7_R) after BCMA CAR-T therapy were collected on day 28. These patients had received 2 or 3 previous lines of therapy; two patients (P2 and P6) had extramedullary disease, and the other patients did not experience extramedullary progression. All patients received cyclophosphamide-mediated lymphodepletion on day −3 with the aim of potentiating the expansion of CAR-T cells. Efficacy assessments based on the International Myeloma Working Group (IMWG) criteria ...
# Reconstitution of the Multiple Myeloma Microenvironment Following Lymphodepletion with BCMA CAR-T Therapy
https://doi.org/10.5061/dryad.44j0zpcn7
The collected paired BM specimens from MM patients before and after BCMA CAR-T therapy were isolated into single cell suspensions, and 3'-scRNA-seq (Chromium Single Cell 3' v3 Libraries) analysis was performed on each sample.
The dataset includes '.mtx' files, each of which is for creating a Seurat object in R.
The naming convention of the '.mtx' files follows the sampling time points of the MM patients receiving BCMA CAR-T therapy. Specifically, the 14 BM specimens that underwent scRNA-seq were collected from 7 relapsed or refractory MM patients (i.e. 'P1, P2, P3, P4, P5, P6, and P7') before and after BCMA CAR-T therapy. For these specimens, the baseline (i.e. 'B') specimens (P1_B, P2_B, P3_B, P4_B, P5_B, P6_B, and P7_B) were collected from ...
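As a small illustration of the naming convention, the sketch below splits a specimen label such as P3_B into a patient identifier and a sampling time point. It works on the labels only, because the exact '.mtx' file names are not listed here; the function name and the wording of the time-point descriptions are assumptions based on the specimen description above (B for baseline at day −4, R for the specimens collected on day 28).

```python
import re

# Time-point codes as described for the specimens (wording is illustrative).
TIMEPOINTS = {
    "B": "baseline (day -4)",
    "R": "day 28 after BCMA CAR-T therapy",
}

LABEL_PATTERN = re.compile(r"P([1-7])_([BR])")


def parse_specimen_label(label):
    """Split a label like 'P3_B' into patient ID and time-point description."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized specimen label: {label!r}")
    patient_number, code = match.groups()
    return {"patient": f"P{patient_number}", "code": code, "timepoint": TIMEPOINTS[code]}


parsed = parse_specimen_label("P3_B")
print(parsed)

# All 14 expected labels: 7 patients x 2 time points.
all_labels = [f"P{i}_{code}" for i in range(1, 8) for code in ("B", "R")]
print(len(all_labels))
```

Carrying the patient and time point as separate metadata fields makes it straightforward to pair the baseline and day-28 specimens of each patient after the individual objects are merged.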
https://spdx.org/licenses/CC0-1.0.html
Single-cell RNA sequencing (Drop-seq) data from forebrain organoids carrying the pathogenic MAPT R406W and V337M mutations. Organoids were generated from 5 heterozygous donor lines (two R406W lines and three V337M lines) and their respective CRISPR-corrected isogenic controls. Organoids were also generated from one homozygous R406W donor line. Single-cell sequencing was performed at 1, 2, 3, 4, 6 and 8 months of organoid maturation.
Methods: Single-cell transcriptomes were obtained using Drop-seq (Macosko et al., 2015, https://doi.org/10.1016/j.cell.2015.05.002). Count matrices were generated using the Drop-seq tools package (Macosko et al. 2015), with full details available online (https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf). Briefly, raw reads were converted to BAM files, cell barcodes and UMIs were extracted, and low-quality reads were removed. Adapter sequences and polyA tails were trimmed, and reads were converted to FASTQ for STAR alignment (STAR version 2.6). Mapping to the human genome (hg19 build) was performed with default settings. Reads mapped to exons were kept and tagged with gene names, bead synthesis errors were corrected, and a digital gene expression matrix was extracted from the aligned library. We extracted data from twice as many cell barcodes as the number of cells targeted (NUM_CORE_BARCODES = 2x # targeted cells). Downstream analysis was performed using Seurat 3.0 in R version 3.6.3. A separate Seurat object was generated for each sample and was filtered and clustered individually. Cells with < 300 genes detected were filtered out, as were cells with > 10% mitochondrial gene content. Count data were log-normalized using the default NormalizeData function and the default scale factor of 1e4. Then, the top 2000 variable genes were identified using the Seurat FindVariableFeatures function (selection.method = "vst", nfeatures = 2000), followed by scaling and centering using the default ScaleData function.
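The filtering and normalization steps above can be illustrated with a minimal pure-Python sketch. It is not the Seurat code used for this dataset. It assumes one cell is represented as a dict of gene to UMI count, that the caller supplies the set of mitochondrial gene names (the text does not say how they were identified), and that log-normalization means log1p(count / total counts × scale factor), which is how Seurat documents its LogNormalize method. The function names are hypothetical.

```python
import math


def passes_qc(counts, mito_genes, min_genes=300, max_mito_frac=0.10):
    """Return True if a cell survives the two filters described in the text.

    counts: dict mapping gene name -> UMI count for a single cell.
    mito_genes: set of mitochondrial gene names (supplied by the caller).
    """
    total = sum(counts.values())
    if total == 0:
        return False
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    # Drop cells with < min_genes detected genes or > max_mito_frac mito content.
    return detected >= min_genes and mito / total <= max_mito_frac


def log_normalize(counts, scale_factor=1e4):
    """Scale counts to a common total, then apply log(1 + x)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}
```

For example, a cell with counts `{"A": 1, "B": 3}` has a total of 4, so gene A is normalized to `log1p(2500)`.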
Principal component analysis was carried out on the scaled expression values of the 2000 top variable genes, and the cells were clustered using the first 50 principal components (PCs) as input to the FindNeighbors function and a resolution of 0.4 in the FindClusters function. Non-linear dimensionality reduction was performed by running UMAP on the first 50 PCs. Following clustering and dimensionality reduction, putative cell doublets were identified using DoubletFinder (McGinnis et al. 2019; https://doi.org/10.1016/j.cels.2019.03.003), assuming a doublet formation rate of 5%. For each sample, the optimal pK value was identified based on the results of the paramSweep_v3, summarizeSweep and find.pK functions of the DoubletFinder package. Instead of using the default paramSweep_v3 function, we extended the upper range of computed pK values to 1.2. We visually verified that cells identified as doublets had high nFeatures (number of genes expressed) by plotting the pANN metric against nFeatures. For samples not showing this correlation, we adjusted the pK value to the next highest peak in the pK/BCmetric plot. Finally, the individual Seurat objects were merged.
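Two of the doublet-detection choices above reduce to simple arithmetic, sketched here under stated assumptions. This is not DoubletFinder code: the function names are hypothetical, the expected doublet count is taken as the rounded product of cell number and the 5% rate, and a "peak" in the pK/BCmetric curve is interpreted as a strict local maximum.

```python
def expected_doublets(n_cells, rate=0.05):
    """Expected number of doublets for a sample at the assumed formation rate."""
    return round(n_cells * rate)


def ranked_pk_peaks(pk_values, bc_metric):
    """Return pK values at local maxima of the BC metric, highest peak first.

    The first element is the optimal pK; the second is the 'next highest
    peak' to fall back on when the first choice looks implausible.
    """
    peaks = []
    for i, score in enumerate(bc_metric):
        left = bc_metric[i - 1] if i > 0 else float("-inf")
        right = bc_metric[i + 1] if i < len(bc_metric) - 1 else float("-inf")
        if score > left and score > right:
            peaks.append((score, pk_values[i]))
    return [pk for _, pk in sorted(peaks, reverse=True)]
```

With this interpretation, a sample of 2000 cells would be expected to contain 100 doublets, and the fallback pK is simply the second entry of the ranked list.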
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are processed Seurat objects for the biological datasets in Localized Marker Detector (https://github.com/KlugerLab/LocalizedMarkerDetector):
Tabula Muris bone marrow dataset (FACS-based and Droplet-based)
We used publicly available scRNA-seq mouse bone marrow datasets (FACS and Droplet-based) from the Tabula Muris Consortium, which were already pre-processed and annotated according to their workflow. In addition, we applied ALRA imputation to generate a denoised assay named alra and added several cell annotations: (1) cell cycle annotation using CellCycleScoring with the updated 2019 cell cycle gene set; (2) Module Activity Scores for the gene modules listed in our paper.
Mouse embryo skin dataset
We separated dermal cell populations from newly collected mouse embryo skin samples (aligned to the mouse genome mm10 using CellRanger (v.6.1.2)). Cells from the wildtype (WT) and SmoM2YFP mutant (SmoM2) for two consecutive days (embryonic day 13.5 and 14.5) were pooled for analysis. To avoid batch effects from pooling or integrating, we analyzed each condition separately: E13.5 SmoM2, E13.5 WT, E14.5 SmoM2, and E14.5 WT. For each condition, we performed standard normalization, selected the top 2,000 highly variable genes, and scaled the data using the Seurat v4 R package. We then applied PCA, retaining the number of PCs determined by the elbow plot: E13.5 SmoM2 (14 PCs), E13.5 WT (12 PCs), E14.5 SmoM2 (12 PCs), and E14.5 WT (11 PCs).
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
data.tar.gz contains all files from the data directory (except for sam outputs from STAR) associated with the 230926_EJ_Setbp1_AlternativeSplicing GitHub project and includes the following files:
./marvel: - This directory contains rds and Rdata objects that were created using the MARVEL R package
cell_type_goresults.rds - This is the GO results split by cell type
marvel_04_split_counts.Rdata - This R data includes all environment objects from MARVEL script 04, and is used for downstream plotting
normalized_sj_expression.Rds - This object is the normalized splice junction expression
Setbp1_marvel_aligned.rds - Final prepared MARVEL object before any SJU analyses have been run
significant_tables.RData - For those who do not want to load multiple massive files, this includes all significant SJU results for each cell type
sj_usage_cell_type.rds - This data object has splice junction usage calculated for each cell type
sj_usage_condition.rds - This data object has splice junction usage calculated for each cell type and also split by condition
./seurat: - This directory contains all intermediate and final Seurat single-cell gene expression objects
annotated_brain_samples.rds - This is the final iteration of the processing in Seurat for a final annotated object. Please use this object for any Seurat or single-cell gene expression analyses.
clustered_brain_samples.rds - This is the clustered Seurat object, before cell type annotation based on canonical markers.
filtered_brain_samples_pca.rds - This is the filtered Seurat object, before clustering but after PCA.
filtered_brain_samples.rds - This is the filtered Seurat object, before PCA.
integrated_brain_samples.rds - This is the integrated Seurat object, before other steps.
./star: - All files in the STAR directory are outputs from STARsolo, as described in our methods. Each output directory contains the same files, so only one example is included here for brevity. Intermediate SAM files were removed to optimize space.
J1/ - This directory contains outputs for brain sample J1
J13/ - This directory contains outputs for brain sample J13
J15/ - This directory contains outputs for brain sample J15
J2/ - This directory contains outputs for brain sample J2
J3/ - This directory contains outputs for brain sample J3
J4/ - This directory contains outputs for brain sample J4
K1/ - This directory contains outputs for kidney sample K1
K2/ - This directory contains outputs for kidney sample K2
K3/ - This directory contains outputs for kidney sample K3
K4/ - This directory contains outputs for kidney sample K4
K5/ - This directory contains outputs for kidney sample K5
K6/ - This directory contains outputs for kidney sample K6
./star/genome: - This directory contains outputs from running STAR genomeGenerate. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
chrLength.txt
chrNameLength.txt
chrName.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
Log.out
SA
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab
./star/J1: - This is the head STAR directory for sample J1. It contains logs, basic QC, and gene and splice junction counts. For more information about the STAR pipeline and its outputs, please refer to the STAR documentation https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
Log.final.out
Log.out
Log.progress.out
SJ.out.tab
Solo.out/
_STARgenome/
./star/J1/Solo.out: - This directory contains the outputs used for downstream analysis
Barcodes.stats
GeneFull_Ex50pAS/
SJ/
./star/J1/Solo.out/GeneFull_Ex50pAS: - This directory contains the filtered and raw barcodes, features, and matrix files for gene expression (including introns)
Features.stats
filtered/
raw/
Summary.csv
UMIperCellSorted.txt
./star/J1/Solo.out/GeneFull_Ex50pAS/filtered: - This directory contains the filtered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages)
barcodes.tsv.gz - This file contains filtered cell barcodes
features.tsv.gz - This file contains filtered features (genes)
matrix.mtx.gz - This file contains the filtered cell by gene expression count matrix
./star/J1/Solo.out/GeneFull_Ex50pAS/raw: - This directory contains the unfiltered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages). Files are the same as previously described for filtered.
barcodes.tsv
features.tsv
matrix.mtx
./star/J1/Solo.out/SJ: - This directory contains the QC and raw barcodes, features, and matrix files for splice junction expression
Features.stats
raw/
Summary.csv
./star/J1/Solo.out/SJ/raw: - This directory contains the raw barcodes, features, and matrix files for splice junction expression
barcodes.tsv - This file contains raw cell barcodes
features.tsv - This file contains raw features (splice junctions)
matrix.mtx - This file contains the raw cell by splice junction count matrix
./star/J1/_STARgenome: - This directory contains the STARgenome created and used by STAR for this sample. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab
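The barcodes/features/matrix triplets listed above (for example under GeneFull_Ex50pAS/filtered) are normally loaded with a single-cell package, but the matrix file itself is plain Matrix Market coordinate text. The sketch below parses that format using only the Python standard library. It assumes a three-column coordinate file with one-based indices. That rows are features and columns are barcodes is the usual 10x-style convention and is an assumption here, not something stated in the listing. The function names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def parse_matrix_market(lines):
    """Parse coordinate-format Matrix Market text into (shape, entries).

    entries maps zero-based (row, col) index pairs to the stored value.
    """
    # Skip blank lines and header/comment lines, which start with '%'.
    body = (ln for ln in lines if ln.strip() and not ln.startswith("%"))
    n_rows, n_cols, n_entries = (int(x) for x in next(body).split())
    entries = {}
    for ln in body:
        row, col, value = ln.split()[:3]
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries
```

A file would be read with `with open_text(path) as fh: shape, entries = parse_matrix_market(fh)`, where `path` points at one of the matrix.mtx or matrix.mtx.gz files after the archive is unpacked.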
ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This repository contains metadata and single-cell data used to generate figures in the manuscript entitled "Post-infusion Treg-like CAR T cells identify patients resistant to CD19-CAR therapy". Included here: CSV files containing patient cohort metadata, summary statistics and quantitative PCR results; FCS files for flow and mass cytometry data; and a processed Seurat object for single-cell sequencing data. Raw single-cell sequencing data, cellranger alignment results, and metadata are available through the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo; GEO accession number: GSE168940). For questions, please contact Zinaida Good (zinaida@stanford.edu) or Crystal L. Mackall (cmackall@stanford.edu).
Attribution 2.0 (CC BY 2.0)https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
This archive contains all the code and data needed to reproduce the results of the associated manuscript: Piovani et al., "Single-cell atlases of two lophotrochozoan larvae highlight their complex evolutionary histories". We provide filtered scRNA-seq matrices, protein FASTA files for each species used to run SAMap and GenERA, the R code used to generate the datasets, and the Jupyter notebook used to generate the SAMap results. In addition, we provide the final Seurat objects and all analysis results, which can be consulted without re-running the code.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
While the cellular and molecular features of human inflammatory skin diseases are well characterized, their tissue context and systemic impact remain poorly understood. We thus profiled human psoriasis (PsO) as a prototypic immune-mediated condition with a high preference for extra-cutaneous involvement. Spatial transcriptomics (ST) analyses of 25 healthy, active, and clinically uninvolved skin biopsies, and integration with public single-cell transcriptomics data, revealed striking differences in immune microniches between healthy and inflamed skin. Tissue-scale cartography further identified core disease features across all active lesions, including the emergence of an inflamed suprabasal epidermal state and the presence of B lymphocytes in lesional skin. Notably, both lesional and distal non-lesional samples were stratified by skin disease severity, and not by the presence of systemic disease. This segregation was driven by macrophage-, fibroblast- and lymphatic-enriched spatial regions with gene signatures associated with metabolic dysfunction. Taken together, these findings suggest that mild and severe forms of PsO have distinct molecular features and that severe PsO may profoundly alter the cellular and metabolic makeup of distal unaffected skin sites. Additionally, our study provides an unprecedented resource for the research community to study the spatial gene organization of healthy and inflamed human skin.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main applications of single-cell RNA sequencing analysis are the identification of known and novel cell types, the characterization of cells, cell fate prediction, the classification of tumor types, and the investigation of heterogeneity among cells. Single-cell clustering plays an important role in addressing these questions. Cluster identification in high-dimensional single-cell sequencing data is challenging because of the nature of the data, and dimensionality reduction models can mitigate the problem. Here, we introduce a framework for discovering cluster-specific frequent biomarkers that uses dimensionality reduction and the hierarchical agglomerative clustering algorithm Louvain for single-cell RNA sequencing data analysis. First, we filtered out features detected in few cells and cells with few detected features. Then we created a Seurat object to store the data and analysis together and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method "LogNormalize" for data normalization. Next, we identified the features that are highly variable from cell to cell in our dataset. Then, we applied a linear transformation and a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project the high-dimensional data to an optimal low-dimensional space. After identifying fifty "significant" principal components (PCs) based on strong enrichment of low p-value features, we applied the graph-based clustering algorithm Louvain to cluster the cells using the 10 most significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.8851.
After detecting the cell clusters, we found 3871 cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for expression feature extraction from single-cell sequencing data, with a log2FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found the 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree-based hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of the cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
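The two marker-selection thresholds and the notion of overlapping biomarkers can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it takes precomputed per-gene statistics as input rather than running MAST, it treats the 25% detection requirement as applying to whichever group detects the gene more often, and it reads "overlapping" as "reported as a marker for more than one cluster". All function and gene names are hypothetical.

```python
from collections import Counter


def filter_markers(stats, min_log2fc=0.25, min_pct=0.25):
    """Keep genes that pass both thresholds.

    stats: dict gene -> (log2fc, fraction detected in cluster,
                         fraction detected in remaining cells).
    """
    kept = []
    for gene, (log2fc, pct_in, pct_out) in stats.items():
        if abs(log2fc) >= min_log2fc and max(pct_in, pct_out) >= min_pct:
            kept.append(gene)
    return sorted(kept)


def overlapping_markers(markers_by_cluster):
    """Return genes that appear in the marker list of more than one cluster."""
    counts = Counter(
        gene for genes in markers_by_cluster.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n > 1)
```

For instance, with toy input `{"c0": ["GeneA", "GeneB"], "c1": ["GeneB"]}`, only `GeneB` counts as an overlapping marker.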
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.