MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Scripts used for analysis of V1 and V2 Datasets.
seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test Data for Galaxy Tutorial "Clustering 3k PBMCs with Seurat"
https://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline (10x Genomics) to generate a feature-by-spot-barcode expression matrix.
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
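The quality thresholds described above can be illustrated with a minimal base-R sketch. The count matrix is simulated, the variable names are hypothetical, and the "mt-" prefix for mouse mitochondrial genes is an assumption; this is not the authors' code, which applied the filters within the Seurat workflow.

```r
# Simulated genes x cells count matrix (hypothetical data, not from the study)
set.seed(1)
n_genes <- 2000
n_cells <- 50
gene_names <- c(paste0("mt-", seq_len(20)), paste0("Gene", seq_len(n_genes - 20)))
counts <- matrix(rpois(n_genes * n_cells, lambda = 1), nrow = n_genes,
                 dimnames = list(gene_names, paste0("cell", seq_len(n_cells))))
# Make the first five cells low quality by zeroing most of their genes
counts[101:n_genes, 1:5] <- 0

n_features <- colSums(counts > 0)            # genes detected per cell
n_transcripts <- colSums(counts)             # total counts per cell
is_mito <- grepl("^mt-", rownames(counts))   # assumed mitochondrial gene prefix
pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / n_transcripts

# Remove cells with < 750 features, < 1000 transcripts, or > 30% mitochondrial counts
keep <- n_features >= 750 & n_transcripts >= 1000 & pct_mito <= 30
filtered <- counts[, keep, drop = FALSE]
dim(filtered)
```

In this toy example only the five deliberately degraded cells fall below the feature and transcript thresholds.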
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
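One way to pick the dimensions accounting for 95% of the total variance is sketched below in base R. The standard deviations are made-up values, and the sketch assumes that the variance is measured relative to the computed dimensions only; the source does not state exactly how the cutoff was computed.

```r
# Hypothetical per-dimension standard deviations of a reduced embedding
stdev <- c(10, 6, 4, 3, 2, 1, 1, 0.5)

# Fraction of variance carried by each dimension, and its running total
var_explained <- stdev^2 / sum(stdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of leading dimensions whose cumulative variance reaches 95%
n_dims <- which(cum_var >= 0.95)[1]
n_dims
```

The resulting index range (1 to n_dims) would then be supplied as the dims argument when building the neighbor graph.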
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used to build figure 2: Assignment of cell line type to clusters generated with Seurat, implemented in rCASC. A) RNA-5c clustering, five clusters generated with Seurat (resolution=0.1), using 2500 genes selected as the most variant within the 5000 most expressed (rCASC topx function). B) RNA-3c clustering, four clusters generated with Seurat (resolution=0.1), using the 2500 genes selected for RNA-5c. C) RNA-5c hierarchical clustering (Euclidean distance, average linkage) of log2 CPM clusters’ pseudo-bulk expression (rCASC bulkClusters function), row-mean centered, and CCLE lung cell lines A549, NCIH838, NCIH2228, NCIH1975 and HCC827 log2 TPM row-mean centered. D) RNA-3c hierarchical clustering (Euclidean distance, average linkage) of log2 CPM clusters’ pseudo-bulk expression (rCASC bulkClusters function), row-mean centered, and CCLE lung cell lines A549, NCIH838, NCIH2228, NCIH1975 and HCC827 log2 TPM row-mean centered.
Figure 2A: somewhere_in_your_computer/fig2/RNA2500-5c/VandE/Results_0.1/VandE/5/VandE_Stability_Plot.pdf
Figure 2B: somewhere_in_your_computer/fig2/RNA2500-3c/VandE/Results/VandE/4/VandE_Stability_Plot.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains R Seurat objects associated with our study titled "A single-cell atlas characterizes dysregulation of the bone marrow immune microenvironment associated with outcomes in multiple myeloma".
Single cell data contained within this object comes from MMRF Immune Atlas Consortium work.
Each .rds file contains a Seurat object saved with version 4.3 and can be loaded in R with the readRDS command.
Two .RDS files are included in this version of the release.
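As a minimal sketch of the loading step, the round trip below writes a stand-in object to a temporary file and reads it back with readRDS. The object and the path are hypothetical placeholders; with the real files, the Seurat package should be attached first so that the object's methods are available.

```r
# Stand-in object; the released files hold Seurat objects instead
toy <- list(assay = "RNA", n_cells = 3L)

# Hypothetical path; replace with the location of a downloaded .rds file
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)

# readRDS returns the stored object, which is assigned to a name of your choice
obj <- readRDS(path)
str(obj)
```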
--
The discovery object contains two assays:
Currently, the validation object only includes the uncorrected RNA assay.
--
The object contains two umaps in the reduction slot:
--
Each sample has three different identifiers:
Each cell has the following annotation information:
--
Each sample has the following information indicating shipment batches, for batch correction
--
Each public_id has limited demographic information based on publicly available information in the MMRF CoMMpass study.
d_specimen_visit_id contains two data points providing limited information about the visit
All the single-cell raw data, along with outcome and cytogenetic information, are available at MMRF’s VLAB shared resource. Requests to access these data will be reviewed by the data access committee at MMRF, and any data shared will be released under a data transfer agreement that will protect the identities of patients involved in the study. Other information from the CoMMpass trial can also generally be
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space as the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.
To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from ArrayExpress. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.
Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3. For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of ties, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat 3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.
Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction=”umap”, k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat 3 using, respectively, the prcomp function from the base R package “stats” and the “umap” R package (https://github.com/tkonopka/umap).
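The reason for recomputing PCA with prcomp is that its fitted object can place new cells into the reference coordinates. The base-R sketch below illustrates that idea on simulated matrices; the data, dimensions and variable names are hypothetical and it is not the ProjecTILs implementation.

```r
set.seed(42)
genes <- paste0("g", 1:10)

# Simulated cells x genes matrices standing in for reference and query data
ref <- matrix(rnorm(200 * 10), nrow = 200, dimnames = list(NULL, genes))
query <- matrix(rnorm(20 * 10), nrow = 20, dimnames = list(NULL, genes))

# Fit PCA on the reference only
pca <- prcomp(ref, center = TRUE, scale. = FALSE)
ref_pcs <- pca$x[, 1:5]

# Project query cells with the reference centering and rotation
query_pcs <- predict(pca, newdata = query)[, 1:5]
dim(query_pcs)
```

Because the query is centered and rotated with the reference parameters, both sets of cells share one coordinate system, and projecting the reference itself reproduces pca$x.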
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMC: T, B, NK cells and monocytes) extracted from two healthy donors.
Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cellranger v4.0.0 was used for processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by cellranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% mitochondrial genes were discarded.
The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.
File contents:
- raw_dataset.csv: raw gene counts
- normalized_dataset.csv: normalized gene counts (single cell matrix)
- cell_types.csv: cell types identified from annotated cell clusters
- cell_types_macro.csv: cell macro types
- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"
Structure
ST_berghei_liver
contains data generated during stpipeline analysis and imaging on the 2k arrays Spatial Transcriptomics platform, as well as data necessary for and from hepaquery analysis. These samples include 38 sections in total, of which 8 are from mice (n=4) infected with sporozoites for 12h, 5 sections from control mice (n=3) at 12h, 7 sections from mice (n=4) infected with sporozoites for 24h and 4 sections from control mice (n=3) for 24h, as well as 8 samples of mice (n=2) infected with sporozoites for 38h and control mice (n=2) for 38h.
STUtiility_mus_pb_ST.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in ST_berghei_liver
visium_berghei_liver
contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total, of which 1 was infected with sporozoites for 12h, 1 control section at 12h, 1 section infected with sporozoites for 24h and 1 control section at 24h, as well as 2 sporozoite-infected sections and 2 control sections at 38h.
V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection
V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection
V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection
se_visium.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in visium_berghei_liver
snSeq_berghei_liver
contains data generated with the cellranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.
cellranger_cnt_out contains feature count matrix information from cell ranger output
final_merged_curated_annotations_270623.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in snSeq_berghei_liver.tar.gz
raw images.zip contains raw images for supplementary figures 20-22
adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the Seurat cluster markers for the scRNA-seq data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided datasets correspond to the analyses of individual donor single-cell RNA sequencing (scRNA-Seq) datasets, before their integration. The datasets have been saved as Seurat v4.0.5 objects. For clustering, we used the default settings in Seurat 4.0.5 (resolution 0.8) and increased the resolution, if necessary, to separate proximal from distal epithelium.
The *_clusters.pdf files show the suggested clusters in the individual datasets and the *_indiv_anno1.pdf files show the cell annotations according to the 84 cell states, described in the study with title "Developmental origins of cell heterogeneity in the human lung" (1st preprint version doi: https://doi.org/10.1101/2022.01.11.475631).
The "*_cluster_annotations.csv" files provide information about the suggested annotations of the clusters.
The "*_object_raw_and_log_counts.RData" objects contain the metadata and the UMI-counts [raw and log2(counts+1)] for each donor scRNA-Seq dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The exemplary dataset 390c_wctype refers to sample 390c; in the counts table, cell names include the cell type assignment, e.g. TGACTAGGTTCCACAA.Treg. This dataset is part of the 41,650 cells isolated from the caecum, transverse colon and sigmoid colon of 5 individuals, which is one of the datasets constituting the gut atlas transcriptome.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
##### CD40 inhibition in AMI on d5, seq on d7 and d14
# Load necessary libraries for data manipulation, analysis, and visualization
# plyr is attached before dplyr so that plyr does not mask dplyr functions
library(plyr)
library(dplyr)
library(Seurat)
library(patchwork)
# Set the working directory to the folder containing the data
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/01_TS_d5_paper/03_CD40 inhibition on day 5, seq on day 7 and 14/938-2_cellranger_count/outs")
# Read the M0 dataset from the 10X Genomics format
pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")
RNA <- pbmc.data$`Gene Expression`
ADT <- pbmc.data$`Antibody Capture`
HST <- pbmc.data$`Multiplexing Capture`
# Load the Matrix package
library(Matrix)
# Hashtag 1, 2 and 3 are marking the mouse replicates per condition
# Subset the rows based on row names
subsetted_rows <- c("TotalSeq-B0301", "TotalSeq-B0302", "TotalSeq-B0303")
animals_data <- HST[subsetted_rows, , drop = FALSE]
# Hashtag 4, 5, 6, 7 are representing DMSO d7, TS d7, DMSO d14 and TS d14
subsetted_rows <- c("TotalSeq-B0304", "TotalSeq-B0305", "TotalSeq-B0306", "TotalSeq-B0307")
treatment_data <- HST[subsetted_rows, , drop = FALSE]
# Create a Seurat object and additional assays to combine later
RNA <- CreateSeuratObject(counts = RNA)
ADT <- CreateAssayObject(counts = ADT)
Mice <- CreateAssayObject(counts = animals_data)
Treatment <- CreateAssayObject(counts = treatment_data)
seurat <- RNA
#Add the Assays
seurat[["ADT"]] <- ADT
seurat[["HST_Mice"]] <- Mice
seurat[["HST_Treatment"]] <- Treatment
#Check for AK Names
rownames(seurat[["ADT"]])
#Cluster cells on the basis of their scRNA-seq profiles
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
FeaturePlot(seurat, features = "Col1a1", order = T)
# Normalize ADT data,
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
#Demultiplex cells based on Mouse_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Mice", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Mice", positive.quantile = 0.60)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Mice_classification.global)
DimPlot(seurat, group.by = "HST_Mice_classification")
#Demultiplex cells based on Treatment_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Treatment", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Treatment", positive.quantile = 0.60)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Treatment_classification.global)
DimPlot(seurat, group.by = "HST_Treatment_classification")
Idents(seurat) <- seurat$HST_Treatment_classification
pbmc.singlet <- subset(seurat, idents = "Negative", invert = T)
Idents(pbmc.singlet) <- pbmc.singlet$HST_Mice_classification
pbmc.singlet <- subset(pbmc.singlet, idents = "Negative", invert = T)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
# Redo the classification with a stricter quantile to remove the doublets
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.99)
table(pbmc.singlet$HST_Treatment_classification.global)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_classification")
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.99)
table(pbmc.singlet$HST_Mice_classification.global)
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.60)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.60)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
DimPlot(pbmc.singlet, group.by = "HST_Mice_maxID")
seurat <- pbmc.singlet
seurat$mice <- seurat$HST_Mice_maxID
seurat$treatment <- seurat$HST_Treatment_maxID
library(plyr)
seurat$treatment <- revalue(seurat$treatment, c(
"TotalSeq-B0304" = "DMSO_d7",
"TotalSeq-B0305" = "TS_d7",
"TotalSeq-B0306" = "DMSO_d14",
"TotalSeq-B0307" = "TS_d14"
))
library(plyr)
seurat$mice <- revalue(seurat$mice, c(
"TotalSeq-B0301" = "1",
"TotalSeq-B0302" = "2",
"TotalSeq-B0303" = "3"
))
# Cluster cells on the basis of their scRNA-seq profiles without doublets
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/01_TS_d5_paper/03_CD40 inhibition on day 5, seq on day 7 and 14/Analyse")
saveRDS(seurat, file= "TSd5.v0.1.RDS")
https://spdx.org/licenses/CC0-1.0.html
The clustering of cells has been widely used to explore the heterogeneity of cell populations in single-cell RNA-sequencing (scRNA-seq). We proposed a parametric model for monoclonal and polyclonal scRNA-seq data to evaluate clustering results. Based on the parametric model, we proposed a metric (CDI) to quantify the goodness-of-fit of cell clustering to the data. Here we presented CT26.WT and T-CELL as two datasets to examine the performance of our model and metric. CT26.WT contains wild-type CT26 cells from the murine colorectal carcinoma cell line, and cells in CT26.WT are highly homogeneous. T-CELL contains T-cells from tumor tissue of mice three weeks after 4T1 tumor injection. From these datasets and public datasets, we validated our model and benchmarked our metric.
Methods: This dataset contains six files. Four of them (matrix.mtx, features.tsv, barcodes.tsv, CT26_bulk_30k.txt) are for CT26.WT, and the other two are for T-CELL.
CT26.WT sample preparation: Murine colorectal carcinoma cell line CT26.WT was obtained from the cell culture facility of Duke University and cultured in DMEM media (Sigma Aldrich). All cells were cultured at 37 °C. Single-cell clones were chosen and cultured for over 220 days. Bulk RNA-seq and single-cell RNA-seq samples were prepared on the same day.
CT26.WT bulk RNA-seq: Total RNA from ~1,000,000 cells from each group was extracted using the miniprep kit (Zymo Research) according to the manufacturer's instructions. The libraries were then sequenced on the Illumina sequencing platform (HiSeq X Ten) by Novogene Corporation Inc. (CA, USA) with a paired-end 150 bp (PE 150) sequencing strategy.
CT26.WT scRNA-seq: A total of ~10,000 cells of each clone were selected for single-cell RNA-seq. Single-cell RNA-seq libraries were prepared using Chromium Single Cell 3' Reagent Kits v3 (10x Genomics). The libraries were then sequenced on the Illumina sequencing platform by Novogene Corporation Inc. (CA, USA) with a PE 150 sequencing strategy in single-index mode.
T-CELL scRNA-seq: Tumors were first collected from female mice 3 weeks after 4T1 tumor injection. Tissues were then dissociated into single cells and homogenized. T cells were separated by flow sorting with a stringent gating threshold and sequenced on the 10X platform.
T-CELL filtering: We filtered out genes with non-zero counts in fewer than 2% of cells and removed cells with non-zero counts in fewer than 2% of genes. Eventually, 2,989 cells from five cell types with 7,893 genes were retained.
T-CELL annotation: The benchmark clustering labels of the T-CELL population were generated as a combination of protein-marker-based flow sorting labels and bioinformatics labels from Seurat v2. For evaluation purposes, we selected 5 distinct cell types: Regulatory Trm cells, Classical CD4 Tem cells, CD8 Trm cells, CD8 Tcm cells, and Active EM-like Treg cells.
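The T-CELL filtering rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the authors' code: the count matrix is simulated, the function name filter_counts is hypothetical, and the two filters are assumed to run sequentially (genes first, then cells on the retained genes), which the description does not specify.

```r
set.seed(1)

# Simulated sparse count matrix: 200 genes x 100 cells (hypothetical data)
counts <- matrix(
  rpois(200 * 100, lambda = 0.05),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:100))
)

filter_counts <- function(counts, min_frac = 0.02) {
  # Keep genes detected (count > 0) in at least min_frac of the cells
  gene_frac <- rowMeans(counts > 0)
  counts <- counts[gene_frac >= min_frac, , drop = FALSE]
  # Keep cells in which at least min_frac of the retained genes are detected
  cell_frac <- colMeans(counts > 0)
  counts[, cell_frac >= min_frac, drop = FALSE]
}

filtered <- filter_counts(counts)
dim(filtered)
```

Applying the filters in the other order, or computing both fractions on the unfiltered matrix, can retain a slightly different set of genes and cells.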
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Datasets and Code accompanying the new release of RCA, RCA2. The R-package for RCA2 is available at GitHub: https://github.com/prabhakarlab/RCAv2/
The datasets included here are:
Datasets required for a characterization of batch effects:
merged_rna_seurat.rds
de_list.rds
mergedRCAObj.rds
merged_rna_integrated.rds
10X_PBMCs.RDS: RCA2 object of processed 10X PBMC data (10X PBMC example data sets)
NBM_RDS_Files.zip: Several RDS files containing the RCA2 object of Normal Bone Marrow (NBM) data, UMAP coordinates, DoubletFinder results and metadata information (Normal Bone Marrow use case)
Dataset used for the Covid19 example:
blish_covid.seu.rds
rownames_of_glocal_projection_immune_cells.txt
Blish_RCA_no_QC_filtering_project_to_multiple_panels.rds
Data sets used to outline the ability of supervised clustering to detect disease states:
809653.seurat.rds
blish_covid.seu.rds
Performance benchmarking results:
Memory_consumption.txt
rca_time_list.rds
ScanPY input files:
input_data.zip
The R script provides code to regenerate Figures 2 to 7 of the main paper, apart from some visual modifications performed in Inkscape.
Provided R scripts are:
ComputePairWiseDE_v2.R (Required code for pairwise DE computation)
RCA_Figure_Reproduction.R
Provided Python code for Scanpy analysis:
RA_Scanpy.ipynb
CITESeq_Scanpy.ipynb
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
##### CD40 activation and the effect on Neutrophils
# Load necessary libraries for data manipulation, analysis, and visualization
library(dplyr)
library(Seurat)
library(patchwork)
library(plyr)
# Set the working directory to the folder containing the data
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/05_FGK45 Wirkung auf Neutros - scRNAseq/938-1_cellranger_count/outs")
# Read the M0 dataset from the 10X Genomics format
pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")
RNA <- pbmc.data$`Gene Expression`
ADT <- pbmc.data$`Antibody Capture`
HST <- pbmc.data$`Multiplexing Capture`
# Load the Matrix package
library(Matrix)
# Hashtags 1, 2, and 3 mark the organs (heart, blood, spleen)
# Subset the rows based on row names
subsetted_rows <- c("TotalSeq-B0301", "TotalSeq-B0302", "TotalSeq-B0303")
animals_data <- HST[subsetted_rows, , drop = FALSE]
# Hashtags 4, 5, 6, and 7 represent IgG_1, IgG_1, FGK45_1, and FGK45_1
subsetted_rows <- c("TotalSeq-B0304", "TotalSeq-B0305", "TotalSeq-B0306", "TotalSeq-B0307")
treatment_data <- HST[subsetted_rows, , drop = FALSE]
# Create a Seurat object and additional assays to combine later
RNA <- CreateSeuratObject(counts = RNA)
ADT <- CreateAssayObject(counts = ADT)
Organ <- CreateAssayObject(counts = animals_data)
Treatment <- CreateAssayObject(counts = treatment_data)
seurat <- RNA
#Add the Assays
seurat[["ADT"]] <- ADT
seurat[["HST_Mice"]] <- Organ
seurat[["HST_Treatment"]] <- Treatment
# Check the antibody names
rownames(seurat[["ADT"]])
#Cluster cells on the basis of their scRNA-seq profiles
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
FeaturePlot(seurat, features = "S100a9", order = TRUE)
# Normalize ADT data
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
#Demultiplex cells based on Mouse_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Mice", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Mice", positive.quantile = 0.99)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Mice_classification.global)
DimPlot(seurat, group.by = "HST_Mice_classification")
#Demultiplex cells based on Treatment_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Treatment", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Treatment", positive.quantile = 0.99)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Treatment_classification.global)
DimPlot(seurat, group.by = "HST_Treatment_classification")
# Remove cells classified as "Negative" for the treatment hashtags
Idents(seurat) <- seurat$HST_Treatment_classification
pbmc.singlet <- subset(seurat, idents = "Negative", invert = TRUE)
# Remove cells classified as "Negative" for the organ hashtags
Idents(pbmc.singlet) <- pbmc.singlet$HST_Mice_classification
pbmc.singlet <- subset(pbmc.singlet, idents = "Negative", invert = TRUE)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
# Redo the classification to remove the doublets
# (HTODemux resets the active identities and labels doublets as "Doublet")
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.99)
table(pbmc.singlet$HST_Treatment_classification.global)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_classification")
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = TRUE)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.99)
table(pbmc.singlet$HST_Mice_classification.global)
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = TRUE)
# Re-run demultiplexing on the remaining cells with a lower positive quantile
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.60)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.60)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
DimPlot(pbmc.singlet, group.by = "HST_Mice_maxID")
seurat <- pbmc.singlet
seurat$organ <- seurat$HST_Mice_maxID
seurat$mouse <- seurat$HST_Treatment_maxID
seurat$treatment <- seurat$HST_Treatment_maxID
seurat$treatment <- revalue(seurat$treatment, c(
"TotalSeq-B0304" = "IgG",
"TotalSeq-B0305" = "IgG",
"TotalSeq-B0306" = "FGK45",
"TotalSeq-B0307" = "FGK45"
))
seurat$organ <- revalue(seurat$organ, c(
"TotalSeq-B0301" = "heart",
"TotalSeq-B0302" = "blood",
"TotalSeq-B0303" = "spleen"
))
seurat$mouse <- revalue(seurat$mouse, c(
"TotalSeq-B0304" = "1",
"TotalSeq-B0305" = "2",
"TotalSeq-B0306" = "3",
"TotalSeq-B0307" = "4"
))
# Cluster cells on the basis of their scRNA-seq profiles after doublet removal
# Perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/05_FGK45 Wirkung auf Neutros - scRNAseq/Analyse")
saveRDS(seurat, file = "FGK45_heart_blood_spleen.v0.1.RDS")
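The centered log-ratio (CLR) normalization applied to the ADT assay in the script above can be sketched in base R. This is a hedged illustration, not Seurat's code verbatim: the formula below follows the commonly described form of Seurat's CLR transform as we understand it, and the antibody names, cell names and counts are hypothetical.

```r
# Toy ADT count matrix: rows are antibodies (features), columns are cells.
# Antibody names, cell names and counts are hypothetical.
adt <- matrix(
  c( 10, 0,  5,
    200, 3, 40,
      0, 0,  7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD11b", "Ly6G", "CD45"), c("cell1", "cell2", "cell3"))
)

# CLR for one vector: divide by the exponential of the summed log1p of the
# non-zero entries averaged over all entries, then apply log1p.
clr <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

# margin = 2 in the script corresponds to normalizing within each cell,
# that is, applying the transform to each column.
adt_clr <- apply(adt, 2, clr)
adt_clr
```

With margin = 1 the same function would instead be applied to each row, normalizing every antibody across cells.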
Serialized R data files (.rds) associated with the inDrop single-cell RNA-seq analysis in Huang et al., 2019. Each file has a single Seurat object containing a subset of clusters from the full processed dataset, which were separated into different objects due to file size limitations. Raw data (UMIFM counts) are included in the corresponding slot in each Seurat object. Seurat objects can be re-merged into a single object containing the full dataset using the MergeSeurat function.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):
Human myeloid dataset analysis
Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R. After standard normalization, highly variable gene selection and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified based on Louvain clustering with a resolution of 0.3. The Wilcoxon rank-sum test was employed to find cluster-specific gene markers for cell type annotation.
For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel in which each bandwidth is determined by the distance to the k-th nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the coordinates of the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% to 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11, 21, 8) to extract three gene trajectories.
Mouse embryo skin data analysis
We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analyses. After standard normalization, highly variable gene selection and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.
For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on the coordinates of the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% to 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9, 16, 5) to sequentially extract three gene trajectories. To compare the wild type and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
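The gene pre-filter described above (retaining genes expressed by a bounded fraction of cells before computing gene-gene distances) can be sketched in base R. This is an illustration only: the expression matrix is simulated, the function name select_genes_by_detection is hypothetical, and the preceding selection of the top 2,000 variable genes is omitted.

```r
set.seed(42)

# Simulated count matrix: 50 genes x 400 cells (hypothetical data).
# Genes cycle through rare, moderate and near-ubiquitous expression levels.
gene_lambda <- rep(c(0.001, 0.3, 5), length.out = 50)
expr <- matrix(
  rpois(50 * 400, lambda = gene_lambda),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:400))
)

select_genes_by_detection <- function(expr, lo, hi) {
  # Fraction of cells in which each gene has a non-zero count
  frac <- rowMeans(expr > 0)
  rownames(expr)[frac >= lo & frac <= hi]
}

# Thresholds quoted in the text for the two datasets
myeloid_like <- select_genes_by_detection(expr, lo = 0.005, hi = 0.75)
skin_like <- select_genes_by_detection(expr, lo = 0.01, hi = 0.50)

length(myeloid_like)
length(skin_like)
```

Because the 1% to 50% window lies inside the 0.5% to 75% window, the second selection is always a subset of the first on the same matrix.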
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Reanalysis of Tran et al., 2019 ipRGC and ipRGC-proximal clusters (original annotations C33_M1, C40_M1dup, C31_M2, C43_M4, C22_M5, C7, and C8) used in Dyer et al., 2014. Dataset includes integrated Seurat object of aforementioned clusters and csv files with all differentially expressed genes / top 10 differentially expressed genes.
This archive contains scRNA-seq and CyTOF data in the form of Seurat objects, txt and csv files, as well as R scripts for data analysis and figure generation.
A summary of the content is provided in the following.
R scripts
Script to run Machine learning models predicting group specific marker genes: CML_Find_Markers_Zenodo.R
Script to reproduce the majority of Main and Supplementary Figures shown in the manuscript: CML_Paper_Figures_Zenodo.R
Script to run inferCNV analysis: inferCNV_Zenodo.R
Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R
Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R
Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R
RDS files
General scRNA-seq Seurat objects:
SCENIC files:
BCR-ABL1 inference:
NK sub-clustering and filtering:
txt and csv files:
Compressed folders:
For new data analysis approaches in general, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds or use the Shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.
RAW data is available at EGA upon request using Study ID: EGAS00001005509
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated discrete clusters.
FILES: Data processing code
adapted_discrete_clusters_sim_milo_paper.R: Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form discrete clusters.
generate_test_data_discrete_clusters_sim_milo_paper.R: R code to assign simulated labels to datasets generated from adapted_discrete_clusters_sim_milo_paper.R. Seurat objects saved as cells_sim_discerete_clusters_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_discrete_clusters.csv.
Resulting datasets
cells_sim_discerete_clusters_gex_seed_*.rds: Seurat objects generated by generate_test_data_discrete_clusters_sim_milo_paper.R.
benchmark_dataset_sim_discrete_clusters.csv: Cell labels generated by generate_test_data_discrete_clusters_sim_milo_paper.R.
MIT License https://opensource.org/licenses/MIT
Scripts used for analysis of V1 and V2 Datasets.
seurat_v1.R - initialize Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
merge_seurat.R - merge two or more Seurat objects into one Seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R - initialize Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for Seurat object created by seurat_v1.R or seurat_v2.R.
merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into Monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R - Monocle script using Seurat batch-corrected values as input for v1 merged timecourse datasets.
monocle_lineage_trace.R - Monocle script using nUMI as input for v2 lineage traced dataset.
monocle_object_analysis.R - downstream analysis for Monocle object: BEAM and plotting.
CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.