86 datasets found

Data, R code and output Seurat Objects for single cell RNA-seq analysis of...
figshare.com
application/gzip
Updated May 31, 2023
Cite
Yunshun Chen; Gordon Smyth (2023). Data, R code and output Seurat Objects for single cell RNA-seq analysis of human breast tissues [Dataset]. http://doi.org/10.6084/m9.figshare.17058077.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.17058077.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Yunshun Chen; Gordon Smyth
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains all the Seurat objects that were used for generating all the figures in Pal et al. 2021 (https://doi.org/10.15252/embj.2020107333). All the Seurat objects were created under R v3.6.1 using the Seurat package v3.1.1. The detailed information of each object is listed in a table in Chen et al. 2021.
Data from: Large-scale integration of single-cell transcriptomic data...
dataone.org
data.niaid.nih.gov
Updated May 2, 2025
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2025). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
May 2, 2025
Dataset provided by
Dryad Digital Repository
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
Time period covered
Oct 22, 2021
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, in...
Transcription start site analysis for heterogenous CD4+ T cells using 5′...
data.niaid.nih.gov
datadryad.org
zip
Updated Apr 22, 2024
Cite
Akiko Oguchi; Yasuhiro Murakawa (2024). Transcription start site analysis for heterogenous CD4+ T cells using 5′ scRNA-seq [Dataset]. http://doi.org/10.5061/dryad.gtht76hv9
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gtht76hv9
Dataset updated
Apr 22, 2024
Dataset provided by
RIKEN Center for Integrative Medical Sciences
Authors
Akiko Oguchi; Yasuhiro Murakawa
License
https://spdx.org/licenses/CC0-1.0.html
Description
These datasets were generated by ReapTEC (read-level pre-filtering and transcribed enhancer call) from 5′ single-cell RNA-seq data on heterogeneous human CD4+ T cells. By taking advantage of a unique “cap signature” derived from the 5′-end of a transcript, ReapTEC simultaneously profiles gene expression and enhancer activity at nucleotide resolution using 5′-end single-cell RNA-sequencing (5′ scRNA-seq). The ReapTEC pipeline is described in detail at https://github.com/MurakawaLab/ReapTEC.
Scripts for Analysis
figshare.com
txt
Updated Jul 18, 2018
Cite
Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6783569.v2
Dataset updated
Jul 18, 2018
Dataset provided by
Figshare, http://figshare.com/
Authors
Sneddon Lab UCSF
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Scripts used for analysis of V1 and V2 datasets.

seurat_v1.R - initialize Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
merge_seurat.R - merge two or more Seurat objects into one Seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R - initialize Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for Seurat object created by seurat_v1.R or seurat_v2.R.
merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R - monocle script using Seurat batch corrected values as input for v1 merged timecourse datasets.
monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
Data used in SeuratIntegrate paper
zenodo.org
application/gzip, bin +2
Updated May 23, 2025
Cite
Florian Specque; Macha Nikolski; Domitille Chalopin (2025). Data used in SeuratIntegrate paper [Dataset]. http://doi.org/10.5281/zenodo.15496601
Available download formats: bin, pdf, txt, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15496601
Dataset updated
May 23, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Florian Specque; Macha Nikolski; Domitille Chalopin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository gathers the data and code used to generate hepatocellular carcinoma analyses in the paper presenting SeuratIntegrate. It contains the scripts to reproduce the figures presented in the article. Some figures are also available as pdf files.

To fully reproduce the results from the paper, one should:

download all the files

install R 4.3.3, with the corresponding base R packages (stats, graphics, grDevices, utils, datasets, methods and base)

install R packages listed in the file sessionInfo.txt

install the provided version of SeuratIntegrate. In an R session, run:

remotes::install_local("path/to/SeuratIntegrate_0.4.1.tar.gz")

install (mini)conda if necessary (we used miniconda version 23.11.0)

install the conda environments (if it fails with the *package-list.yml files, use the *package-list-from-history.yml files instead):

conda env create --file SeuratIntegrate_bbknn_package-list.yml
conda env create --file SeuratIntegrate_scanorama_package-list.yml
conda env create --file SeuratIntegrate_scvi-tools_package-list.yml
conda env create --file SeuratIntegrate_trvae_package-list.yml

open an R session to make the conda environments usable by SeuratIntegrate:

library(SeuratIntegrate)
UpdateEnvCache("bbknn", conda.env = "SeuratIntegrate_bbknn", conda.env.is.path = FALSE)
UpdateEnvCache("scanorama", conda.env = "SeuratIntegrate_scanorama", conda.env.is.path = FALSE)
UpdateEnvCache("scvi", conda.env = "SeuratIntegrate_scvi-tools", conda.env.is.path = FALSE)
UpdateEnvCache("trvae", conda.env = "SeuratIntegrate_trvae", conda.env.is.path = FALSE)

Once done, running the code in integrate.R should produce reproducible results. Note that lines 3 to 6 from integrate.R should be adapted to the user's setup.
integrate.R is subdivided into six main parts:

Preparation: lines 1-56

Preprocessing: lines 58-74

Integration: lines 76-121

Processing of integration outputs: lines 126-267

Scoring of integration outputs: lines 269-353

Plotting: lines 380-507

Intermediate SeuratObjects have been saved between steps 3 and 4 (liver10k_integrated_object.RDS) and between steps 5 and 6 (liver10k_integrated_scored_object.RDS). It is possible to start from these intermediate SeuratObjects and skip the preceding steps, provided that the Preparation step is always run first.
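
The conda step in the instructions above (try the *package-list.yml file, fall back to the *package-list-from-history.yml file) can be sketched as a small Python helper. This is only an illustration: the exact fallback file names are an assumption built from the glob pattern quoted above, and `create_env` is a hypothetical helper, not part of SeuratIntegrate.

```python
import subprocess

# Method names taken from the .yml file names listed in the instructions.
METHODS = ["bbknn", "scanorama", "scvi-tools", "trvae"]


def env_files(method):
    """Return the primary .yml file and the assumed fallback file for a method."""
    base = f"SeuratIntegrate_{method}_package-list"
    # Assumption: the fallback file mirrors the primary name with
    # "-from-history" appended before the extension.
    return [f"{base}.yml", f"{base}-from-history.yml"]


def create_env(method, run=subprocess.run):
    """Try `conda env create` with the primary file, then with the fallback."""
    for yml in env_files(method):
        result = run(["conda", "env", "create", "--file", yml])
        if result.returncode == 0:
            return yml  # name of the file that worked
    raise RuntimeError(f"could not create the conda environment for {method}")
```

Calling `create_env(method)` for each entry of `METHODS` from the download folder would attempt all four environments; the `run` argument exists only so the logic can be exercised without conda installed.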

Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to...

zenodo.org

bin, csv, zip

Updated Oct 24, 2024

Cite

Bertram Bengsch; Sagar; Zhen Zhang (2024). Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to anti-PD-1 and anti-PD-1/CTLA-4 immunotherapy in melanoma [Dataset]. http://doi.org/10.5281/zenodo.13971562

Available download formats: bin, csv, zip

Unique identifier

https://doi.org/10.5281/zenodo.13971562

Dataset updated

Oct 24, 2024

Dataset provided by

Zenodo

Authors

Bertram Bengsch; Sagar; Zhen Zhang

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset details the scRNASeq and TCR-Seq analysis of sorted PD-1+ CD8+ T cells from patients with melanoma treated with checkpoint therapy (anti-PD-1 monotherapy and anti-PD-1 & anti-CTLA-4 combination therapy) at baseline and after the first cycle of therapy. A major publication using this dataset is accessible here: (reference)

*experimental design

Single-cell RNA sequencing was performed using 10x Genomics with feature barcoding technology to multiplex cell samples from different patients undergoing mono or dual therapy so that they can be loaded on one well to reduce costs and minimize technical variability. Hashtag oligomers (oligos) were obtained as purified and already oligo-conjugated in TotalSeq-C format from BioLegend. Cells were thawed, counted and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibody concentrations used were 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 xg 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS and filtered through 40 µm cell strainers and proceeded for sorting. Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain the sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.

*extract protocol

PBMCs were thawed, counted and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibody concentrations used were 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 xg 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS and filtered through 40 µm cell strainers and proceeded for sorting. Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions.

*library construction protocol

Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain the sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.
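
The sequencing targets quoted above translate into a read budget by simple multiplication. The sketch below works this through; it assumes every one of the approximately 75,000 processed cells yields data, so the figures are an upper bound rather than the reads actually obtained.

```python
# Figures taken from the protocol text above.
CELLS = 75_000  # approximate number of cells processed
READS_PER_CELL = {
    "gene_expression": 15_000,
    "hashing": 5_000,
    "tcr": 5_000,
}


def read_budget(cells=CELLS, depths=READS_PER_CELL):
    """Return the target read count per library type plus the overall total."""
    budget = {name: cells * depth for name, depth in depths.items()}
    budget["total"] = sum(budget.values())
    return budget


# read_budget() gives 1,125,000,000 reads for gene expression,
# 375,000,000 each for hashing and TCR, and 1,875,000,000 in total.
```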

*library strategy

scRNA-seq and scTCR-seq

*data processing step

Pre-processing of sequencing results to generate count matrices (gene expression and HTO barcode counts) was performed using the 10x Genomics Cell Ranger pipeline.

Further processing was done with Seurat (cell and gene filtering, hashtag identification, clustering, differential gene expression analysis based on gene expression).

*genome build/assembly

Alignment was performed using prebuilt Cell Ranger human reference GRCh38.

*processed data files format and content

RNA counts and HTO counts are in sparse matrix format and TCR clonotypes are in csv format.

Datasets were merged and analyzed by Seurat and the analyzed objects are in rds format.

file name	file checksum
PD1CD8_160421_filtered_feature_bc_matrix.zip	da2e006d2b39485fd8cf8701742c6d77
PD1CD8_190421_filtered_feature_bc_matrix.zip	e125fc5031899bba71e1171888d78205
PD1CD8_160421_filtered_contig_annotations.csv	927241805d507204fbe9ef7045d0ccf4
PD1CD8_190421_filtered_contig_annotations.csv	8ca544d27f06e66592b567d3ab86551e
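
The checksums in the table above can be used to confirm that a download is intact. A minimal sketch follows, assuming the values are MD5 digests (they are 32 hexadecimal characters long, but the record does not name the algorithm); `verify` is a hypothetical helper.

```python
import hashlib
import os

# File names and checksums copied from the table above.
EXPECTED_MD5 = {
    "PD1CD8_160421_filtered_feature_bc_matrix.zip": "da2e006d2b39485fd8cf8701742c6d77",
    "PD1CD8_190421_filtered_feature_bc_matrix.zip": "e125fc5031899bba71e1171888d78205",
    "PD1CD8_160421_filtered_contig_annotations.csv": "927241805d507204fbe9ef7045d0ccf4",
    "PD1CD8_190421_filtered_contig_annotations.csv": "8ca544d27f06e66592b567d3ab86551e",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's digest matches the expected value for its name."""
    name = os.path.basename(path)
    if name not in expected:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of(path) == expected[name]
```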

*processed data file	antibodies/tags
PD1CD8_160421_filtered_feature_bc_matrix.zip	none
PD1CD8_160421_filtered_feature_bc_matrix.zip	TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M1_base_monotherapy; TotalSeq™-C0252 anti-human Hashtag 2 Antibody - (HASH_2) - M1_post_monotherapy; TotalSeq™-C0253 anti-human Hashtag 3 Antibody - (HASH_3) - C1_base_combined_therapy; TotalSeq™-C0254 anti-human Hashtag 4 Antibody - (HASH_4) - C1_post_combined_therapy; TotalSeq™-C0255 anti-human Hashtag 5 Antibody - (HASH_5) - C2_base_combined_therapy; TotalSeq™-C0256 anti-human Hashtag 6 Antibody - (HASH_6) - C2_post_combined_therapy
PD1CD8_160421_filtered_contig_annotations.csv	none
PD1CD8_190421_filtered_feature_bc_matrix.zip	none
PD1CD8_190421_filtered_feature_bc_matrix.zip	TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M2_base_monotherapy; TotalSeq™-C0252 anti-human Hashtag 2 Antibody - (HASH_2) - M2_post_monotherapy; TotalSeq™-C0253 anti-human Hashtag 3 Antibody - (HASH_3) - M3_base_monotherapy; TotalSeq™-C0254 anti-human Hashtag 4 Antibody - (HASH_4) - M3_post_monotherapy; TotalSeq™-C0255 anti-human Hashtag 5 Antibody - (HASH_5) - C3_base_combined_therapy; TotalSeq™-C0256 anti-human Hashtag 6 Antibody - (HASH_6) - C3_post_combined_therapy
PD1CD8_190421_filtered_contig_annotations.csv	none
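
The hashtag assignments in the table above amount to a lookup from (run, hashtag) to patient, time point and therapy. A small sketch of that lookup is given below; the run keys "160421" and "190421" are taken from the file names, and splitting each label on underscores is an assumption about how the labels are meant to be read.

```python
# Hashtag-to-sample labels copied from the table above, keyed by run.
HASHTAG_SAMPLES = {
    "160421": {
        "HASH_1": "M1_base_monotherapy",
        "HASH_2": "M1_post_monotherapy",
        "HASH_3": "C1_base_combined_therapy",
        "HASH_4": "C1_post_combined_therapy",
        "HASH_5": "C2_base_combined_therapy",
        "HASH_6": "C2_post_combined_therapy",
    },
    "190421": {
        "HASH_1": "M2_base_monotherapy",
        "HASH_2": "M2_post_monotherapy",
        "HASH_3": "M3_base_monotherapy",
        "HASH_4": "M3_post_monotherapy",
        "HASH_5": "C3_base_combined_therapy",
        "HASH_6": "C3_post_combined_therapy",
    },
}


def describe(run, hashtag):
    """Split a sample label into patient, time point and therapy."""
    label = HASHTAG_SAMPLES[run][hashtag]
    # Assumption: labels read as <patient>_<timepoint>_<therapy>.
    patient, timepoint, therapy = label.split("_", 2)
    return {"patient": patient, "timepoint": timepoint, "therapy": therapy}
```

For example, `describe("190421", "HASH_3")` identifies patient M3 at baseline under monotherapy.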

cellCounts
opal.latrobe.edu.au
researchdata.edu.au
bin
Updated Dec 19, 2022
Cite
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi (2022). cellCounts [Dataset]. http://doi.org/10.26181/21588276.v3
Available download formats: bin
Unique identifier
https://doi.org/10.26181/21588276.v3
Dataset updated
Dec 19, 2022
Dataset provided by
La Trobe
Authors
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.

R (v4.0.0 or newer) https://www.r-project.org/
Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
STARsolo (v2.7.10a) https://github.com/alexdobin/STAR
sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools
Seurat (v3.0.0 or newer) https://satijalab.org/seurat/
edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/
limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/
mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html

Reference packages generated by 10x Genomics are also required for this analysis. They can be downloaded from the following link (the 2020-A version of the individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest

Once all of the above is installed, run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script automatically downloads the mixture scRNA-seq data from the SRA database and writes a text file called ‘test-all.log’ that contains all the screen outputs and the speed/accuracy results of CellRanger, STARsolo and cellCounts.
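
Before launching the script, it can help to confirm that the command-line tools are actually on $PATH, as the requirements above ask. The sketch below does this with the standard library; the executable names are assumptions (the record lists package names, not binaries), so adjust them to the local installation.

```python
import shutil

# Assumption: typical executable names for the listed software.
# The record itself names packages (R, CellRanger, STARsolo, sra-tools), not binaries.
REQUIRED_TOOLS = ["R", "cellranger", "STAR", "fasterq-dump"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means every assumed executable was found; R packages such as Rsubread, Seurat, edgeR, limma and mltools would still need to be checked from within R.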
ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1
figshare.com
application/gzip
Updated Jun 29, 2023
Cite
Massimo Andreatta; Santiago Carmona (2023). ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1 [Dataset]. http://doi.org/10.6084/m9.figshare.12478571.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.12478571.v2
Dataset updated
Jun 29, 2023
Dataset provided by
figshare
Authors
Massimo Andreatta; Santiago Carmona
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space of the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.

To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from Array-Express. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.

Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non-T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3. For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of ties, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.

Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction="umap", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat3 using, respectively, the prcomp function from the base R package “stats” and the “umap” R package (https://github.com/tkonopka/umap).
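
The expression-frequency filter described above (dropping genes expressed in less than 0.1% or more than 90% of the cells of a dataset) can be illustrated with a short sketch. This is not the STACAS or Seurat implementation; it assumes a gene counts as "expressed" in a cell when its count is above zero, and the input layout (a dict of per-cell counts per gene) is chosen only for the example.

```python
def expressed_fraction(counts):
    """Fraction of cells in which a gene has a non-zero count."""
    return sum(1 for c in counts if c > 0) / len(counts)


def filter_genes(counts_by_gene, min_frac=0.001, max_frac=0.90):
    """Keep genes expressed in at least 0.1% and at most 90% of cells."""
    kept = []
    for gene, counts in counts_by_gene.items():
        frac = expressed_fraction(counts)
        # Drop genes expressed in less than min_frac or more than max_frac of cells.
        if min_frac <= frac <= max_frac:
            kept.append(gene)
    return kept
```

With 1,000 cells, a gene detected in a single cell sits exactly at the 0.1% boundary and is kept, while a gene detected in 950 cells is removed.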
Data from: Single cell multiomic analysis identifies key genes...
data.niaid.nih.gov
search.dataone.org
zip
Updated Jul 2, 2024
Cite
Abhinav Kaushik; Kari Nadeau (2024). Single cell multiomic analysis identifies key genes differentially expressed in innate lymphoid cells from COVID-19 patients [Dataset]. http://doi.org/10.5061/dryad.8931zcrz4
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8931zcrz4
Dataset updated
Jul 2, 2024
Dataset provided by
National Institute of Allergy and Infectious Diseases, http://www.niaid.nih.gov/
Authors
Abhinav Kaushik; Kari Nadeau
License
https://spdx.org/licenses/CC0-1.0.html
Description
Innate lymphoid cells (ILCs) are enriched at mucosal surfaces where they respond rapidly to environmental stimuli and contribute to both tissue inflammation and healing. To gain insight into the role of ILCs in the pathology and recovery from COVID-19 infection, we employed a multi-omic approach consisting of Abseq and targeted mRNA sequencing to respectively probe the surface marker expression, transcriptional profile and heterogeneity of ILCs in peripheral blood of patients with COVID-19 compared with healthy controls. We found that the frequency of ILC1 and ILC2 cells was significantly increased in COVID-19 patients. Moreover, all ILC subsets displayed a significantly higher frequency of CD69-expressing cells, indicating a heightened state of activation. ILC2s from COVID-19 patients had the highest number of significantly differentially expressed (DE) genes. The most notable genes DE in COVID-19 vs healthy participants included a) genes associated with responses to virus infections and b) genes that support ILC self-proliferation, activation and homeostasis. In addition, differential gene regulatory network analysis revealed ILC-specific regulons and their interactions driving the differential gene expression in each ILC. Overall, this study provides mechanistic insights into the characteristics of ILC subsets activated during COVID-19 infection.

Methods

Study participants, blood draws and processing

Participants were recruited as described previously from adults who had a positive SARS-CoV-2 RT-PCR test at Stanford Health Care (NCT04373148). Collection of COVID samples occurred between May and December 2020. The cohort used in this study consisted of asymptomatic (n=2), mild (n=17), and moderate (n=3) COVID-19 infections, some of whom developed long term COVID-19 (n=15). The clinical case severities at the time of diagnosis were defined as asymptomatic, moderate or mild according to the guidelines released by NIH. Long term (LT) COVID was defined as symptoms occurring 30 or more days after infection, consistent with CDC guidelines. Some participants in our study continued to have LT COVID symptoms 90 days after diagnosis (n=12). Exclusion criteria for COVID sample study were NIH severity diagnosis of severe or critical at the time of positive COVID test. Samples selected for this study were obtained within 76 days of positive PCR COVID-19 test date. Healthy controls were selected who had sample collection before 2020. Informed consent was obtained from all participants. All protocols were approved by the Stanford Administrative Panel on Human Subjects in Medical Research. Peripheral blood was drawn by venipuncture and, using validated and published procedures, peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-based density gradient centrifugation, frozen in aliquots and stored in liquid nitrogen at -80°C until thawing. A summary of participant demographics is presented in Supp. Table 1.

ILC Enrichment, single cell captures for Abseq and targeted mRNAseq

Participant PBMCs were thawed, and each sample stained with Sample Tag (BD #633781) at room temperature for 20 minutes. Samples were combined in healthy control or COVID-19 tubes. Cells were surface stained with a panel of fluorochrome-conjugated antibodies (Supp. Table 2) in buffer (PBS with 0.25% BSA and 1mM EDTA) for 20 minutes at room temperature prior to immunomagnetic negative selection for ILCs. Following ILC enrichment using the EasySep human Pan-ILC enrichment kit (StemCell Technologies #17975), cells from healthy and COVID-19 recovered participants were counted and normalized before combining. ILCs were sorted using a BD FACS Aria at the Stanford FACS facility prior to incubation with AbSeq oligo-linked mAbs (Supp. Table 3). Sorted cells were processed by the Stanford Human Immune Monitoring Center (HIMC) using the BD Rhapsody platform. Library was prepared using the BD Immune Response Targeting Panel (BD Kit #633750) with addition of custom gene panel reagents (Supp. Table 4) and sequenced on Illumina NovaSeq 6000 at Stanford Genomics Sequencing Center (SGSC). ILCs were identified as Lineageneg (CD3neg, CD14neg, CD34neg, CD19neg), NKG2Aneg, CD45+ and ILCs further defined as CD127+CD161+ and as subsets: ILC1 (CD117negCRTH2neg), ILC2 (CRTH2+) and ILCp (CD117+CRTH2neg) (Supp. Fig. 1).

Computational data analysis

The above multi-modal setup allowed paired measurements of cellular transcriptome and cell surface protein abundance. The ILC1, ILC2 and ILCp cells were manually gated based on the abundance profile of CD127, CD117, CD161 and CRTH2 (Supp. Fig. 1). Before the integrative analysis, the complete multi-modal single cell dataset containing ILC subsets was converted into a single Seurat object. All the subsequent protein-level and gene-level analyses were performed using the multimodal data analysis pipeline of Seurat R package version 4.0.
The normalized and scaled protein abundance profile was used for estimating the integrated harmony dimensions using runHarmony function in Seurat R package (reduction= ‘apca’ and group.by.vars = ‘batch’). The batch corrected harmony embeddings were then used for computing the Uniform Manifold Approximation and Projection (UMAP) dimensions to visualize the clusters of ILC subsets. Differential marker analysis of surface proteins, between two groups of cells (COVID-19 and Healthy cohort), from abseq panels was computed with normalized and scaled expression values using FindMarkers function from Seurat R package (test.use=’wilcox’). Similarly, differential gene expression was performed on normalized and scaled gene expression values between two groups of cells (COVID-19 and Healthy cohort) using the FindMarkers function from Seurat R package (test.use=’MAST’ and latent.vars=’batch’). Genes with log-fold change > 0.5 and adjusted p-value < 0.05 (method: Benjamini-Hochberg) were considered as significant for further evaluation. The resulting adjusted p-values box-plots were plotted using ggplot2 R package (version 3.4.2) after computing the number of cells expressing a given protein or gene in each sample. Pathway enrichment analysis of DE genes was performed using the Metascape web server (version 3.5). The AUCell score and gene regulatory network analysis was performed using pySCENIC pipeline (version 0.12.1). Gene regulatory network was reconstructed using GRNBoost2 algorithm and the list of TFs in humans (genome version: hg38) were obtained from cisTarget database (https://resources.aertslab.org/cistarget). Cellular enrichment (aka AUCell) analysis that measures the activity of TF or gene signatures across all single cells was performed using aucell function in pySCENIC python library. The ggplot2 R package (version 3.4.2) was used for boxplot visualization. The differential gene co-expression analysis was performed using scSFMnet R package. Circular plots were generated using the R package circlize (version 0.4.15).
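
The significance rule stated above (log-fold change > 0.5 and Benjamini-Hochberg adjusted p-value < 0.05) can be sketched in a few lines. This is a generic illustration of the Benjamini-Hochberg adjustment, not the code used in the study; the `(gene, log_fold_change, p_value)` input layout is hypothetical, and the threshold is applied to the signed log-fold change exactly as written, since the description does not say whether absolute values were used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value
    # never exceeds the adjusted value of the next larger p-value.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(results, lfc_cutoff=0.5, alpha=0.05):
    """results: list of (gene, log_fold_change, raw_p_value) tuples."""
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, adjusted)
        if lfc > lfc_cutoff and q < alpha
    ]
```

Passing the raw p-values of all tested genes matters here, because the adjustment depends on the total number of tests.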
Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
zenodo.org
data.niaid.nih.gov
bin, txt
Updated Nov 20, 2023
Cite
Jonathan Hsu; Allart Stoop (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. http://doi.org/10.5281/zenodo.10011622
Available download formats: bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.10011622
Dataset updated
Nov 20, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Jonathan Hsu; Allart Stoop
Description
Table of Contents
Main Description
File Descriptions
Linked Files
Installation and Instructions

1. Main Description
---------------------------
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled `marengo_code_for_paper_jan_2023.R` was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat
scRepertoire
ggplot2
stringr
dplyr
ggridges
ggrepel
ComplexHeatmap

File Descriptions
---------------------------
The code can be downloaded and opened in RStudio.
The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper.
The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113).
The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the heatmap with DGE and volcano plots.
The "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in figure 5F.

Linked Files
---------------------

This repository contains code for the analysis of single cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as, the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:

Gene Expression Omnibus (GEO) ID: GSE223311(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the **raw sequencing** or `.fastq.gz` files, which are gzip-compressed text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113(https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code.
Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

> Ensure you have R version 4.1.2 or higher for compatibility.

> Although it is not essential, you can use RStudio (Version 2022.12.0+353) for accessing and executing the code.

1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:
> setwd(directory)

4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.
5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.
6. Open the "marengo_code_for_paper_jan_2023.R" file in RStudio or your IDE of choice.
7. Execute the commands in the file titled "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
Processed Seurat objects for GeneTrajectory inference (Gene Trajectory...
figshare.com
application/gzip
Updated Feb 19, 2024
Cite
Rihao Qu; Peggy Myung (2024). Processed Seurat objects for GeneTrajectory inference (Gene Trajectory Inference for Single-cell Data by Optimal Transport Metrics) [Dataset]. http://doi.org/10.6084/m9.figshare.25243225.v1
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.25243225.v1
Dataset updated
Feb 19, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rihao Qu; Peggy Myung
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
These are processed Seurat objects for the two biological datasets in GeneTrajectory inference (https://github.com/KlugerLab/GeneTrajectory/):

Human myeloid dataset analysis
Myeloid cells were extracted from a publicly available 10x scRNA-seq dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3). QC was performed using the same workflow as in (https://github.com/satijalab/Integration2019/blob/master/preprocessing_scripts/pbmc_10k_v3.R). After standard normalization, highly-variable gene selection and scaling using the Seurat R package, we applied PCA and retained the top 30 principal components. Four sub-clusters of myeloid cells were identified based on Louvain clustering with a resolution of 0.3. The Wilcoxon rank-sum test was employed to find cluster-specific gene markers for cell type annotation.
For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel, where each bandwidth is determined by the distance to its k-nearest neighbor, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 5 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 0.5% − 75% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (11,21,8) to extract three gene trajectories.

Mouse embryo skin data analysis
We separated out dermal cell populations from the newly collected mouse embryo skin samples. Cells from the wild type and the Wls mutant were pooled for analyses. After standard normalization, highly-variable gene selection and scaling using Seurat, we applied PCA and retained the top 30 principal components. Three dermal cell types were stratified based on the expression of canonical dermal markers, including Sox2, Dkk1, and Dkk2.
For gene trajectory inference, we first applied Diffusion Map on the cell PC embedding (using a local-adaptive kernel bandwidth, k = 10) to generate a spectral embedding of cells. We constructed a cell-cell kNN (k = 10) graph based on their coordinates of the top 10 non-trivial Diffusion Map eigenvectors. Among the top 2,000 variable genes, genes expressed by 1% − 50% of cells were retained for pairwise gene-gene Wasserstein distance computation. The original cell graph was coarse-grained into a graph of size 1,000. We then built a gene-gene graph where the affinity between genes is transformed from the Wasserstein distance using a Gaussian kernel (local-adaptive, k = 5). Diffusion Map was employed to visualize the embedding of the gene graph. For trajectory identification, we used a series of time steps (9,16,5) to sequentially extract three gene trajectories. To compare the differences between the wild type and the Wls mutant, we stratified Wnt-active UD cells into seven stages according to their expression profiles of the genes binned along the DC gene trajectory.
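The gene pre-filter described for both datasets (keep genes expressed by a bounded fraction of cells before computing gene-gene distances) can be illustrated with a short Python sketch. It is not taken from the GeneTrajectory package: the function name, the dictionary layout, and the toy counts are hypothetical, and treating both bounds as inclusive is an assumption the description does not settle.

```python
def filter_genes_by_expression_fraction(counts, min_frac, max_frac):
    """Keep genes whose fraction of expressing cells lies in [min_frac, max_frac].

    counts: dict mapping gene name -> list of per-cell counts
            (every list has one entry per cell).
    """
    kept = []
    for gene, values in counts.items():
        expressing = sum(1 for v in values if v > 0)
        fraction = expressing / len(values)
        if min_frac <= fraction <= max_frac:
            kept.append(gene)
    return kept


# Hypothetical toy data: 10 cells, three genes.
toy_counts = {
    "g_silent": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],      # expressed by 0% of cells
    "g_mid": [3, 0, 1, 0, 0, 2, 0, 0, 5, 0],         # expressed by 40% of cells
    "g_ubiquitous": [4, 2, 1, 7, 3, 0, 2, 9, 1, 6],  # expressed by 90% of cells
}

# Bounds from the human myeloid analysis: 0.5% to 75% of cells.
print(filter_genes_by_expression_fraction(toy_counts, 0.005, 0.75))  # ['g_mid']
```

For the mouse embryo skin analysis the same call would use bounds of 0.01 and 0.50.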
Repository for the single cell RNA sequencing data analysis for the human...
explore.openaire.eu
Updated Aug 26, 2023
Cite
Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8286134
Dataset updated
Aug 26, 2023
Authors
Jonathan; Andrew; Pierre; Allart; Adrian
Description
This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript.
The following essential libraries are required for script execution:
Seurat
scRepertoire
ggplot2
dplyr
ggridges
ggrepel
ComplexHeatmap

Linked File:
--------------------------------------
This repository contains code for the analysis of single cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

1. Gene Expression Omnibus (GEO) ID: GSE229626
- Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
- Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
- Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

2. Sequence read archive (SRA) repository
- Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
- Description: This submission contains the "raw sequencing" or .fastq.gz files, which are gzip-compressed text files.
- Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

> Ensure you have R version 4.1.2 or higher for compatibility.

> Although it is not essential, you can use RStudio (Version 2022.12.0+353) for accessing and executing the code.

The following code can be used to set the working directory in R:
> setwd(directory)

Steps:
1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
- Human_code_April2023.R
- Install_Packages.R
4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
5. Open the Human_code_April2023.R R script and execute commands as necessary.
Data for Cell-type-specific alternative splicing in the cerebral cortex of a...
zenodo.org
explore.openaire.eu
application/gzip
Updated Aug 12, 2024
Cite
Emma F. Jones; Timothy C. Howton; Tabea M. Soelter; Anthony B. Crumley; Brittany N. Lasseigne (2024). Data for Cell-type-specific alternative splicing in the cerebral cortex of a Schinzel-Giedion Syndrome patient variant mouse model [Dataset]. http://doi.org/10.5281/zenodo.12535061
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.12535061
Dataset updated
Aug 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Emma F. Jones; Timothy C. Howton; Tabea M. Soelter; Anthony B. Crumley; Brittany N. Lasseigne
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
data.tar.gz contains all files from the data directory (except for sam outputs from STAR) associated with the 230926_EJ_Setbp1_AlternativeSplicing GitHub project and includes the following files:

./marvel: - This directory contains rds and Rdata objects that were created using the MARVEL R package

cell_type_goresults.rds - These are the GO results split by cell type

marvel_04_split_counts.Rdata - This R data includes all environment objects from MARVEL script 04, and is used for downstream plotting

normalized_sj_expression.Rds - This object is the normalized splice junction expression

Setbp1_marvel_aligned.rds - Final prepared MARVEL object before any SJU analyses have been run

significant_tables.RData - For those who do not want to load multiple massive files, this includes all significant SJU results for each cell type

sj_usage_cell_type.rds - This data object has splice junction usage calculated for each cell type

sj_usage_condition.rds - This data object has splice junction usage calculated for each cell type and also split by condition

./seurat: - This directory contains all intermediate and final Seurat single-cell gene expression objects

annotated_brain_samples.rds - This is the final iteration of the processing in Seurat for a final annotated object. Please use this object for any Seurat or single-cell gene expression analyses.

clustered_brain_samples.rds - This is the clustered Seurat object, before cell type annotation based on canonical markers.

filtered_brain_samples_pca.rds - This is the filtered Seurat object, before clustering but after PCA.

filtered_brain_samples.rds - This is the filtered Seurat object, before PCA.

integrated_brain_samples.rds - This is the integrated Seurat object, before other steps.

./star: - All files in the STAR directory are outputs from STARsolo, as described in our methods. Each output directory contains the same files, so only one example is included here for brevity. Intermediate SAM files were removed to optimize space.

J1/ - This directory contains outputs for brain sample J1

J13/ - This directory contains outputs for brain sample J13

J15/ - This directory contains outputs for brain sample J15

J2/ - This directory contains outputs for brain sample J2

J3/ - This directory contains outputs for brain sample J3

J4/ - This directory contains outputs for brain sample J4

K1/ - This directory contains outputs for kidney sample K1

K2/ - This directory contains outputs for kidney sample K2

K3/ - This directory contains outputs for kidney sample K3

K4/ - This directory contains outputs for kidney sample K4

K5/ - This directory contains outputs for kidney sample K5

K6/ - This directory contains outputs for kidney sample K6

./star/genome: - This directory contains outputs from running STAR genomeGenerate. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

chrLength.txt

chrNameLength.txt

chrName.txt

chrStart.txt

exonGeTrInfo.tab

exonInfo.tab

geneInfo.tab

Genome

genomeParameters.txt

Log.out

SA

SAindex

sjdbInfo.txt

sjdbList.fromGTF.out.tab

sjdbList.out.tab

transcriptInfo.tab

./star/J1: - This is the head STAR directory for sample J1. It contains logs, basic QC, and gene and splice junction counts. For more information about the STAR pipeline and its outputs, please refer to the STAR documentation https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Log.final.out

Log.out

Log.progress.out

SJ.out.tab

Solo.out/

_STARgenome/

./star/J1/Solo.out: - This directory contains the outputs used for downstream analysis

Barcodes.stats

GeneFull_Ex50pAS/

SJ/

./star/J1/Solo.out/GeneFull_Ex50pAS: - This directory contains the filtered and raw barcodes, features, and matrix files for gene expression (including introns)

Features.stats

filtered/

raw/

Summary.csv

UMIperCellSorted.txt

./star/J1/Solo.out/GeneFull_Ex50pAS/filtered: - This directory contains the filtered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages)

barcodes.tsv.gz - This file contains filtered cell barcodes

features.tsv.gz - This file contains filtered features (genes)

matrix.mtx.gz - This file contains the filtered cell by gene expression count matrix

./star/J1/Solo.out/GeneFull_Ex50pAS/raw: - This directory contains the unfiltered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages). Files are the same as previously described for filtered.

barcodes.tsv

features.tsv

matrix.mtx

./star/J1/Solo.out/SJ: - This directory contains the QC and raw barcodes, features, and matrix files for splice junction expression

Features.stats

raw/

Summary.csv

./star/J1/Solo.out/SJ/raw: - This directory contains the raw barcodes, features, and matrix files for splice junction expression

barcodes.tsv - This file contains raw cell barcodes

features.tsv - This file contains raw features (splice junctions)

matrix.mtx - This file contains the raw cell by splice junction count matrix

./star/J1/_STARgenome: - This directory contains the STARgenome created and used by STAR for this sample. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

exonGeTrInfo.tab

exonInfo.tab

geneInfo.tab

sjdbInfo.txt

sjdbList.fromGTF.out.tab

sjdbList.out.tab

transcriptInfo.tab
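As a rough illustration of how the three files in a `filtered` directory fit together, the Python sketch below reads barcodes, features, and a MatrixMarket coordinate file into a sparse dictionary. It is a sketch under stated assumptions rather than part of the deposited project: the function names are hypothetical, and it assumes the common 10x-style layout (rows are features, columns are barcodes, 1-based indices, one integer count in the third column). The demo writes tiny made-up files to a temporary directory so the block runs on its own.

```python
import gzip
import os
import tempfile


def read_lines(path):
    """Read a gzip-compressed text file into a list of lines."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle]


def read_filtered_matrix(directory):
    """Load barcodes, feature IDs and a sparse {(feature, barcode): count} dict."""
    barcodes = read_lines(os.path.join(directory, "barcodes.tsv.gz"))
    # Keep only the first tab-separated column (the feature identifier).
    features = [
        line.split("\t")[0]
        for line in read_lines(os.path.join(directory, "features.tsv.gz"))
    ]
    counts = {}
    size_line_seen = False
    for line in read_lines(os.path.join(directory, "matrix.mtx.gz")):
        if line.startswith("%") or not line.strip():
            continue  # MatrixMarket banner and comment lines
        fields = line.split()
        if not size_line_seen:
            # First data line: number of rows, columns, and non-zero entries.
            if int(fields[0]) != len(features) or int(fields[1]) != len(barcodes):
                raise ValueError("matrix dimensions do not match features/barcodes")
            size_line_seen = True
            continue
        row, col, value = int(fields[0]), int(fields[1]), int(fields[2])
        counts[(features[row - 1], barcodes[col - 1])] = value  # 1-based indices
    return barcodes, features, counts


# Demo with made-up toy files (identifiers are placeholders, not real data).
with tempfile.TemporaryDirectory() as tmp:
    def write_gz(name, text):
        with gzip.open(os.path.join(tmp, name), "wt") as handle:
            handle.write(text)

    write_gz("barcodes.tsv.gz", "AAAC\nAAAG\n")
    write_gz("features.tsv.gz", "ID1\tGeneA\tGene Expression\nID2\tGeneB\tGene Expression\n")
    write_gz(
        "matrix.mtx.gz",
        "%%MatrixMarket matrix coordinate integer general\n%\n2 2 3\n1 1 5\n2 1 1\n2 2 7\n",
    )
    barcodes, features, counts = read_filtered_matrix(tmp)

print(counts[("ID2", "AAAG")])  # 7
```

The files in the `raw` directories are listed without the `.gz` suffix, so reading them would need plain `open` instead of `gzip.open`.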
Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Cite
Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders (2023). Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.644211.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fgene.2021.644211.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters.
Data from: Systematic reconstruction of molecular pathway signatures using...
zenodo.org
bin, pdf, txt, zip
Updated Feb 27, 2025
+ more versions
Cite
Longda Jiang; Carol Dalgarno; Efthymia Papalexi; Isabella Mascio; Hans-Hermann Wessels; Huiyoung Yun; Nika Iremadze; Gila Lithwick-Yanai; Doron Lipson; Rahul Satija (2025). Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens [Dataset]. http://doi.org/10.5281/zenodo.14518762
Explore at:
Available download formats: pdf, bin, zip, txt
Unique identifier
https://doi.org/10.5281/zenodo.14518762
Dataset updated
Feb 27, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Longda Jiang; Carol Dalgarno; Efthymia Papalexi; Isabella Mascio; Hans-Hermann Wessels; Huiyoung Yun; Nika Iremadze; Gila Lithwick-Yanai; Doron Lipson; Rahul Satija
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This repo contains Seurat objects, differential expression analysis results, and pathway gene lists for the manuscript "Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens"
List of files:

1. Seurat_object_IFNB_Perturb_seq.rds: Seurat object of the Perturb-seq data for Interferon-beta pathway
2. Seurat_object_IFNG_Perturb_seq.rds: Seurat object of the Perturb-seq data for Interferon-gamma pathway
3. Seurat_object_TNFA_Perturb_seq.rds: Seurat object of the Perturb-seq data for TNF-alpha pathway
4. Seurat_object_TGFB1_Perturb_seq.rds: Seurat object of the Perturb-seq data for TGF-beta1 pathway
5. Seurat_object_INS_Perturb_seq.rds: Seurat object of the Perturb-seq data for insulin pathway
6. Pathway_genelist.rds: The pathway gene lists from MultiCCA analysis
7. Pathway_Exclusive_genelist.rds: The pathway exclusive gene lists generated from Pathway_genelist.rds
8. HClust_Pathway_celltype_specific_genelist.rds: The cell-line specific pathway gene lists from hierarchical clustering analysis independently done on each cell line
9. DE_results_all_pathway.zip: The DE test results for all the regulators, cell lines, and pathways (from the Mixscale weighted DE test)
10. Bulk_RNAseq_Seurat_object_IFNG_and_TGFB_stim.rds: Seurat object for the bulk RNA-seq data for interferon-gamma and TGF-beta stimulation experiments
11. Parse_Guide_Capture_Protocol.pdf: The guide RNA capture protocol developed for Parse Evercode Whole Transcriptome kit
Data from: Single cell RNA-seq analysis reveals that prenatal arsenic...
data.niaid.nih.gov
datadryad.org
+1more
zip
Updated Jun 1, 2020
Cite
Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow (2020). Single cell RNA-seq analysis reveals that prenatal arsenic exposure results in long-term, adverse effects on immune gene expression in response to Influenza A infection [Dataset]. http://doi.org/10.5061/dryad.vt4b8gtp6
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.vt4b8gtp6
Dataset updated
Jun 1, 2020
Dataset provided by
Dartmouth College
Dartmouth–Hitchcock Medical Center
Authors
Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow
License
https://spdx.org/licenses/CC0-1.0.html
Description
Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.

Methods

Whole lung homogenate preparation for single cell RNA sequencing (scRNA-seq).

Lungs were perfused with PBS via the right ventricle, harvested, and mechanically dissociated prior to straining through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoietic origin by magnetic separation using anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads in emulsion, and scRNA-seq libraries were prepared using Single Cell 3’ Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a Nextseq 500 sequencer (Illumina) with 3 sequencing runs. Demultiplexing and barcode processing of raw sequencing data was conducted using Cell Ranger v. 3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular index (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database and are available at accession # GSE142047.

Preprocessing of single cell RNA sequencing (scRNA-seq) data

Count matrices produced using Cell Ranger were analyzed in the R statistical working environment (version 3.6.1). Preliminary visualization and quality analysis were conducted using scran (v. 1.14.3, Lun et al., 2016) and scater (v. 1.14.1, McCarthy et al., 2017) to identify thresholds for cell quality and feature filtering. Sample matrices were imported into Seurat (v. 3.1.1, Stuart et al., 2019) and the percentage of mitochondrial, hemoglobin, and influenza A viral transcripts was calculated per cell. Cells with < 1000 or > 20,000 unique molecular identifiers (UMIs: low quality and doublets), fewer than 300 features (low quality), greater than 10% of reads mapped to mitochondrial genes (dying) or greater than 1% of reads mapped to hemoglobin genes (red blood cells) were filtered from further analysis. Total cells per sample after filtering ranged from 1895 to 2482; no significant difference in the number of cells was observed in arsenic vs. control. Data were then normalized using SCTransform (Hafemeister et al., 2019) and variable features identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat V3 (Stuart et al., 2019), and used to integrate samples into a shared space for further comparison. This process enables identification of shared populations of cells between samples, even in the presence of technical or biological differences, while also allowing for non-overlapping populations that are unique to individual samples.
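The cell-level quality thresholds listed above translate into a simple predicate. The Python sketch below is only an illustration of those stated cutoffs, not the authors' R code: the field names (`n_umi`, `n_features`, `pct_mito`, `pct_hemoglobin`) and the toy cells are hypothetical.

```python
def passes_qc(cell):
    """Return True if a cell survives the filters stated in the methods.

    A cell is removed when it has < 1000 or > 20,000 UMIs, fewer than 300
    features, > 10% mitochondrial reads, or > 1% hemoglobin reads.
    """
    return (
        1000 <= cell["n_umi"] <= 20000
        and cell["n_features"] >= 300
        and cell["pct_mito"] <= 10.0
        and cell["pct_hemoglobin"] <= 1.0
    )


# Hypothetical toy cells, for illustration only.
toy_cells = [
    {"id": "ok", "n_umi": 5000, "n_features": 1500, "pct_mito": 3.0, "pct_hemoglobin": 0.1},
    {"id": "low_umi", "n_umi": 800, "n_features": 400, "pct_mito": 2.0, "pct_hemoglobin": 0.0},
    {"id": "doublet", "n_umi": 25000, "n_features": 5000, "pct_mito": 4.0, "pct_hemoglobin": 0.2},
    {"id": "dying", "n_umi": 3000, "n_features": 900, "pct_mito": 15.0, "pct_hemoglobin": 0.0},
    {"id": "rbc", "n_umi": 2000, "n_features": 350, "pct_mito": 1.0, "pct_hemoglobin": 40.0},
]

kept = [cell["id"] for cell in toy_cells if passes_qc(cell)]
print(kept)  # ['ok']
```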

Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq

Principal components were identified from the integrated dataset and were used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two-dimensional space. A shared-nearest-neighbor (SNN) graph was constructed using default parameters, and clusters were identified using the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2310 cells. Gene markers for clusters were identified with the findMarkers function in scran. To label individual cells with cell type identities, we used the singleR package (v. 3.1.1) to compare gene expression profiles of individual cells with expression data from curated, FACS-sorted leukocyte samples in the Immgen compendium (Aran D. et al., 2019; Heng et al., 2008). We manually updated the Immgen reference annotation with 263 sample group labels for fine-grain analysis and 25 CD45+ cell type identities based on markers used to sort Immgen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2; cells that were not labeled confidently after label pruning were assigned “Unknown”.

Differential gene expression by immune cells

Differential gene expression within individual cell types was performed by pooling raw count data from cells of each cell type on a per-sample basis to create a pseudo-bulk count table for each cell type. Differential expression analysis was only performed on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets, and can result in spurious identification of these genes in cell types where they aren’t actually expressed. We therefore used a method developed by Young and Behjati (Young et al., 2018) to estimate the contribution of ambient RNA for each gene, and identified genes in each cell type that were estimated to be > 25% ambient-derived. These genes were excluded from analysis in a cell-type specific manner. Genes expressed in less than 5 percent of cells were also excluded from analysis. Differential expression analysis was then performed in limma (limma-voom with quality weights) following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < 0.05 and log2FC > 1.
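The pseudo-bulk step described above (sum raw counts over the cells of one type within one sample, and keep only cell types with more than 10 cells in every sample) can be sketched as follows. This Python sketch is illustrative and not the authors' pipeline: the function name, the tuple layout, and the toy cells are hypothetical, and the ambient-RNA and 5-percent expression filters are not included.

```python
from collections import defaultdict


def pseudobulk(cells, min_cells=10):
    """Build per-cell-type pseudo-bulk count tables.

    cells: iterable of (sample, cell_type, {gene: raw_count}) tuples.
    Returns {cell_type: {sample: {gene: summed_count}}} for cell types
    with more than `min_cells` cells in every sample.
    """
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    samples = set()
    for sample, cell_type, counts in cells:
        samples.add(sample)
        n_cells[(cell_type, sample)] += 1
        for gene, count in counts.items():
            sums[(cell_type, sample)][gene] += count

    cell_types = {cell_type for cell_type, _ in n_cells}
    tables = {}
    for cell_type in cell_types:
        # Require sufficient representation (> min_cells) in each sample.
        if all(n_cells[(cell_type, s)] > min_cells for s in samples):
            tables[cell_type] = {s: dict(sums[(cell_type, s)]) for s in samples}
    return tables


# Hypothetical toy input: two samples, two cell types.
toy_cells = (
    [("S1", "NK", {"Gzma": 2})] * 11
    + [("S2", "NK", {"Gzma": 2})] * 11
    + [("S1", "Neutrophil", {"S100a8": 5})] * 11
    + [("S2", "Neutrophil", {"S100a8": 5})] * 3  # too few cells in S2
)

tables = pseudobulk(toy_cells)
print(sorted(tables))       # ['NK']
print(tables["NK"]["S1"])   # {'Gzma': 22}
```

Each resulting table has one column per sample, which is the shape a bulk RNA-seq tool such as limma-voom expects.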

Analysis of arsenic effect on immune cell gene expression by scRNA-seq.

Sample-wide effects of arsenic on gene expression were identified by pooling raw count data from all cells per sample to create a count table for pseudo-bulk gene expression analysis. Genes with less than 20 counts in any sample, or less than 60 total counts were excluded from analysis. Differential expression analysis was performed using limma-voom as described above.
Dawnn benchmarking dataset: Heart cells processing and label simulation
rdr.ucl.ac.uk
txt
Updated May 4, 2023
+ more versions
Cite
George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Heart cells processing and label simulation [Dataset]. http://doi.org/10.5522/04/22601260.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.5522/04/22601260.v1
Dataset updated
May 4, 2023
Dataset provided by
University College London
Authors
George Hall; Sergi Castellano Hereza
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on single-cell RNAseq of heart cells.

FILES

Input data
Dataset from: "Integrated multi-omic characterization of congenital heart disease". Nature 608 pp. 181-191 (2022).
heart_barcodes.tsv.gz: Cell barcode list
heart_genes.tsv.gz: Gene list
heart_expression_matrix.mtx.gz: Cell-by-gene expression matrix

Data processing code
process_heart_cells.R: Generates benchmarking dataset from input data. (Reads heart_barcodes.tsv.gz, heart_genes.tsv.gz, and heart_expression_matrix.mtx.gz; runs the standard Seurat pipeline; saves the resulting Seurat dataset as heart_tissue_cells.RDS and the resulting cell labels as benchmark_dataset_heart_data_type_labels.csv)

Resulting datasets
heart_tissue_cells.RDS: Seurat dataset generated by process_heart_cells.R.
benchmark_dataset_heart_data_type_labels.csv: Cell labels generated by process_heart_cells.R.
scRNA-seq data for article: Kupffer cell and recruited macrophage...
zenodo.org
bin
Updated Jan 2, 2025
Gabriela Pessenda; Tiago R. Ferreira; Andrea Paun; Eduardo P. Amaral; Juraj Kabat; Olena Kamenyeva; Camila O. S. Souza; Sundar Ganesan; Sang Hun Lee; David. Sacks (2025). scRNA-seq data for article: Kupffer cell and recruited macrophage heterogeneity orchestrate granuloma maturation and hepatic immunity in visceral leishmaniasis [Dataset]. http://doi.org/10.5281/zenodo.10780584
Unique identifier
https://doi.org/10.5281/zenodo.10780584
Dataset updated
Jan 2, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gabriela Pessenda; Tiago R. Ferreira; Andrea Paun; Eduardo P. Amaral; Juraj Kabat; Olena Kamenyeva; Camila O. S. Souza; Sundar Ganesan; Sang Hun Lee; David. Sacks
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Mar 2024
Description
Single-cell RNA-seq dataset from sorted CD11bInt, F4/80Hi, CD64+ mouse liver cells in naive or Leishmania infantum-infected animals at 42 d.p.i. Data analyses and results are described in the manuscript "Kupffer cell and recruited macrophage heterogeneity orchestrate granuloma maturation and hepatic immunity in visceral leishmaniasis". Data files are Seurat objects in RDS format. Potential doublets, low-quality cells and dying cells were filtered out (excluded: cells with <1000 genes detected, cells with >6000 genes detected, cells with mitochondrial gene expression >10% and cells with <5000 transcript molecules). Data normalization, scaling and integration were performed using Seurat.

Filtered dataset containing all KCs and macrophages is in the "pessenda_KC_Macro_seurat" file.

Our data was then mapped to a reference dataset by Remmerie et al. (DOI: 10.1016/j.immuni.2020.08.004) for annotation consistent with the literature. The reference mapped object can be found in the "pessenda_refmap_KC_Macro_seurat" file.
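The cell-level exclusion thresholds listed in this description can be expressed as a small filter. The sketch below is an illustration only, written in Python rather than the Seurat/R workflow the authors used; the `CellQC` record and its field names are hypothetical, and the boundary handling (cells exactly at a threshold are kept) follows the strict inequalities in the description.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record for this sketch)."""
    barcode: str
    n_genes: int          # number of genes detected
    n_transcripts: int    # number of transcript molecules
    pct_mito: float       # mitochondrial gene expression, in percent


def passes_qc(cell: CellQC) -> bool:
    """Apply the thresholds from the dataset description.

    Excluded: <1000 genes, >6000 genes, >10% mitochondrial expression,
    or <5000 transcript molecules.
    """
    return (
        1000 <= cell.n_genes <= 6000
        and cell.pct_mito <= 10.0
        and cell.n_transcripts >= 5000
    )


def filter_cells(cells):
    """Return the barcodes of cells that pass all QC thresholds."""
    return [cell.barcode for cell in cells if passes_qc(cell)]
```

The upper bound on detected genes is a common heuristic for removing potential doublets, while the lower bounds and the mitochondrial cutoff target low-quality and dying cells.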
Axi-cel CAR T single-cell data
purl.stanford.edu
Updated Jul 1, 2022
Zinaida Good; Jay Spiegel; Bita Sahaf; Meena Malipatlolla; Zach Ehlinger; Sreevidya Kurra; Moksha Desai; Warren Reynolds; Anita Wong Lin; Panayiotis Vandris; Fang Wu; Snehit Prabhu; Mark Hamilton; John Tamaresis; Paul Hanson; Shabnum Patel; Steven Feldman; Matthew Frank; John Baird; Lori Muffly; Gursharan Claire; Juliana Craig; Katherine Kong; Dhananjay Wagh; John Coller; Sean Bendall; Robert Tibshirani; Sylvia Plevritis; David Miklos; Crystal Mackall (2022). Axi-cel CAR T single-cell data [Dataset]. http://doi.org/10.25740/qb215vz6111
Unique identifier
https://doi.org/10.25740/qb215vz6111
Dataset updated
Jul 1, 2022
Authors
Zinaida Good; Jay Spiegel; Bita Sahaf; Meena Malipatlolla; Zach Ehlinger; Sreevidya Kurra; Moksha Desai; Warren Reynolds; Anita Wong Lin; Panayiotis Vandris; Fang Wu; Snehit Prabhu; Mark Hamilton; John Tamaresis; Paul Hanson; Shabnum Patel; Steven Feldman; Matthew Frank; John Baird; Lori Muffly; Gursharan Claire; Juliana Craig; Katherine Kong; Dhananjay Wagh; John Coller; Sean Bendall; Robert Tibshirani; Sylvia Plevritis; David Miklos; Crystal Mackall
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
This repository contains metadata and single-cell data used to generate figures in the manuscript entitled: "Post-infusion Treg-like CAR T cells identify patients resistant to CD19-CAR therapy". Included here: CSV files containing patient cohort metadata, summary statistics and quantitative PCR results; FCS files for flow and mass cytometry data; processed Seurat object for single-cell sequencing data. Raw single-cell sequencing data, cellranger alignment results, and metadata are available through the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo; GEO accession number: GSE168940). With questions, please reach out to Zinaida Good (zinaida@stanford.edu) or Crystal L. Mackall (cmackall@stanford.edu).
Robject files for tissues processed by Seurat
figshare.com
application/gzip
Updated May 30, 2023
Tabula Muris Consortium (2023). Robject files for tissues processed by Seurat [Dataset]. http://doi.org/10.6084/m9.figshare.5821263.v3
Unique identifier
https://doi.org/10.6084/m9.figshare.5821263.v3
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tabula Muris Consortium
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Each tissue's gene expression profile was processed by experts to annotate clusters of cells with biological functions. These are the Robjects created using Seurat to normalize and cluster the single-cell RNA-seq expression data.
Update 2018-03-27: Updated to resubmitted Robj
Update 2018-09-20: Updated to accepted Robj
