Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space as the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.
To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from ArrayExpress. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.
Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non-T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3.
For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of ties, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat 3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.
Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction="umap", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under-expressed genes per cluster using MAST. In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat 3 using, respectively, the prcomp function from the base R package "stats" and the "umap" R package (https://github.com/tkonopka/umap).
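The gene-selection rules described above (dropping genes detected in under 0.1% or over 90% of cells, then ranking genes by how many datasets selected them, with ties going to the largest dataset) can be sketched in plain Python. This is only an illustration of one reading of that text, not the STACAS implementation: the function names, the toy gene lists and the dataset sizes are all assumptions made for the example.

```python
from collections import Counter


def filter_by_detection(detect_frac, low=0.001, high=0.90):
    """Keep genes detected in at least `low` and at most `high` of the cells.

    `detect_frac` maps gene name -> fraction of cells expressing the gene.
    """
    return {gene for gene, frac in detect_frac.items() if low <= frac <= high}


def consolidate_variable_genes(per_dataset_genes, dataset_sizes, n_features):
    """Rank genes by the number of datasets listing them; break ties by the
    size of the largest dataset that lists the gene, then alphabetically."""
    counts = Counter()
    best_size = {}
    for name, genes in per_dataset_genes.items():
        for gene in set(genes):
            counts[gene] += 1
            best_size[gene] = max(best_size.get(gene, 0), dataset_sizes[name])
    ranked = sorted(counts, key=lambda g: (-counts[g], -best_size[g], g))
    return ranked[:n_features]


# Toy input (hypothetical gene lists and cell counts, for illustration only).
per_dataset = {
    "A": ["Cd8a", "Pdcd1", "Tox"],
    "B": ["Cd8a", "Gzmb"],
    "C": ["Pdcd1", "Sell"],
}
sizes = {"A": 5000, "B": 1200, "C": 300}
selected = consolidate_variable_genes(per_dataset, sizes, n_features=4)
print(selected)
```

In the setting described above, `per_dataset_genes` would hold 600 genes per dataset and `n_features` would be 800; the alphabetical tie-break is added here only to make the output deterministic.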
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
ProjecTILs human reference atlas of blood and tumor-infiltrating Dendritic Cells (DC), generated by semi-supervised STACAS integration of 11 scRNA-seq datasets spanning 5 cancer types. The datasets were obtained from Gerhard et al. JEM, 2020, as well as Villani et al. Science, 2017, for the blood samples. The dataset is available as a Seurat object, with subtype labels (functional.cluster), sample of origin and several other measurements for each cell stored in the object metadata. The object can be directly used as a reference atlas for ProjecTILs (Andreatta et al. Nat Comms 2021).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Seurat objects containing the raw and normalized data for:
- Normal bone marrow (NBM) atlas: contains all cells obtained through segmentation after filtering and QC. Includes coarse and fine levels of annotation that were obtained through an iterative process of subclustering. Neighborhood analysis results are included as a metadata column. Also includes additional Osteo-MSC and Fibro-MSC cells that were manually annotated.
- AML/NSM CODEX data: contains all cells after filtering for 3 diagnostic and 2 post-therapy AML samples as well as 3 negative staging marrow samples. Cell labels were derived through reciprocal principal component analysis (RPCA) reference mapping onto the normal bone marrow atlas. Neighborhood analysis was conducted separately for AML Diagnostic, AML Post-Therapy, and NSM samples. Neighborhoods were manually annotated for each set. The results of the neighborhood analysis were merged and included in the metadata of the Seurat object.
All normalized data is stored in the Seurat assay object. Markers that were not included in normalization and downstream analysis are included with raw values as a metadata column. Full source code used to generate these objects can be found on GitHub: https://github.com/shovikb94/spatial-bonemarrow-atlas/tree/main
See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.7174914
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Single-cell RNA-seq dataset from sorted IL-10+, TCRb+, CD4+ mouse small intestine lamina propria cells in naive or Giardia intestinalis-infected animals at 7 d.p.i. Data analyses and results are described in Alves-Sardinha et al., Nature Microbiology, 2025: "Giardia intestinalis-induced Type 2 mucosal immunity attenuates bystander intestinal inflammation". Data are Seurat objects in RDS format. Potential doublets, low-quality cells and dying cells were filtered out (cells with <800 genes detected, cells with >5000 genes detected and cells with mitochondrial gene expression >10% were excluded). Data normalization, scaling and integration were performed using Seurat v4.4.0.
The full filtered dataset is in the "alineGiardia.combined_v4.rds" file. Related R code is in "giardia_mouse_integration.R".
The T cells of interest only are in the "T.seurat.rds" file. Related R code is in "TcellSubsets_sc_analysis.R".
The dataset was also mapped to a reference dataset by Kiner et al., Nature Immunology, 2021. The post-mapping data is in the "refmap_kiner.seurat.rds" file. Related R code is in "referenceMapping_Kineretal2021.R".
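The per-cell QC thresholds stated in the description above (fewer than 800 genes, more than 5000 genes, or more than 10% mitochondrial expression) amount to a simple predicate. The sketch below is a language-neutral illustration in Python, not the code from "giardia_mouse_integration.R"; the function name, the field names and the toy cells are assumptions, and the doublet-detection step is not covered.

```python
def passes_qc(n_genes, pct_mito, min_genes=800, max_genes=5000, max_pct_mito=10.0):
    """Return True if a cell survives the thresholds quoted in the description:
    cells with <800 genes, >5000 genes, or >10% mitochondrial expression are excluded."""
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


# Toy cells (hypothetical values, for illustration only).
cells = [
    {"barcode": "cell_1", "n_genes": 650, "pct_mito": 3.0},    # too few genes
    {"barcode": "cell_2", "n_genes": 1500, "pct_mito": 4.2},   # kept
    {"barcode": "cell_3", "n_genes": 5200, "pct_mito": 2.0},   # too many genes
    {"barcode": "cell_4", "n_genes": 2000, "pct_mito": 12.5},  # high mitochondrial fraction
    {"barcode": "cell_5", "n_genes": 800, "pct_mito": 10.0},   # on the boundary, kept
]
kept = [c["barcode"] for c in cells if passes_qc(c["n_genes"], c["pct_mito"])]
print(kept)
```

Cells exactly on a threshold are kept here because the description excludes only values strictly below or above the limits.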
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
ProjecTILs human reference atlas of CD4+ tumor-infiltrating T cells, generated by semi-supervised STACAS integration of 20 scRNA-seq datasets spanning 7 cancer types. The datasets were obtained from the collection by Zheng et al. (Science 2021). The dataset is available as a Seurat object, with subtype labels (functional cluster), sample of origin and several other measurements for each cell stored in the object metadata. The object can be loaded into an R session using the readRDS() function, and directly used as a reference atlas for ProjecTILs (Andreatta et al. Nat Comms 2021).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Raw data and code to reproduce figures in the manuscript "Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses"
# README
## Introduction
This README provides essential information about the codebase for the manuscript titled "Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses." The code in this repository is self-contained and is expected to run smoothly given the appropriate versions of the required libraries/packages.
## Directory structure and execution details
### R code
- Main Figures 2A-2D, 3A-3C, and 4A-4E, as well as supplemental figures S2A-S2H, S3A-S3E, S4L, and S5A-S5I, were generated using R. Execute the `R_figs_master.r` script located in the `r_code` directory.
- All figures will be saved within the `r_code/code_generated_figures` directory.
- Note: Exact UMAP representations might vary across different hardware and operating systems, likely due to an issue with the UWOT package ([Reference Issue](https://github.com/satijalab/seurat/issues/5514)). If figures appear outside their designated plot ranges, set "FixAxes" to 'FALSE' in the `single_cell_variables.r` script.
### MATLAB code
- Main figures 1B, 1D-1F, and 6A-6H, as well as supplemental figures S1A-S1J and S6A-S6I, were generated using MATLAB (version 9.11.0.1809720 (R2021b) Update 1). Execute the `get_the_figs_matlab.m` script located in the `matlab_code` directory.
- All figures will be saved within the `matlab_code/code_generated_figures` directory.
- Required: [fca_readfcs, version 2020.06.22](https://ch.mathworks.com/matlabcentral/fileexchange/9608-fca_readfcs).
### Python code
- Figure panels 5B-5F were generated using Python (version 3.6.8). Run the `fig_5_analysis_code.py` script located in the `python_code` directory.
- All figures will be saved within the `python_code/code_generated_figures` directory.
- The preprocessed images located in `python_code/data_repository/Adamts2_processed`, `python_code/data_repository/Agmat_processed`, and `python_code/data_repository/Baz1a_processed` were generated using the ImageJ macro `python_code/cropped_to_processed_macro.ijm` from the raw images in `python_code/data_repository/Adamts2_cropped`, `python_code/data_repository/Agmat_cropped`, and `python_code/data_repository/Baz1a_cropped`.
## Supplementary code (for reference only, as raw data is not included)
### Mapping code and genome construction code
- Initial processing of the single-cell RNA-sequencing data was performed with Cell Ranger, coordinated by the Python script:
`python_code/mapping_and_genome_construction/single_cell_mapping_pipeline.py`. Some components of this script are deprecated and were primarily used to pass .fastq files to Cell Ranger and organize the outputs.
- A custom genome was constructed to account for the expression of CaMPARI2 in the single-cell RNA-sequencing dataset:
`python_code/mapping_and_genome_construction/campari2_genome_construction.py`.
- Processing of the bulk RNA-sequencing data, either single- or paired-end, was executed through Python:
`python_code/mapping_and_genome_construction/bulk_single_end_mapping.py` and `python_code/mapping_and_genome_construction/bulk_paired_end_mapping.py`.
- A custom genome was constructed to account for the expression of various artificial promoter viruses:
`python_code/mapping_and_genome_construction/bulk_seq_genome_construction.py`.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Visium (10x Genomics) spatially resolved transcriptomics data generated from normal and Idiopathic Pulmonary Fibrosis (IPF) lung parenchyma tissues collected from human donors. The fresh-frozen tissues that were analyzed were from four healthy control (HC) subjects and from four IPF patients. For each IPF patient, three different tissues were selected, representing areas of mild ("B1"), moderate ("B2") or severe ("B3") fibrosis within the same donor, as determined by histological inspection of Hematoxylin and Eosin (H&E)-stained samples. The data comprise a total of 25 tissue sections from 16 unique lung tissue blocks. The lung tissues were collected post-mortem (HC donors) or during lung transplant/resection (IPF patients) after obtaining informed consent. The study protocols were approved by the local human research ethics committee (HC: Lund, permit number Dnr 2016/317; IPF: Gothenburg, permit number 1026-15) and the samples are anonymized and cannot/should not be traced back to individual donors.
Data included in this repository:
- Visium data in the format of selected Space Ranger output files ("filtered_feature_bc_matrix.h5", "raw_feature_bc_matrix.h5", "web_summary.html", and the "spatial/" folder) for each individual section analysed. Zipped into one folder: "hs_visium_spaceranger_output.zip"
- Sample metadata containing information for each sample with linked subject information: "hs_visium_metadata.tsv"
- R object produced using STUtility, containing the processed data used for downstream analyses, most importantly all spot metadata with assigned data and deconvolution results (NMF, cell2location): "hs_visium_stutility_obj.rds"
- Cell2location output files ("*_spot_cell_abundances_5pc.csv"), zipped into one folder: "cell2location_habermann2020.zip"
- Full resolution H&E images ("*.jpg") of each tissue section, which were used as input for Space Ranger together with the alignment json and sequencing fastq files. Zipped into one folder: "he_fullres_jpgs.zip"
- Spot alignment files ("*.json") created in Loupe Browser using the corresponding full resolution H&E image, in which spots under the tissue were identified. Zipped into one folder: "loupe_alignment_jsons.zip"
Space Ranger output is found within the zipped files in folders named "V*****-***-*1". To generate these files, raw FastQ files from the NovaSeq sequencing were processed with the Space Ranger pipeline (v. 1.2.2, 10x Genomics), where the reads were mapped to the GRCh38 reference genome. Manual spot alignment was performed in the Loupe Browser (v. 6, 10x Genomics) software. Cell type mapping results were obtained using the cell2location (v. 0.1) method, integrating the Space Ranger output data with annotated single cell RNA-seq data produced from human IPF lung, published by Habermann et al., 2020 (DOI: 10.1126/sciadv.aba1972, GEO accession: GSE135893). The Seurat/STUtility object was generated from the Space Ranger output files, using the R packages STUtility (v. 1.1.1) and Seurat (v. 4.1.1) in R (v. 4.0.5). All R scripts used for the data analyses can be found at https://github.com/lfranzen/spatial-lung-fibrosis. The deposited data is presented in the article "Mapping spatially resolved transcriptomes in human and mouse pulmonary fibrosis" by Franzén L & Olsson Lindvall M, et al. (preprint: "Translational mapping of spatially resolved transcriptomes in human and mouse pulmonary fibrosis", bioRxiv, https://doi.org/10.1101/2023.12.21.572330).
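Since the Space Ranger output is described above as one folder per section inside a zip archive, a quick completeness check can be sketched with the Python standard library. This is a hedged illustration only: the internal layout of the archive is an assumption, as is the reading of the folder pattern "V*****-***-*1" with each asterisk standing for exactly one character; the function name is hypothetical.

```python
import fnmatch
import zipfile
from collections import defaultdict

# Files and folders the description lists for each section.
EXPECTED = (
    "filtered_feature_bc_matrix.h5",
    "raw_feature_bc_matrix.h5",
    "web_summary.html",
    "spatial",
)

# Assumption: every "*" in "V*****-***-*1" is a single character.
SECTION_PATTERN = "V?????-???-?1"


def missing_outputs(zip_source):
    """Map each section folder found in the archive to the expected items it lacks.

    `zip_source` may be a path or a binary file-like object.
    """
    present = defaultdict(set)
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            parts = [p for p in name.split("/") if p]
            for i, part in enumerate(parts):
                if fnmatch.fnmatchcase(part, SECTION_PATTERN):
                    items = present[part]  # registers the section even if empty
                    if i + 1 < len(parts):
                        items.add(parts[i + 1])
                    break
    return {
        section: sorted(set(EXPECTED) - items)
        for section, items in sorted(present.items())
    }
```

Calling `missing_outputs("hs_visium_spaceranger_output.zip")` on a downloaded copy would return a dictionary keyed by section folder, with an empty list for every section that contains all four expected items.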
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This record includes data derived from the processing of single-cell and single-nucleus RNA-seq from several samples (see below). The data correspond to the input and intermediate output files from https://github.com/saezlab/Xu_tubuloid .
Data
The data include:
- Binary sparse matrices for the UMI gene expression quantification from cellranger (filtered feature-barcode matrices). These are TAR archive files named after the sample.
- Seurat objects with normalized data, embeddings of dimensionality reduction, clustering and cell cluster annotation. These are TAR archive files including final objects, grouped by sample type: SeuratObjects_[SortedCells | Organoids | Human Kidney Tissue]. The HumanKidneyTissue also includes the SeuratObject after Harmony integration.
- Exported barcode idents from unsupervised clustering and manual annotation ("barcodeIdents*.csv" files).
- Label transfer via Symphony mapping of tubuloid cells from each organoid to an integrated reference atlas of human kidney tissue (SymphonyMapped*.csv).
Samples
The data correspond to the following samples, which were profiled at single-cell resolution:
- CK5 early organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at an early stage.
- CK119 late organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at a late stage.
- JX1 late organoid (Healthy). Organoid generated following Hans Clevers' protocol for kidney organoids.
- JX2 PKD1-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD1 was gene-edited to reproduce PKD phenotype, developed at a late stage.
- JX3 PKD2-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD2 was gene-edited to reproduce PKD phenotype, developed at a late stage.
- CK120 CD13. CD13+ sorted cells from human adult kidney tissue.
- CK121 CD24. CD24+ sorted cells from human adult kidney tissue.
In addition, human adult kidney tissue was profiled in the context of ADPKD:
- CK224: human specimen with ADPKD (PKD2- genotype).
- CK225: human specimen with ADPKD (PKD1- genotype).
- ADPKD3: human specimen with ADPKD (ND genotype).
- Control1: human specimen with healthy tissue.
- Control2: human specimen with healthy tissue.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Visium (10x Genomics) spatially resolved transcriptomics data generated from the mouse model of bleomycin (BLM)-induced lung fibrosis, with lung tissue collected at 7 or 21 days post single-dose BLM (40 µg/mouse, o.p) or saline vehicle challenge. The data comprise a total of 24 tissue sections from 18 unique mouse left lung lobes. The mice were all female C57BL/6NCrl, purchased from Charles River (Germany), and were 8 weeks + 5 days old at the time of study initiation. Animal handling conformed to standards established by the Council of Europe ETS123 AppA, the Helsinki Convention for the Use and Care of Animals, Swedish legislation, and AstraZeneca global internal standards. All mouse experiments were approved by the Gothenburg Ethics Committee for Experimental Animals in Sweden and conformed to Directive 2010/63/EU. The present study was approved by the local Ethical committee in Gothenburg (EA000680-2017) and the approved site number is 31-5373/11.
Data included in this repository:
- Visium data in the format of selected Space Ranger output files ("filtered_feature_bc_matrix.h5", "raw_feature_bc_matrix.h5", "metrics_summary.csv", and the "spatial/" folder) for each individual section analysed. Zipped into one folder: "mm_visium_spaceranger_output.zip"
- Sample metadata containing information for each sample with linked subject information: "mm_visium_metadata.tsv"
- R objects produced using STUtility, containing the processed data used for downstream analyses, most importantly all spot metadata with assigned data and deconvolution results (NMF, cell2location): "mm_visium_all_stutility_obj.rds" (all), "mm_visium_d7_stutility_obj.rds" (day 7 data), "mm_visium_d21_stutility_obj.rds" (day 21 data)
- Cell2location output files ("*_spot_cell_abundances_5pc.csv"), zipped into one folder: "cell2location_strunz2020.zip"
- Full resolution H&E images ("*.jpg") of each tissue section, which were used as input for Space Ranger together with the alignment json and sequencing fastq files. Zipped into one folder: "he_fullres_jpgs.zip"
- Spot alignment files ("*.json") created in Loupe Browser using the corresponding full resolution H&E image, in which spots under the tissue were identified. Zipped into one folder: "loupe_alignment_jsons.zip"
Space Ranger output is found within the zipped files in folders named "V*****-***-*1". To generate these files, raw FastQ files from the NovaSeq sequencing were processed with the Space Ranger pipeline (v. 1.2.2, 10x Genomics), where the reads were mapped to the mm10 reference genome. Manual spot alignment was performed in the Loupe Browser (v. 6, 10x Genomics) software. Cell type mapping results were obtained using the cell2location (v. 0.1) method, integrating the Space Ranger output data with annotated single cell RNA-seq data produced from mouse BLM-injured lungs collected at multiple time points, published by Strunz et al., 2020 (DOI: 10.1038/s41467-020-17358-3, GEO accession: GSE141259). Seurat/STUtility objects were generated from the Space Ranger output files, using the R packages STUtility (v. 1.1.1) and Seurat (v. 4.1.1) in R (v. 4.0.5). The d7 and d21 subsets were derived from the object with all samples processed jointly, after which NMF deconvolution and subsequent clustering (d21) were performed. All R scripts used for the data analyses can be found at https://github.com/lfranzen/spatial-lung-fibrosis. The deposited data is presented in the article "Mapping spatially resolved transcriptomes in human and mouse pulmonary fibrosis" by Franzén L & Olsson Lindvall M, et al. (preprint: "Translational mapping of spatially resolved transcriptomes in human and mouse pulmonary fibrosis", bioRxiv, https://doi.org/10.1101/2023.12.21.572330).