Table of Contents
Main Description File Descriptions Linked Files Installation and Instructions
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap
The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the differential gene expression (DGE) analysis, which were also used to create the DGE heatmap and the volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.
This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset comprises raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:
Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
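As an illustration of how one "matrix.mtx" / "barcodes.tsv" / "genes.tsv" triplet can be read, here is a minimal sketch that uses only the Matrix package. The function name read_10x_triplet and the folder argument are hypothetical, the files are assumed to be unzipped and to carry exactly these names, and the second column of genes.tsv is assumed to hold gene symbols. This is not the loading code used in the paper.

```r
library(Matrix)  # ships with standard R installations

# Read one matrix.mtx / barcodes.tsv / genes.tsv triplet into a sparse
# genes x cells matrix. `sample_dir` is a hypothetical folder for one sample.
read_10x_triplet <- function(sample_dir) {
  counts   <- readMM(file.path(sample_dir, "matrix.mtx"))
  barcodes <- readLines(file.path(sample_dir, "barcodes.tsv"))
  genes    <- read.delim(file.path(sample_dir, "genes.tsv"),
                         header = FALSE, stringsAsFactors = FALSE)
  # Assumption: column 1 holds gene IDs and column 2 holds gene symbols.
  rownames(counts) <- make.unique(genes[[2]])
  colnames(counts) <- barcodes
  # Column-compressed storage is the form most downstream tools expect.
  as(counts, "CsparseMatrix")
}
```

The returned matrix can then be handed to whichever single-cell toolkit is in use; with Seurat installed, Seurat::Read10X() is the more common way to read the same three files.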
Sequence Read Archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing reads as gzip-compressed FASTQ (.fastq.gz) files, which are plain-text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code.
Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
Ensure you have R version 4.1.2 or higher for compatibility.
Although it is not essential, you can use RStudio (Version 2022.12.0+353) for accessing and executing the code.
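The version and package requirements above can be checked before running the analysis script. The sketch below is only a pre-flight check written for this README, not the repository's Install_Packages.R; the helper name missing_packages is hypothetical, and the package spelling "scRepertoire" is assumed. Note that ComplexHeatmap is distributed through Bioconductor rather than CRAN.

```r
# Packages named in this README (the spelling "scRepertoire" is assumed)
required_pkgs <- c("Seurat", "scRepertoire", "ggplot2", "stringr", "dplyr",
                   "ggridges", "ggrepel", "ComplexHeatmap")

# Return the names of the packages that cannot be loaded in this session
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Report problems without installing or changing anything
if (getRversion() < "4.1.2") {
  message("R ", getRversion(), " is older than the recommended 4.1.2")
}
not_installed <- missing_packages(required_pkgs)
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}
```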
marengo_code_for_paper_jan_2023.R Install_Packages.R Marengo_newID_March242023.rds genes_for_heatmap_fig5F.xlsx all_res_deg_for_heat_updated_march2023.txt
You can use the following code to set the working directory in R:
setwd(directory)
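A slightly fuller sketch of this step is shown below. It assumes that all five files listed above were downloaded into a single folder; the helper name find_missing_files and the object name in the commented readRDS() call are hypothetical.

```r
# Files listed in this README, assumed to sit together in one folder
expected_files <- c("marengo_code_for_paper_jan_2023.R",
                    "Install_Packages.R",
                    "Marengo_newID_March242023.rds",
                    "genes_for_heatmap_fig5F.xlsx",
                    "all_res_deg_for_heat_updated_march2023.txt")

# Return the expected files that are absent from `directory`
find_missing_files <- function(directory, files = expected_files) {
  files[!file.exists(file.path(directory, files))]
}

directory <- getwd()  # placeholder: replace with the download folder
missing_files <- find_missing_files(directory)
if (length(missing_files) == 0) {
  setwd(directory)
  # marengo <- readRDS("Marengo_newID_March242023.rds")  # hypothetical object name
} else {
  message("Missing from ", directory, ": ",
          paste(missing_files, collapse = ", "))
}
```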
Journal article published in PLOS One, Vol 20, Issue 5, e0320862, 2025; DOI: https://doi.org/10.1371/journal.pone.0320862; PMC12064016. The datasets generated and analyzed during the current study are provided in Supplemental S1 File. The RNA-seq data is Protein Atlas Version 23 from the Human Protein Atlas website (https://www.proteinatlas.org/about/download, “RNA HPA cell line gene data” released 2023.06.19). All FASTQ files and aligned counts for the U.S. EPA TempO-seq data have been deposited into NCBI Gene Expression Omnibus under the accession number GSE288929 and are publicly available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288929. The R code is available through FigShare at: https://doi.org/10.23645/epacomptox.27341970.v1. This dataset is associated with the following publication: Word, L., C. Willis, R. Judson, L. Everett, S. Davidson-Fritz, D. Haggard, B. Chambers, J. Rogers, J. Bundy, I. Shah, N. Sipes, and J. Harrill. TempO-seq and RNA-seq Gene Expression Levels are Highly Correlated for Most Genes: A Comparison Using 39 Human Cell Lines. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 20(5): e0320862, (2025).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Custom genome and gene annotations for single cell ATAC and RNA-seq analyses by BaDoi Phan (badoi dot phan at pitt dot edu)
This KiltHub upload is a clone of the GitHub repository where this project may be updated or corrected in the future: https://github.com/pfenninglab/custom_ArchR_genomes_and_annotations
Premise: Not all single-cell ATAC-seq work in biomedical molecular epigenetics is done in human and mouse, where high quality genomes and gene annotations exist. For the other species that are still highly relevant to the study of health and disease, here are some ArchR annotations that make it less frustrating to analyze snATAC-seq data with ArchR.
Strategy for better gene annotations: We can use the property that related mammalian species tend to have orthologous gene elements (TSS, exons, genes). For example, the house mouse (Mus musculus) diverged from the Norway rat (Rattus norvegicus) a median of 15.4 MY ago, according to TimeTree. Humans diverged from rhesus macaques a median of 28.9 MY ago. To borrow the higher quality and more complete gene annotations, we can use a gene-aware method of lifting gene annotations from one genome to another: liftoff (Shumate and Salzberg, 2021). As the source of "high quality" gene annotations, we use the NCBI Refseq annotations for hg38/GRCh38 and mm10/GRCm38 downloaded from the UCSC Genome Browser.
For single cell RNA-seq, He, Kleyman et al. 2021 Current Biology (https://pubmed.ncbi.nlm.nih.gov/34727523/) found that a regular liftOver of the human NCBI Refseq annotation to rheMac10 assigned a higher number of UMI counts to genes. This is likely due to incomplete annotations in either the rheMac8 or rheMac10 genomes for the 3' UTRs that are usually targeted by common single cell/nucleus RNA-seq technologies. The lifted annotation allows reads that would otherwise fall "outside" a gene, because of incomplete 3' UTRs in the target species, to be attributed to that gene using its ortholog from a more complete annotation in a related species. Furthermore, complex splicing is better characterized in humans, so more reads labeled "intergenic" under the rheMac10 annotation became "intronic" and could be mapped with the liftOvered annotation from human. For this reason, we create alternate annotations for the rhesus macaque, marmoset, and rat genomes, borrowing orthology identified with the newer liftoff method from more complete human or mouse annotations.
Similarly, for single cell ATAC-seq, a more complete map of genes and transcription start sites (TSS) enables aggregate metrics like a "gene score" to better capture gene-based measures for co-clustering with single cell RNA-seq datasets. A more complete annotation can more accurately classify single cell open chromatin regions and avoid falsely reporting exonic regions or alternate promoters that were missed in primary transcriptomic data from monkey, marmoset, or rat but can be bioinformatically inferred.
Lastly, work by the ENCODE Consortium on large human and mouse epigenomic datasets has found that certain regions of the genome in these species have artifactual signals and need to be excluded from epigenomic analyses (Amemiya et al., 2021). These regions were pulled for human and mouse from here and mapped to the target genomes below with liftOver, for simplicity.
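To make the exclusion step concrete, here is a small base-R sketch that drops regions overlapping an excluded interval. The function name drop_blacklisted, the data frame layout (BED-style chr, start, end columns with 0-based half-open coordinates), and the toy coordinates are all assumptions for illustration; for large peak sets a package such as GenomicRanges would be the usual choice.

```r
# Drop regions that overlap any excluded (blacklist) interval.
# Both inputs are data frames with BED-style columns chr, start, end
# (0-based, half-open); this layout is an assumption of the sketch.
drop_blacklisted <- function(regions, blacklist) {
  hit <- vapply(seq_len(nrow(regions)), function(i) {
    same_chr <- blacklist$chr == regions$chr[i]
    any(same_chr &
          regions$start[i] < blacklist$end &
          regions$end[i] > blacklist$start)
  }, logical(1))
  regions[!hit, , drop = FALSE]
}

# Toy example with made-up coordinates
peaks <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                    start = c(100, 500, 100),
                    end   = c(200, 600, 200))
excluded <- data.frame(chr = "chr1", start = 150, end = 300)
kept <- drop_blacklisted(peaks, excluded)  # first peak is removed
```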
List of resources by file name
Surprisingly, all these files are small enough to put on GitHub for a couple of custom genomes. Below is the organization:
- *.gtf.gz and *.gff3.gz: the gzipped annotation from the higher quality annotations to the target genome using liftoff
- *liftOver*blacklist.v2.bed: the ENCODE regions to exclude from epigenomic analyses mapped to the target genome using liftOver
- *ArchRGenome.R: the Rscript used to make the custom ArchR annotations
- *ArchR_annotations.rda: the R Data object that contains the geneAnnotation and objects to use with ArchR::createArrowFiles()
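Before passing the contents of an *ArchR_annotations.rda file to ArchR, it can help to see which objects the file holds. The sketch below loads the file into a separate environment so nothing in the workspace is overwritten; the helper name list_rda_objects is hypothetical, and no particular object names are assumed.

```r
# List the objects stored in an .rda file together with their classes,
# loading into a scratch environment rather than the global workspace
list_rda_objects <- function(rda_path) {
  env <- new.env()
  loaded <- load(rda_path, envir = env)
  vapply(loaded, function(nm) class(get(nm, envir = env))[1], character(1))
}
```

Once the object names are known, the same load() call can be used to pull them into the session and supply them to ArchR::createArrowFiles() through its geneAnnotation and genomeAnnotation arguments.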
List of species/genomes/source files
For most of these files, the genome fasta sequences were grabbed from the UCSC Genome Browser at https://hgdownload.soe.ucsc.edu/goldenPath/${GENOME_VERSION}/, where ${GENOME_VERSION} is any of the versions below except mCalJac1. Some of these genomes were updated by the Vertebrate Genome Project (VGP), which seeks to create complete rather than draft genome assemblies of all mammals on the planet (Rhie et al. 2021). These genomes are marked VGP, along with the VGP name where there is an alternate naming scheme. The VGP is pretty cool and they make good genome assemblies.
rheMac8: rhesus macaque v8
rheMac10: rhesus macaque v10
mCalJac1: marmoset VGP genome, fasta from the maternal assembly here
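The URL pattern described above can be expanded for the UCSC-hosted genome versions with a one-line helper. The function name ucsc_goldenpath_url is hypothetical, and only the directory URL stated above is built; no file names inside that directory are assumed.

```r
# Build the UCSC goldenPath directory URL for one or more genome versions.
# mCalJac1 is rejected because its fasta comes from the VGP, not UCSC.
ucsc_goldenpath_url <- function(genome_version) {
  stopifnot(!("mCalJac1" %in% genome_version))
  sprintf("https://hgdownload.soe.ucsc.edu/goldenPath/%s/", genome_version)
}

urls <- ucsc_goldenpath_url(c("rheMac8", "rheMac10"))
```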
Pancreatic Adenocarcinoma (PAAD) is the third most common cause of death from cancer, with an overall 5-year survival rate of less than 5%, and is predicted to become the second leading cause of cancer mortality in the United States by 2030. Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and DNA are nucleic acids, and, along with lipids, proteins and carbohydrates, constitute the four major macromolecules essential for all known forms of life.
RNA-Seq (RNA sequencing) is a sequencing technique that measures the quantity of RNA in a biological sample at a given moment. Here we have a dataset of normalized RNA sequencing reads for pancreatic cancer tumors. The measurements cover ~20,000 genes for 185 pancreatic cancer tumors. The file format is GCT, a tab-delimited format used for sharing gene expression data together with metadata (details for each sample).
● The R package cmapR can be used for reading GCTs in R.
● The python package cmapPy can be used for reading GCTs in python.
● Phantasus is an open source tool which is used to visualise GCT files, make various plots, apply algorithms like clustering and PCA among others.
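To show what the GCT layout looks like in practice, here is a minimal base-R reader. It is a sketch written from the general GCT 1.2/1.3 layout (a version line, a dimensions line, then a tab-delimited table), not a replacement for cmapR, whose parse_gctx() function is the supported route and also returns the metadata. The function name read_gct and the choice to return only the numeric matrix are assumptions of this example.

```r
# Minimal GCT reader: returns the format version and the numeric
# expression matrix (rows = genes, columns = samples). Metadata is skipped.
read_gct <- function(path) {
  header  <- readLines(path, n = 2)
  version <- sub("^#", "", trimws(header[1]))
  dims    <- as.integer(strsplit(trimws(header[2]), "[[:space:]]+")[[1]])
  # GCT 1.2 lists only rows and columns: it has one description column
  # and no column-metadata rows. GCT 1.3 lists all four counts.
  if (length(dims) == 2) dims <- c(dims, 1L, 0L)
  tab <- read.delim(path, skip = 2, header = TRUE, check.names = FALSE,
                    quote = "", colClasses = "character")
  sample_cols <- seq(from = 2 + dims[3], length.out = dims[2])
  data_rows   <- seq(from = 1 + dims[4], length.out = dims[1])
  expr <- as.matrix(tab[data_rows, sample_cols, drop = FALSE])
  storage.mode(expr) <- "double"  # values were read as text
  rownames(expr) <- tab[[1]][data_rows]
  list(version = version, expression = expr)
}
```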
Source - Pancreatic cancer survival analysis defines a signature that predicts outcome - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084949/
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PBMC samples for CITE-seq and ASAP-seq were collected at four time points: immediately before (Day 0) vaccination, after primary vaccination (Day 2, Day 10), and seven days after boost vaccination (Day 28).
The datasets uploaded here are three processed single-cell datasets:
1. PBMC_vaccine_CITE.rds: 3' RNA and surface proteins (173 TotalSeq-A antibodies)
2. PBMC_vaccine_ASAP.rds: Chromatin accessibility and surface proteins (173 TotalSeq-A antibodies)
3. PBMC_vaccine_ECCITE_TCR.rds: 5' RNA, surface proteins (137 TotalSeq-C antibodies), TCR and dextramer loaded with peptides of SARS-CoV-2 spike protein.
antigen_module_genes.rds: This file contains the vaccine-induced gene sets.
antigen_module_peaks.rds: This file contains the DE peaks specific for vaccine-induced cells.
To map the scRNA-seq query dataset onto our CITE-seq reference:
library(Seurat)
PBMC_CITE <- readRDS("/zenedo/PBMC_vaccine_CITE.rds")
query_scRNA <- readRDS("/home/xx/your_own_data.rds")
anchors <- FindTransferAnchors(
  reference = PBMC_CITE,
  query = query_scRNA,
  normalization.method = "SCT",
  k.anchor = 5,
  reference.reduction = "spca",
  dims = 1:50)
query_scRNA <- MapQuery(
  anchorset = anchors,
  query = query_scRNA,
  reference = PBMC_CITE,
  refdata = list(
    l1 = "celltypel1",
    l2 = "celltypel2",
    l3 = "celltypel3"),
  reference.reduction = "spca",
  reduction.model = "wnn.umap")
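After mapping, the transferred labels can be summarised from the query metadata. The sketch below works on a plain data frame so that it runs without Seurat; it assumes the metadata columns follow Seurat's usual naming for transferred labels (predicted.l1 and predicted.l1.score for the l1 entry of refdata). The function name summarise_predictions and the 0.5 score cutoff are arbitrary choices for illustration.

```r
# Count cells per predicted label, keeping only confident predictions.
# `meta` is a data frame of cell metadata; column names are assumed to
# follow the "predicted.<level>" / "predicted.<level>.score" pattern.
summarise_predictions <- function(meta, level = "l1", min_score = 0.5) {
  label_col <- paste0("predicted.", level)
  score_col <- paste0(label_col, ".score")
  stopifnot(all(c(label_col, score_col) %in% colnames(meta)))
  confident <- meta[[score_col]] >= min_score
  as.data.frame(table(label = meta[[label_col]][confident]),
                responseName = "n_cells", stringsAsFactors = FALSE)
}
```

With a mapped Seurat object, the metadata data frame can be obtained with query_scRNA[[]] and passed in directly, for example summarise_predictions(query_scRNA[[]], level = "l2").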
To use the scATAC-seq data, please run the commands below to update the path of the fragment file for the object.
# Fragments() and CreateFragmentObject() are provided by the Signac package
library(Signac)
Vaccine_ASAP <- readRDS("PBMC_vaccine_ASAP.rds")
# remove fragment file information
Fragments(Vaccine_ASAP) <- NULL
# Update the path of the fragment file
Fragments(Vaccine_ASAP) <- CreateFragmentObject(path = "download/PBMC_vaccine_ASAP_fragments.tsv.gz", cells = Cells(Vaccine_ASAP))