5 datasets found

Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
data.niaid.nih.gov
zenodo.org
Updated Nov 20, 2023
Cite
Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
Dataset updated
Nov 20, 2023
Dataset provided by
Hsu, Jonathan
Stoop, Allart
Description
Table of Contents

Main Description
File Descriptions
Linked Files
Installation and Instructions

1. Main Description

This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required to run the script:

Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

File Descriptions

The code can be downloaded and opened in RStudio.
The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper.
The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113).
The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the DGE heatmap and the volcano plots.
The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.

Linked Files

This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset comprises raw FASTQ files as well as the aligned files, both of which were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Descriptions of the linked datasets are provided below:

Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
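The three files named in this description form the sparse-matrix triplet that 10x Genomics pipelines commonly write. As a minimal sketch, assuming one uncompressed triplet sits in a single directory and that "genes.tsv" holds gene IDs in its first column and gene symbols in its second, the counts could be loaded with the Matrix package as follows. The function name read_10x_triplet and the directory layout are hypothetical and are not taken from the GEO record.

```r
library(Matrix)

# Hypothetical helper: load one matrix.mtx / genes.tsv / barcodes.tsv triplet.
# Assumes uncompressed files and a two-column genes.tsv (ID, symbol).
read_10x_triplet <- function(dir) {
  counts   <- readMM(file.path(dir, "matrix.mtx"))
  genes    <- read.delim(file.path(dir, "genes.tsv"),
                         header = FALSE, stringsAsFactors = FALSE)
  barcodes <- readLines(file.path(dir, "barcodes.tsv"))

  # Rows are genes and columns are cell barcodes; fail early on a mismatch.
  stopifnot(nrow(counts) == nrow(genes),
            ncol(counts) == length(barcodes))

  # Gene symbols can repeat, so make them unique before using them as row names.
  rownames(counts) <- make.unique(genes[[2]])
  colnames(counts) <- barcodes
  counts
}
```

The returned sparse matrix can then be passed to downstream tools; Seurat also provides Read10X() for the same file layout.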

Sequence Read Archive (SRA) repository ID: SRX19088718 and SRX19088719

Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing files (.fastq.gz), which are gzip-compressed FASTQ text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code.
Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

Installation and Instructions

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

Ensure you have R version 4.1.2 or higher for compatibility.

Although it is not essential, you can use RStudio (version 2022.12.0+353) to access and execute the code.

Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).

Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

Set your working directory to where the following files are located:

marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:

setwd(directory)

Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.

Once the "Install_Packages.R" script has executed successfully, restart RStudio or your IDE of choice.

Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.

Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
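Before running the analysis script, it can help to confirm that the setup steps above succeeded. The following is a minimal sketch in base R, not part of the repository: the function name preflight is hypothetical, the file and package names are copied from the description above, and the package name is written as scRepertoire on the assumption that "scReportoire" in the description is a misspelling of that Bioconductor package.

```r
# File and package names copied from the repository description.
required_files <- c(
  "marengo_code_for_paper_jan_2023.R",
  "Install_Packages.R",
  "Marengo_newID_March242023.rds",
  "genes_for_heatmap_fig5F.xlsx",
  "all_res_deg_for_heat_updated_march2023.txt"
)
required_packages <- c("Seurat", "scRepertoire", "ggplot2", "stringr",
                       "dplyr", "ggridges", "ggrepel", "ComplexHeatmap")

# Hypothetical helper: report what is still missing for the analysis to run.
preflight <- function(dir = getwd(),
                      files = required_files,
                      packages = required_packages,
                      min_r = "4.1.2") {
  # requireNamespace() returns FALSE (rather than failing) for absent packages.
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  list(
    r_ok             = getRversion() >= min_r,
    missing_files    = files[!file.exists(file.path(dir, files))],
    missing_packages = packages[!installed]
  )
}
```

Calling preflight() after setwd() returns a list; when r_ok is TRUE and both missing_files and missing_packages are empty, the environment matches the requirements listed in this description.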
Data from: TempO-seq and RNA-seq gene expression levels are highly...
catalog.data.gov
Updated Jun 8, 2025
Cite
U.S. EPA Office of Research and Development (ORD) (2025). TempO-seq and RNA-seq gene expression levels are highly correlated for most genes: A comparison using 39 human cell lines [Dataset]. https://catalog.data.gov/dataset/tempo-seq-and-rna-seq-gene-expression-levels-are-highly-correlated-for-most-genes-a-compar
Dataset updated
Jun 8, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Journal article published in PLOS One, Vol 20, Issue 5, e0320862, 2025; DOI: https://doi.org/10.1371/journal.pone.0320862; PMC12064016. The datasets generated and analyzed during the current study are provided in Supplemental S1 File. The RNA-seq data is Protein Atlas Version 23 from the Human Protein Atlas website (https://www.proteinatlas.org/about/download, “RNA HPA cell line gene data” released 2023.06.19). All FASTQ files and aligned counts for the U.S. EPA TempO-seq data have been deposited into NCBI Gene Expression Omnibus under the accession number GSE288929 and are publicly available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288929. The R code is available through FigShare at: https://doi.org/10.23645/epacomptox.27341970.v1. This dataset is associated with the following publication: Word, L., C. Willis, R. Judson, L. Everett, S. Davidson-Fritz, D. Haggard, B. Chambers, J. Rogers, J. Bundy, I. Shah, N. Sipes, and J. Harrill. TempO-seq and RNA-seq Gene Expression Levels are Highly Correlated for Most Genes: A Comparison Using 39 Human Cell Lines. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 20(5): e0320862, (2025).
Alternate gene annotations for rat, macaque, and marmoset for single cell...
kilthub.cmu.edu
zip
Updated May 30, 2023
Cite
BaDoi Phan; Andreas Pfenning (2023). Alternate gene annotations for rat, macaque, and marmoset for single cell RNA and ATAC analyses [Dataset]. http://doi.org/10.1184/R1/21176401.v1
Available download formats: zip
Unique identifier
https://doi.org/10.1184/R1/21176401.v1
Dataset updated
May 30, 2023
Dataset provided by
Carnegie Mellon University
Authors
BaDoi Phan; Andreas Pfenning
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Custom genome and gene annotations for single cell ATAC and RNA-seq analyses by BaDoi Phan (badoi dot phan at pitt dot edu)

This Kilthub upload is a clone of the github repository where this project may be updated or corrected in the future: https://github.com/pfenninglab/custom_ArchR_genomes_and_annotations

Premise: Not all single-cell ATAC-seq work in biomedical molecular epigenetics is done in human and mouse, where high-quality genomes and gene annotations exist. For other species that are still highly relevant to the study of health and disease, here are some ArchR annotations that make it less frustrating to analyze snATAC-seq data with ArchR.

Strategy for better gene annotations: We can use the property that related mammalian species tend to have orthologous gene elements (TSS, exons, genes). For example, according to TimeTree, the house mouse (Mus musculus) diverged from the Norway rat (Rattus norvegicus) a median of 15.4 MY ago. Humans diverged from rhesus macaques a median of 28.9 MY ago. To borrow the higher quality and more complete gene annotations, we can use liftoff (Shumate and Salzberg, 2021), a gene-aware method of lifting gene annotations from one genome to another. As the source of "high quality" gene annotations, we use the NCBI RefSeq annotations for hg38/GRCh38 and mm10/GRCm38 downloaded from the UCSC Genome Browser.

For single cell RNA-seq, He, Kleyman et al. 2021, Current Biology (https://pubmed.ncbi.nlm.nih.gov/34727523/) found that a regular liftOver of the human NCBI RefSeq annotation to rheMac10 recovered a higher number of UMI counts assigned to genes. This is likely due to incomplete annotations in the rheMac8 or rheMac10 genomes for the 3' UTRs, which are usually targeted by common single cell/nucleus RNA-seq technologies. Using the orthologs of a gene from a more complete annotation in a related species allows reads that would otherwise fall "outside" the gene, because of an incomplete 3' UTR in the target species, to be appropriately attributed to that gene. Furthermore, complex splicing is better measured in humans, so more reads that are "intergenic" under the rheMac10 annotation become "intronic" and can be mapped with a liftOver annotation from human. For this reason, we create alternate annotations for the rhesus macaque, marmoset, and rat genomes, borrowing orthology identified with the newer liftoff method from more complete human or mouse annotations.

Similarly, for single cell ATAC-seq, a more complete map of genes and transcription start sites (TSS) enables aggregate metrics like a "gene score" to better calculate gene-based measures for co-clustering with single cell RNA-seq datasets. A more complete annotation can accurately discern single cell open chromatin regions and avoid falsely reporting exonic regions or alternate promoters that were missed in primary transcriptomic data from monkey, marmoset, or rat but can be bioinformatically inferred.

Lastly, work by the ENCODE Consortium on the large human and mouse epigenomic datasets has found that certain regions of the genome in these species have artifactual signals and need to be excluded from epigenomic analyses (Amemiya et al., 2021). These regions were pulled for human and mouse from here and, for simplicity, mapped to the target genomes below using liftOver.

list of resources by file name

Surprisingly, all these files are small enough to put on github for a couple custom genomes. Below are the organizations:

- *.gtf.gz and *.gff3.gz: the gzipped annotation from the higher quality annotations to the target genome using liftoff
- *liftOver*blacklist.v2.bed: the ENCODE regions to exclude from epigenomic analyses mapped to the target genome using liftOver
- *ArchRGenome.R: the Rscript used to make the custom ArchR annotations
- *ArchR_annotations.rda: the R Data object that contains the geneAnnotation and objects to use with ArchR::createArrowFiles()

list of species/genomes/source files

For most of these files, the genome fasta sequences were grabbed from the UCSC Genome Browser at https://hgdownload.soe.ucsc.edu/goldenPath/${GENOME_VERSION}/, where ${GENOME_VERSION} is any of the versions below except mCalJac1. Some of these genomes were updated by the Vertebrate Genome Project, which seeks to create complete rather than draft genome assemblies of all mammals on the planet (Rhie et al. 2021). These genomes are marked VGP, along with the VGP version name if there's an alternate naming scheme. The VGP is pretty cool and they make good genome assemblies.

rn6: rat genome v6, BCM-Baylor version

rn7: rat genome also called VGP mRatBN7.2

rheMac8: rhesus macaque v8

rheMac10: rhesus macaque v10

mCalJac1: marmoset VGP genome, fasta from the maternal assembly here
Cancer prediction
kaggle.com
zip
Updated Sep 14, 2020
Cite
Abhishek Parashar (2020). Cancer prediction [Dataset]. https://www.kaggle.com/abhiparashar/cancer-prediction
Available download formats: zip (4522896 bytes)
Dataset updated
Sep 14, 2020
Authors
Abhishek Parashar
Description
Context

Pancreatic Adenocarcinoma (PAAD) is the third most common cause of death from cancer, with an overall 5-year survival rate of less than 5%, and is predicted to become the second leading cause of cancer mortality in the United States by 2030. Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and DNA are nucleic acids, and, along with lipids, proteins and carbohydrates, constitute the four major macromolecules essential for all known forms of life.

Content

RNA-Seq (RNA sequencing) is a sequencing technique that measures the quantity of RNA in a biological sample at a given moment. Here we have a dataset of normalized RNA sequencing reads for pancreatic cancer tumors. The measurements cover ~20,000 genes for 185 pancreatic cancer tumors. The file format is GCT, a tab-delimited format used for sharing gene expression data together with metadata (details for each sample).

● The R package cmapR can be used for reading GCTs in R.
● The python package cmapPy can be used for reading GCTs in python.
● Phantasus is an open source tool which is used to visualise GCT files, make various plots, apply algorithms like clustering and PCA among others.
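To make the tab-delimited layout concrete, here is a minimal base R sketch that reads the older GCT v1.2 layout: a "#1.2" version line, a line giving the number of rows and columns, a header row starting with Name and Description, and then one row per gene. This is an illustration only. The function name read_gct_v12 is hypothetical, and the file in this dataset may instead use GCT v1.3, which embeds sample metadata and is better handled by cmapR.

```r
# Hypothetical helper: parse a GCT v1.2 file using base R only.
read_gct_v12 <- function(path) {
  header <- readLines(path, n = 2)
  stopifnot(startsWith(header[1], "#1.2"))

  # Line 2 holds the number of data rows and the number of sample columns.
  dims <- as.integer(strsplit(header[2], "\t")[[1]])

  # Line 3 onward is an ordinary tab-delimited table with a header row.
  tab <- read.delim(path, skip = 2, check.names = FALSE,
                    stringsAsFactors = FALSE, quote = "")
  stopifnot(nrow(tab) == dims[1], ncol(tab) - 2 == dims[2])

  # The first two columns are Name and Description; the rest are samples.
  mat <- as.matrix(tab[, -(1:2), drop = FALSE])
  rownames(mat) <- tab[[1]]
  list(description = tab[[2]], expression = mat)
}
```

The returned expression element is a numeric genes-by-samples matrix, which is the shape most clustering and PCA functions in R expect after transposition to samples-by-genes.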


Acknowledgements

Source - Pancreatic cancer survival analysis defines a signature that predicts outcome - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084949/
COVID-19 vaccination single cell datasets
zenodo.org
application/gzip, bin
Updated Sep 21, 2023
Cite
Bingjie Zhang; Rabi Upadhyay; Yuhan Hao; Marie I. Samanovic; Ramin S. Herati; John Blair; Jordan Axelrad; Mark J. Mulligan; Dan R. Littman; Rahul Satija (2023). COVID-19 vaccination single cell datasets [Dataset]. http://doi.org/10.5281/zenodo.7555405
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7555405
Dataset updated
Sep 21, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Bingjie Zhang; Rabi Upadhyay; Yuhan Hao; Marie I. Samanovic; Ramin S. Herati; John Blair; Jordan Axelrad; Mark J. Mulligan; Dan R. Littman; Rahul Satija
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PBMC samples for CITE-seq and ASAP-seq were collected at four time points: immediately before (Day 0) vaccination, after primary vaccination (Day 2, Day 10), and seven days after boost vaccination (Day 28).

The datasets uploaded here are three processed single-cell datasets:

1. PBMC_vaccine_CITE.rds: 3' RNA and surface proteins (173 TotalSeq-A antibodies)

2. PBMC_vaccine_ASAP.rds: Chromatin accessibility and surface proteins (173 TotalSeq-A antibodies)

3. PBMC_vaccine_ECCITE_TCR.rds: 5' RNA, surface proteins (137 TotalSeq-C antibodies), TCR and dextramer loaded with peptides of SARS-CoV-2 spike protein.

antigen_module_genes.rds: This file contains the vaccine-induced gene sets.

antigen_module_peaks.rds: This file contains the DE peaks specific for vaccine-induced cells.

To map the scRNA-seq query dataset onto our CITE-seq reference:

library(Seurat)

PBMC_CITE <- readRDS("/zenedo/PBMC_vaccine_CITE.rds")
query_scRNA <- readRDS("/home/xx/your_own_data.rds")

anchors <- FindTransferAnchors(
  reference = PBMC_CITE,
  query = query_scRNA,
  normalization.method = "SCT",
  k.anchor = 5,
  reference.reduction = "spca",
  dims = 1:50)

query_scRNA <- MapQuery(
  anchorset = anchors,
  query = query_scRNA,
  reference = PBMC_CITE,
  refdata = list(
    l1 = "celltypel1",
    l2 = "celltypel2",
    l3 = "celltypel3"),
  reference.reduction = "spca",
  reduction.model = "wnn.umap")

To use the scATAC-seq data, please run the commands below to update the path of the fragment file for the object.

library(Signac)  # provides Fragments() and CreateFragmentObject()

Vaccine_ASAP <- readRDS("PBMC_vaccine_ASAP.rds")
# remove fragment file information
Fragments(Vaccine_ASAP) <- NULL
# Update the path of the fragment file
Fragments(Vaccine_ASAP) <- CreateFragmentObject(
  path = "download/PBMC_vaccine_ASAP_fragments.tsv.gz",
  cells = Cells(Vaccine_ASAP))