5 datasets found
  1. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 20, 2023
    Cite
    Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Hsu, Jonathan
    Stoop, Allart
    Description

    Table of Contents

    1. Main Description
    2. File Descriptions
    3. Linked Files
    4. Installation and Instructions

    Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required to run the script:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap
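
As a minimal sketch, the base R snippet below reports which of the libraries listed above are not yet installed. It assumes the listed names are the installed package names (the Bioconductor package is spelled scRepertoire). The variable names `required_pkgs` and `is_installed` are illustrative and do not come from the repository's scripts.

```r
# Packages named in the description above.
required_pkgs <- c("Seurat", "scRepertoire", "ggplot2", "stringr",
                   "dplyr", "ggridges", "ggrepel", "ComplexHeatmap")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(is_installed)) {
  message("All required packages are installed.")
} else {
  message("Not installed: ", paste(required_pkgs[!is_installed], collapse = ", "))
}
```

Running this before the analysis script shows whether "Install_Packages.R" still has work to do; it does not install anything itself.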

    File Descriptions

    The code can be downloaded and opened in RStudio.

    • "marengo_code_for_paper_jan_2023.R" contains all the code needed to reproduce the figures in the paper.
    • "Marengo_newID_March242023.rds" is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113).
    • "all_res_deg_for_heat_updated_march2023.txt" contains the unfiltered results from the differential gene expression (DGE) analysis and is also used to create the DGE heatmap and the volcano plots.
    • "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in Figure 5F.

    Linked Files

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset comprises raw FASTQ files and the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. The linked datasets are described below:

    Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for the single-cell sequencing data.
    Submission type: Private. To gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence Read Archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
    Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files of sequencing reads and their quality scores.
    Submission type: Private. To gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.
    Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code.
    Submission type: Restricted Access. To gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (version 2022.12.0+353) to access and execute the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

       • marengo_code_for_paper_jan_2023.R
       • Install_Packages.R
       • Marengo_newID_March242023.rds
       • genes_for_heatmap_fig5F.xlsx
       • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd(directory)

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies, setting up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.
    5. Once the "Install_Packages.R" script has executed successfully, restart RStudio or your IDE of choice.
    6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.
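
The setup steps above can be sanity-checked with a short base R sketch. It assumes the five file names listed in the instructions and the stated minimum R version of 4.1.2. The helper `missing_files()` is hypothetical and is not part of the repository.

```r
# File names taken from the instructions above.
required_files <- c(
  "marengo_code_for_paper_jan_2023.R",
  "Install_Packages.R",
  "Marengo_newID_March242023.rds",
  "genes_for_heatmap_fig5F.xlsx",
  "all_res_deg_for_heat_updated_march2023.txt"
)

# Hypothetical helper: return the names of required files absent from `dir`.
missing_files <- function(dir = getwd(), files = required_files) {
  files[!file.exists(file.path(dir, files))]
}

# Warn if the running R version is older than the stated minimum.
if (getRversion() < "4.1.2") {
  warning("R 4.1.2 or higher is recommended; found ", as.character(getRversion()))
}

absent <- missing_files()
if (length(absent) > 0) {
  message("Missing from working directory: ", paste(absent, collapse = ", "))
} else {
  message("All required files found.")
}
```

Run it after `setwd()` so that `getwd()` points at the folder holding the downloaded files.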
  2. Data from: TempO-seq and RNA-seq gene expression levels are highly...

    • catalog.data.gov
    Updated Jun 8, 2025
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). TempO-seq and RNA-seq gene expression levels are highly correlated for most genes: A comparison using 39 human cell lines [Dataset]. https://catalog.data.gov/dataset/tempo-seq-and-rna-seq-gene-expression-levels-are-highly-correlated-for-most-genes-a-compar
    Dataset updated
    Jun 8, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Journal article published in PLOS One, Vol 20, Issue 5, e0320862, 2025; DOI: https://doi.org/10.1371/journal.pone.0320862; PMC12064016. The datasets generated and analyzed during the current study are provided in Supplemental S1 File. The RNA-seq data is Protein Atlas Version 23 from the Human Protein Atlas website (https://www.proteinatlas.org/about/download, “RNA HPA cell line gene data” released 2023.06.19). All FASTQ files and aligned counts for the U.S. EPA TempO-seq data have been deposited into NCBI Gene Expression Omnibus under the accession number GSE288929 and are publicly available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288929. The R code is available through FigShare at: https://doi.org/10.23645/epacomptox.27341970.v1. This dataset is associated with the following publication: Word, L., C. Willis, R. Judson, L. Everett, S. Davidson-Fritz, D. Haggard, B. Chambers, J. Rogers, J. Bundy, I. Shah, N. Sipes, and J. Harrill. TempO-seq and RNA-seq Gene Expression Levels are Highly Correlated for Most Genes: A Comparison Using 39 Human Cell Lines. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 20(5): e0320862, (2025).

  3. Alternate gene annotations for rat, macaque, and marmoset for single cell...

    • kilthub.cmu.edu
    zip
    Updated May 30, 2023
    Cite
    BaDoi Phan; Andreas Pfenning (2023). Alternate gene annotations for rat, macaque, and marmoset for single cell RNA and ATAC analyses [Dataset]. http://doi.org/10.1184/R1/21176401.v1
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Carnegie Mellon University
    Authors
    BaDoi Phan; Andreas Pfenning
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Custom genome and gene annotations for single cell ATAC and RNA-seq analyses by BaDoi Phan (badoi dot phan at pitt dot edu)

    This Kilthub upload is a clone of the github repository where this project may be updated or corrected in the future: https://github.com/pfenninglab/custom_ArchR_genomes_and_annotations

    Premise: Not all single-cell ATAC-seq work in biomedical molecular epigenetics is done in human and mouse, where high-quality genomes and gene annotations exist. For other species that are still highly relevant to the study of health and disease, here are some ArchR annotations that should make analyzing snATAC-seq data with ArchR less frustrating.

    Strategy for better gene annotations: We can use the property that related mammalian species tend to have orthologous gene elements (TSS, exons, genes). For example, the house mouse (Mus musculus) diverged from the Norway rat (Rattus norvegicus) a median of 15.4 MY ago, according to TimeTree. Humans diverged from rhesus macaques a median of 28.9 MY ago. To borrow the higher-quality and more complete gene annotations, we can use liftoff (Shumate and Salzberg, 2021), a gene-aware method of lifting gene annotations from one genome to another. As the source of "high quality" gene annotations, we use the NCBI RefSeq annotations for hg38/GRCh38 and mm10/GRCm38 downloaded from the UCSC Genome Browser.

    For single cell RNA-seq, He, Kleyman et al. 2021 Current Biology (https://pubmed.ncbi.nlm.nih.gov/34727523/) found that a regular liftOver of the human NCBI RefSeq annotation to rheMac10 recovered a higher number of UMI counts assigned to genes. This is likely due to incomplete annotations, in either the rheMac8 or rheMac10 genomes, of the 3' UTRs that are usually targeted by common single cell/nucleus RNA-seq technologies. Using the orthologs of a gene from a more complete annotation in a related species allows reads that would otherwise fall "outside" the gene, because of incomplete 3' UTRs in the target species, to be appropriately attributed to that gene. Furthermore, complex splicing is better measured in humans, so more regions labeled "intergenic" by the rheMac10 annotation became "intronic" and could be mapped with an annotation lifted over from human. For this reason, we create alternate annotations for the rhesus macaque, marmoset, and rat genomes, borrowing orthology identified with the newer liftoff method from more complete human or mouse annotations.

    Similarly, for single cell ATAC-seq, a more complete map of genes and transcription start sites (TSS) enables aggregate metrics like a "gene score" to better calculate gene-based measures for co-clustering with single cell RNA-seq datasets. A more complete annotation can accurately discern single cell open chromatin regions and avoid falsely reporting exonic regions or alternate promoters that were missed in primary transcriptomic data from monkey, marmoset, or rat but can be bioinformatically inferred.

    Lastly, work by the ENCODE Consortium on large human and mouse epigenomic datasets has found that certain regions of the genome in these species have artifactual signals and need to be excluded from epigenomic analyses (Amemiya et al., 2021). These regions were pulled for human and mouse and mapped to the target genomes below using liftOver, for simplicity.

    List of resources by file name

    Surprisingly, all these files are small enough to put on GitHub for a couple of custom genomes. The files are organized as follows:

    - *.gtf.gz and *.gff3.gz: the gzipped annotations lifted from the higher quality annotations to the target genome using liftoff
    - *liftOver*blacklist.v2.bed: the ENCODE regions to exclude from epigenomic analyses, mapped to the target genome using liftOver
    - *ArchRGenome.R: the R script used to make the custom ArchR annotations
    - *ArchR_annotations.rda: the R data object that contains the geneAnnotation and objects to use with ArchR::createArrowFiles()

    List of species/genomes/source files

    For most of these files, the genome FASTA sequences were downloaded from the UCSC Genome Browser at https://hgdownload.soe.ucsc.edu/goldenPath/${GENOME_VERSION}/, where ${GENOME_VERSION} is any of the versions below except mCalJac1. Some of these genomes were updated by the Vertebrate Genomes Project (VGP), which seeks to create complete rather than draft genome assemblies of all mammals on the planet (Rhie et al. 2021). These genomes are labeled with VGP and that naming version where an alternate naming scheme exists. The VGP is pretty cool and they make good genome assemblies.
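
As an illustration of the URL pattern described above, the following base R sketch fills in ${GENOME_VERSION} for a given assembly name. The helper `ucsc_genome_url()` is hypothetical, and `rheMac10` is used only because it is named earlier in this description. The exclusion of mCalJac1 follows the statement above that it is not hosted at this location.

```r
# Hypothetical helper: build the UCSC goldenPath URL for a genome version.
ucsc_genome_url <- function(genome_version) {
  # The description states that mCalJac1 is not available under goldenPath.
  if (identical(genome_version, "mCalJac1")) {
    stop("mCalJac1 is not hosted under the UCSC goldenPath location")
  }
  sprintf("https://hgdownload.soe.ucsc.edu/goldenPath/%s/", genome_version)
}

ucsc_genome_url("rheMac10")
```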

  4. Cancer prediction

    • kaggle.com
    zip
    Updated Sep 14, 2020
    Cite
    Abhishek Parashar (2020). Cancer prediction [Dataset]. https://www.kaggle.com/abhiparashar/cancer-prediction
    Available download formats: zip (4522896 bytes)
    Dataset updated
    Sep 14, 2020
    Authors
    Abhishek Parashar
    Description

    Context

    Pancreatic Adenocarcinoma (PAAD) is the third most common cause of death from cancer, with an overall 5-year survival rate of less than 5%, and is predicted to become the second leading cause of cancer mortality in the United States by 2030. Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and DNA are nucleic acids, and, along with lipids, proteins and carbohydrates, constitute the four major macromolecules essential for all known forms of life.

    Content

    RNA-Seq (RNA sequencing) is a sequencing technique that measures the quantity of RNA in a biological sample at a given moment. Here we have a dataset of normalized RNA sequencing reads for pancreatic cancer tumors. The measurements cover ~20,000 genes for 185 pancreatic cancer tumors. The file format is GCT, a tab-delimited format used for sharing gene expression data and metadata (details for each sample).

    ● The R package cmapR can be used for reading GCTs in R.
    ● The python package cmapPy can be used for reading GCTs in python.
    ● Phantasus is an open source tool which is used to visualise GCT files, make various plots, and apply algorithms like clustering and PCA, among others.
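
To make the GCT layout concrete, here is a minimal base R sketch that reads a file in the simpler GCT v1.2 layout: a `#1.2` version line, a line giving the row and column counts, then a tab-delimited table whose first two columns are `Name` and `Description`. This is an assumption about the layout. The file in this dataset may use a later GCT version that carries extra sample metadata, in which case a dedicated reader such as cmapR is the better choice. The helper `read_gct_v12()` and the toy values are hypothetical.

```r
# Hypothetical helper: read a GCT v1.2 file into a numeric genes-by-samples matrix.
read_gct_v12 <- function(path) {
  header <- readLines(path, n = 2)
  if (!identical(trimws(header[1]), "#1.2")) {
    stop("Not a GCT v1.2 file: first line is '", header[1], "'")
  }
  # Second line holds the number of rows (genes) and columns (samples).
  dims <- as.integer(strsplit(trimws(header[2]), "\t", fixed = TRUE)[[1]])
  tab <- read.delim(path, skip = 2, check.names = FALSE,
                    stringsAsFactors = FALSE, quote = "")
  # Drop the Name and Description columns; keep the expression values.
  mat <- as.matrix(tab[, -(1:2), drop = FALSE])
  rownames(mat) <- tab[[1]]
  stopifnot(nrow(mat) == dims[1], ncol(mat) == dims[2])
  mat
}

# Tiny synthetic file with made-up values (not taken from the dataset).
tmp <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",
  "2\t3",
  "Name\tDescription\tS1\tS2\tS3",
  "GENE_A\tna\t1.5\t2.0\t0.5",
  "GENE_B\tna\t3.0\t4.5\t1.0"
), tmp)
expr <- read_gct_v12(tmp)
dim(expr)
```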


    Acknowledgements

    Source - Pancreatic cancer survival analysis defines a signature that predicts outcome - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084949/

  5. COVID-19 vaccination single cell datasets

    • zenodo.org
    application/gzip, bin
    Updated Sep 21, 2023
    Cite
    Bingjie Zhang; Rabi Upadhyay; Yuhan Hao; Marie I. Samanovic; Ramin S. Herati; John Blair; Jordan Axelrad; Mark J. Mulligan; Dan R. Littman; Rahul Satija (2023). COVID-19 vaccination single cell datasets [Dataset]. http://doi.org/10.5281/zenodo.7555405
    Available download formats: application/gzip, bin
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bingjie Zhang; Rabi Upadhyay; Yuhan Hao; Marie I. Samanovic; Ramin S. Herati; John Blair; Jordan Axelrad; Mark J. Mulligan; Dan R. Littman; Rahul Satija
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    PBMC samples for CITE-seq and ASAP-seq were collected at four time points: immediately before (Day 0) vaccination, after primary vaccination (Day 2, Day 10), and seven days after boost vaccination (Day 28).

    The datasets uploaded here are three processed single-cell datasets:

    1. PBMC_vaccine_CITE.rds: 3' RNA and surface proteins (173 TotalSeq-A antibodies)

    2. PBMC_vaccine_ASAP.rds: Chromatin accessibility and surface proteins (173 TotalSeq-A antibodies)

    3. PBMC_vaccine_ECCITE_TCR.rds: 5' RNA, surface proteins (137 TotalSeq-C antibodies), TCR and dextramer loaded with peptides of SARS-CoV-2 spike protein.

    antigen_module_genes.rds: This file contains the vaccine-induced gene sets.

    antigen_module_peaks.rds: This file contains the DE peaks specific for vaccine-induced cells.

    To map the scRNA-seq query dataset onto our CITE-seq reference:

    library(Seurat)
    
    PBMC_CITE <- readRDS("/zenedo/PBMC_vaccine_CITE.rds")
    query_scRNA <- readRDS("/home/xx/your_own_data.rds")
    
    anchors <- FindTransferAnchors(
      reference = PBMC_CITE,
      query = query_scRNA,
      normalization.method = "SCT",
      k.anchor = 5,
      reference.reduction = "spca",
      dims = 1:50)
     
    query_scRNA <- MapQuery(
      anchorset = anchors,
      query = query_scRNA,
      reference = PBMC_CITE,
      refdata = list(
       l1 = "celltypel1",
       l2 = "celltypel2",
       l3 = "celltypel3"),
      reference.reduction = "spca",
      reduction.model = "wnn.umap") 
    
    
    

    To use the scATAC-seq data, please run the commands below to update the path of the fragment file for the object.

    library(Seurat)
    library(Signac)  # provides Fragments() and CreateFragmentObject()

    Vaccine_ASAP <- readRDS("PBMC_vaccine_ASAP.rds")
    # Remove the stored fragment file information
    Fragments(Vaccine_ASAP) <- NULL
    # Point the object at the local copy of the fragment file
    Fragments(Vaccine_ASAP) <- CreateFragmentObject(path = "download/PBMC_vaccine_ASAP_fragments.tsv.gz", cells = Cells(Vaccine_ASAP))
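
Before updating the fragment path, it can help to confirm that the files are where the commands expect them. The base R sketch below checks for the fragment file and for a tabix index next to it. It assumes the index uses the conventional `.tbi` suffix, which Signac generally expects for fragment files. The helper `check_fragment_files()` is hypothetical, and the path is the one used in the commands above.

```r
# Hypothetical helper: report whether a fragment file and its tabix index exist.
check_fragment_files <- function(fragment_path) {
  index_path <- paste0(fragment_path, ".tbi")  # assumed index naming convention
  c(fragments = file.exists(fragment_path), index = file.exists(index_path))
}

status <- check_fragment_files("download/PBMC_vaccine_ASAP_fragments.tsv.gz")
if (!all(status)) {
  message("Missing: ", paste(names(status)[!status], collapse = ", "))
}
```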
